PEP 399: Pure Python/C Accelerator Module Compatibility Requirements
At both the VM and language summits at PyCon this year, the issue of
compatibility of the stdlib amongst the various VMs came up. Two issues came
up in regard to modules that use C code. One is that code that comes in
only as C code sucks for all other VMs that are not CPython, since they all
end up having to re-implement that module themselves. Two is that modules
that have an accelerator module (e.g., heapq, warnings, etc.) can end up
with compatibility problems (sorry, Raymond, for picking on heapq, but it was
what bit the PyPy people most recently =).
In light of all of this, here is a draft PEP to more clearly state the policy
for the stdlib when it comes to C code. Since this has come up before and
was discussed so much at the summits, I have gone ahead and checked this
in so that even if this PEP gets rejected there will be a written record as
to why.
And before anyone asks, I have already run this past the lead devs of PyPy,
Jython, and IronPython and they all support what this PEP proposes. And with
the devs of the other VMs gaining push privileges there shouldn't be an
added developer burden on everyone to make this PEP happen.
==========================================================
PEP: 399
Title: Pure Python/C Accelerator Module Compatibility Requirements
Version: $Revision: 88219 $
Last-Modified: $Date: 2011-01-27 13:47:00 -0800 (Thu, 27 Jan 2011) $
Author: Brett Cannon
Brett Cannon, 05.04.2011 01:46:
[snip...]
We recently had the discussion about reimplementing stdlib C modules in Cython. Accelerator modules are the obvious first step here, as they could be implemented in Python and compiled with Cython, instead of actually writing them in C in the first place. Wouldn't this be worth mentioning in the PEP? Stefan
On Tue, Apr 5, 2011 at 01:26, Stefan Behnel wrote:
[snip...]
We recently had the discussion about reimplementing stdlib C modules in Cython. Accelerator modules are the obvious first step here, as they could be implemented in Python and compiled with Cython, instead of actually writing them in C in the first place. Wouldn't this be worth mentioning in the PEP?
I consider whether Cython is used to be orthogonal to the PEP. If Cython is actually used and it is deemed needed, I will update the PEP.
On Tue, Apr 5, 2011 at 9:46 AM, Brett Cannon wrote:
try:
    c_heapq.heappop(Spam())
except TypeError:
    # "heap argument must be a list"
    pass

try:
    py_heapq.heappop(Spam())
except AttributeError:
    # "'Foo' object has no attribute 'pop'"
    pass
This kind of divergence is a problem for users as they unwittingly write code that is CPython-specific. This is also an issue for other VM teams as they have to deal with bug reports from users thinking that they incorrectly implemented the module when in fact it was caused by an untested case.
While I agree with the PEP in principle, I disagree with the way this example is written. Guido has stated in the past that code simply *cannot* rely on TypeError being consistently thrown instead of AttributeError (or vice-versa) when it comes to duck-typing. Code that cares which of the two is thrown is wrong. However, there actually *is* a significant semantic discrepancy in the heapq case, which is that py_heapq is duck-typed, while c_heapq is not:
>>> from test.support import import_fresh_module
>>> c_heapq = import_fresh_module('heapq', fresh=['_heapq'])
>>> py_heapq = import_fresh_module('heapq', blocked=['_heapq'])
>>> from collections import UserList
>>> class Seq(UserList): pass
...
>>> c_heapq.heappop(UserList())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: heap argument must be a list
>>> py_heapq.heappop(UserList())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py3k/Lib/heapq.py", line 140, in heappop
    lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
  File "/home/ncoghlan/devel/py3k/Lib/collections/__init__.py", line 848, in pop
    def pop(self, i=-1): return self.data.pop(i)
IndexError: pop from empty list
Cheers, Nick. P.S. The reason I was bugging Guido to answer the TypeError vs AttributeError question in the first place was to find out whether or not I needed to get rid of the following gross inconsistency in the behaviour of the with statement relative to other language constructs:
>>> 1()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable
>>> with 1:
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute '__exit__'
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Apr 5, 2011 at 05:01, Nick Coghlan wrote:
[snip...]
While I agree with the PEP in principle, I disagree with the way this example is written. Guido has stated in the past that code simply *cannot* rely on TypeError being consistently thrown instead of AttributeError (or vice-versa) when it comes to duck-typing. Code that cares which of the two is thrown is wrong.
Which is unfortunate since the least common base class is Exception. But I can add a note to the PEP saying that this is the case and change the example.
However, there actually *is* a significant semantic discrepancy in the heapq case, which is that py_heapq is duck-typed, while c_heapq is not:
That's true. I will re-word it to point that out. The example code still shows it, I just didn't explicitly state that in the example. -Brett
Brett Cannon, 06.04.2011 19:40:
On Tue, Apr 5, 2011 at 05:01, Nick Coghlan wrote:
However, there actually *is* a significant semantic discrepancy in the heapq case, which is that py_heapq is duck-typed, while c_heapq is not:
TypeError: heap argument must be a list
That's true. I will re-word it to point that out. The example code still shows it, I just didn't explicitly state that in the example.
Assuming there always is an "equivalent" Python implementation anyway, what about using that as a fallback for input types that the C implementation cannot deal with? Or would it be a larger surprise for users if the code ran slower when passing in a custom type than if it throws an exception instead? Stefan
On Thu, Apr 7, 2011 at 3:15 PM, Stefan Behnel wrote:
Assuming there always is an "equivalent" Python implementation anyway, what about using that as a fallback for input types that the C implementation cannot deal with?
Or would it be a larger surprise for users if the code ran slower when passing in a custom type than if it throws an exception instead?
It often isn't practical - the internal structures of the two don't necessarily play nicely together. It's an interesting idea for heapq in particular, though. (The C module could fairly easily alias the Python versions with underscore prefixes, then fall back to those instead of raising an error if PyList_CheckExact fails.) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
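For illustration, here is a rough pure Python rendering of that fallback idea (the wiring below is this sketch's own invention; in CPython the dispatch would live in the C module behind a PyList_CheckExact test):

    from _heapq import heappop as _c_heappop          # CPython's C accelerator
    from test.support import import_fresh_module

    # The duck-typed pure Python implementation, with the accelerator blocked.
    _py_heapq = import_fresh_module('heapq', blocked=['_heapq'])

    def heappop(heap):
        if type(heap) is list:             # what PyList_CheckExact tests in C
            return _c_heappop(heap)        # fast, list-only path
        return _py_heapq.heappop(heap)     # duck-typed fallback

With that wiring, heappop(UserList([1, 2, 3])) would take the slower duck-typed path instead of raising TypeError.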
[Brett]
This PEP requires that in these instances that both the Python and C code must be semantically identical
Are you talking about the guaranteed semantics promised by the docs or are you talking about every possible implementation detail?

ISTM that even with pure python code, we get problems with people relying on implementation specific details.

* Two functions accept a sequence, but one accesses it using __len__ and __getitem__ while the other uses __iter__. (This is like the Spam example in the PEP.)

* Given pure python library code like:

    if x < y: ...

  I've seen people only implement __lt__ but not __gt__, making it impossible to make even minor adjustments to the code such as:

    if y > x: ...

* We also suffer from inconsistency in choice of exceptions (i.e. overly large sequence indices raising either an IndexError, OverflowError, or ValueError).

With C code, I wonder if certain implementation differences go with the territory:

* Concurrency issues are a common semantic difference. For example, deque.pop() is atomic because the C code holds the GIL, but a pure python equivalent would have to use locks to achieve the same effect (and even then might introduce liveness or deadlock issues).

* Heapq is one of the rare examples of purely algorithmic code. Much of the code in CPython accesses libraries (i.e. the math module), interfaces with the OS, accesses binary data structures, links to third-party tools (sqlite3 and Tkinter), or does something else that doesn't have pure python equivalents (at least without using C types).

* The C API for parsing argument tuples and keywords does not readily parallel the way the same are written in Python. And with iterators, the argument checking in the C versions tends to happen when the iterator is instantiated, but code written with pure python generators doesn't have its setup and checking section run until next() is called the first time.

* We've had a very difficult time bridging the gulf between python's infinite precision numbers and C's fixed width numbers (for example, it took years to get range() to handle values greater than a word size).

* C code tends to be written in a way that takes advantage of that language's features instead of in a form that is a direct translation of pure python. For example, I think the work being done on a C implementation of decimal has vastly different internal structures, and it would be a huge challenge to make it semantically identical to the pure python version with respect to its implementation details. Likewise, a worthwhile C implementation of OrderedDict can only achieve massive space savings by having majorly different implementation details.

Instead of expressing the wishful thought that C versions and pure Python versions are semantically identical with respect to implementation details, I would like to see more thought put into specific limitations on C coding techniques and general agreement on which implementation specific details should be guaranteed:

* I would like to see a restriction on the use of the concrete C API such that it is *only* used when an exact type match has been found or created (i.e. if someone writes PyList_New(), then it is okay to use PyList_SetItem()). See http://bugs.python.org/issue10977 for a discussion of what can go wrong. The original C code for json was an example of code that used the concrete C API in a way that precluded pure python subclasses of list and dict.

* I would like to see better consistency on when to use OverflowError vs ValueError vs IndexError.

* There should also be a discussion of whether the possible exceptions should be a guaranteed part of the API as it is in Java. Because there were no guarantees (i.e. ord(x) can raise this, that, and the other), people tend to run an experiment and then rely on whatever CPython happens to do.

* There should be a discussion on when it is okay for a C implementation to handle only a value range that fits in a word.

* When there is C code, when is it okay for a user to assume atomic access? Even with pure python code, we're not always consistent about it (i.e. the OrderedDict implementation is not threadsafe but the LRU cache is).

* There should be some agreement that people implementing rich comparisons will implement all six operations so that client code doesn't become dependent on (x < y versus y > x). For example, we had to add special-case logic to heapq years ago because Twisted implemented a task object that defined __le__ instead of __lt__, so it was usable only with an older version of heapq but not with min, sort, etc. (A sketch using functools.total_ordering follows at the end of this message.)

A good PEP should address these issues head-on. Just saying that C and python code have to be semantically identical in all implementation details doesn't really address the issue.

[Brett]
(sorry, Raymond, for picking on heapq, but it was what bit the PyPy people most recently =).
No worries, it wasn't even my code. Someone donated it. There was a discussion on python-dev and collective agreement to allow it to have semantic differences that would let it run faster. IIRC, the final call was made by Uncle Timmy.

That being said, I would like to see a broader set of examples rather than extrapolating from a single piece of 7+ year-old code. It is purely algorithmic, so it really just represents the simplest case. It would be much more interesting to discuss what should be done with future C implementations for threading, decimal, OrderedDict, or some existing non-trivial C accelerators like those for JSON or XML.

Brett, thanks for bringing the issue up. I've been bugged for a good while about issues like overbroad use of the concrete C API.

Raymond
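On the rich comparison point above: since Python 2.7/3.2, one low-effort way for a class author to supply all six operations is functools.total_ordering; a minimal sketch with a hypothetical Task class:

    from functools import total_ordering

    @total_ordering
    class Task:
        def __init__(self, priority):
            self.priority = priority
        def __eq__(self, other):
            return self.priority == other.priority
        def __lt__(self, other):
            return self.priority < other.priority

    # total_ordering derives __le__, __gt__, and __ge__ from __eq__ and
    # __lt__, so both the "x < y" and "y > x" spellings work and Task
    # instances are usable with heapq, min(), and sort() alike.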
On 4/5/2011 3:57 PM, Raymond Hettinger wrote:
[Brett]
This PEP requires that in these instances that both the Python and C code must be semantically identical
Are you talking about the guaranteed semantics promised by the docs or are you talking about every possible implementation detail?
I personally would limit the guarantee to what the docs promise. That is all people should expect anyway if the Python code were executed by some other implementation, or by someone else's system-coded version, or even a different version of CPython. This assumes that the docs have reasonably complete specifications. This was improved in 3.2 and should improve further as system-code implementers find more holes.

Exceptions are a bit of a gray area. The docs are quite uneven about specifying exceptions. They sometimes do, sometimes do not, even for similar functions. This should be another PEP though.

-- Terry Jan Reedy
On Tue, 5 Apr 2011 12:57:13 -0700, Raymond Hettinger wrote:
* I would like to see a restriction on the use of the concrete C API such that it is *only* used when an exact type match has been found or created (i.e. if someone writes PyList_New(), then it is okay to use PyList_SetItem()).
That should be qualified. For example, not being able to use PyUnicode_AS_STRING in some performance-critical code (such as the io lib) would be a large impediment. Regards Antoine.
On Wed, Apr 6, 2011 at 11:59 PM, Antoine Pitrou wrote:
[snip...]
That should be qualified. For example, not being able to use PyUnicode_AS_STRING in some performance-critical code (such as the io lib) would be a large impediment.
Str/unicode/bytes are really an exception to most rules when it comes to duck-typing. There's so much code out there that only works with "real" strings, nobody is surprised when an API doesn't accept string look-alikes. (There aren't any standard ABCs for those interfaces, and I haven't really encountered anyone clamouring for them, either). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Wed, Apr 6, 2011 at 5:57 AM, Raymond Hettinger wrote:
[Brett]
This PEP requires that in these instances that both the Python and C code must be semantically identical
Are you talking about the guaranteed semantics promised by the docs or are you talking about every possible implementation detail?
ISTM that even with pure python code, we get problems with people relying on implementation specific details.
Indeed. Argument handling is certainly a tricky one - getting positional-only arguments requires a bit of a hack in pure Python code (accepting *args and unpacking the arguments manually), but it comes reasonably naturally when parsing arguments directly using the C API.

Another example where these questions will arise (this time going the other way): I would like to see a pure-Python version of partial added back into functools, with the C version becoming an accelerated override for it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
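As a rough sketch of what such a pure Python partial might look like (illustrative only, not the eventual stdlib code), note how the *args hack keeps func and the leading arguments positional-only:

    def partial(*args, **keywords):
        # 'func' is pulled out of *args by hand so that callers can still
        # pass a keyword argument literally named 'func' through to the
        # wrapped callable.
        if not args:
            raise TypeError("partial expected at least 1 argument, got 0")
        func, args = args[0], args[1:]
        def newfunc(*fargs, **fkeywords):
            newkeywords = keywords.copy()
            newkeywords.update(fkeywords)
            return func(*(args + fargs), **newkeywords)
        return newfunc

    greet = partial(print, 'Hello,')
    greet('world')        # prints: Hello, world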
On Apr 6, 2011, at 10:08 AM, Nick Coghlan wrote:
On Wed, Apr 6, 2011 at 5:57 AM, Raymond Hettinger wrote:
[Brett]
[snip...]
Indeed.
Argument handling is certainly a tricky one - getting positional only arguments requires a bit of a hack in pure Python code (accepting *args and unpacking the arguments manually), but it comes reasonably naturally when parsing arguments directly using the C API.
Perhaps the argument handling for C functions ought to be enhanced to work like python's argument handling, instead of trying to hack it the other way around? James
On Thu, Apr 7, 2011 at 1:03 AM, James Y Knight wrote:
Perhaps the argument handling for C functions ought to be enhanced to work like python's argument handling, instead of trying to hack it the other way around?
Oh, definitely. It is just that you pretty much have to use the *args hack when providing Python versions of C functions that accept both positional-only arguments and arbitrary keyword arguments. For "ordinary" calls, simply switching to PyArg_ParseTupleAndKeywords over other alternatives basically deals with the problem. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
James Y Knight, 06.04.2011 17:03:
On Apr 6, 2011, at 10:08 AM, Nick Coghlan wrote:
Argument handling is certainly a tricky one - getting positional only arguments requires a bit of a hack in pure Python code (accepting *args and unpacking the arguments manually), but it comes reasonably naturally when parsing arguments directly using the C API.
Perhaps the argument handling for C functions ought to be enhanced to work like python's argument handling, instead of trying to hack it the other way around?
FWIW, Cython implemented functions have full Python 3 semantics for argument unpacking but the generated code is usually faster (and sometimes much faster) than the commonly used C-API function calls because it is tightly adapted to the typed function signature. Stefan
On 05/04/2011 20:57, Raymond Hettinger wrote:
[snip...] [Brett]
(sorry, Raymond, for picking on heapq, but it was what bit the PyPy people most recently =).

No worries, it wasn't even my code. Someone donated it. There was a discussion on python-dev and collective agreement to allow it to have semantic differences that would let it run faster. IIRC, the final call was made by Uncle Timmy.
The major problem that pypy had with heapq, aside from semantic differences, was (is?) that if you run the tests against the pure-Python version (without the C accelerator) then tests *fail*. This means they have to patch the CPython tests in order to be able to use the pure Python version. Ensuring that tests run against both (even if there are some unavoidable differences like exception types with the tests allowing for both or skipping some tests) would at least prevent this happening. All the best, Michael
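One way to arrange what Michael describes - running the same tests against both versions - is to parameterize the test case by implementation (a sketch; test.support.import_fresh_module is real, the test classes are hypothetical):

    import unittest
    from test.support import import_fresh_module

    c_heapq = import_fresh_module('heapq', fresh=['_heapq'])
    py_heapq = import_fresh_module('heapq', blocked=['_heapq'])

    class HeapqTests:
        # A mixin rather than a TestCase, so only the concrete
        # classes below are collected and run.
        def test_push_pop(self):
            heap = []
            for n in (5, 1, 3):
                self.module.heappush(heap, n)
            self.assertEqual(self.module.heappop(heap), 1)

    class TestPyHeapq(HeapqTests, unittest.TestCase):
        module = py_heapq

    class TestCHeapq(HeapqTests, unittest.TestCase):
        module = c_heapq

    if __name__ == '__main__':
        unittest.main()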
On Wed, 06 Apr 2011 15:17:05 +0100, Michael Foord wrote:
[snip...]
The major problem that pypy had with heapq, aside from semantic differences, was (is?) that if you run the tests against the pure-Python version (without the C accelerator) then tests *fail*. This means they have to patch the CPython tests in order to be able to use the pure Python version.
Was the tests patch contributed back? Regards Antoine.
No worries, it wasn't even my code. Someone donated it. There was a discussion on python-dev and collective agreement to allow it to have semantic differences that would let it run faster. IIRC, the final call was made by Uncle Timmy.
The bug link is here: http://bugs.python.org/issue3051

I think this PEP is precisely targeting this: "I saw no need to complicate the pure python code for this." If you complicate the C code for this, then please complicate the python code for it as well, since it's breaking stuff.

And this: "FWIW, the C code is not guaranteed to be exactly the same in terms of implementation details, only the published API should be the same. And, for this module, a decision was made for the C code to support only lists even though the pure python version supports any sequence."

The idea of the PEP is for C code to be guaranteed to be the same as Python where it matters to people.

Cheers, fijal
On Apr 6, 2011, at 10:24 AM, Maciej Fijalkowski wrote:
"I saw no need to complicate the pure python code for this."
if you complicate the C code for this, then please as well complicate python code for this since it's breaking stuff.
Do you really need a PEP for this one extraordinary and weird case? The code is long since gone (never in 3.x). If you disagreed with the closing of the bug report, just re-open it and a patch can go into a 2.7 point release. The downside is that it would not be a pretty piece of python.
And this:
"FWIW, the C code is not guaranteed to be exactly the same in terms of implementation details, only the published API should be the same. And, for this module, a decision was made for the C code to support only lists eventhough the pure python version supports any sequence."
The idea of the PEP is for C code to be guaranteed to be the same as Python where it matters to people.
That is a good goal. Unfortunately, people can choose to rely on all manner of implementation details (whether in C or pure Python).

If we want a pure python version of map() for example, the straight-forward way doesn't work very well because "map(chr, 3)" raises a TypeError right away in C code, but a python version using a generator wouldn't raise until next() is called. Would this be considered a detail that matters to people? If so, it means that all the pure python equivalents for itertools would have to be written as classes, making them hard to read and making them run slowly on all implementations except for PyPy.

The origin of the bug report you mentioned was that a major tool relied on the pure python heapq code comparing "not b <= a" rather than the equivalent "a < b". So this was an implementation detail that mattered to someone, but it went *far* beyond any guaranteed behaviors.

Tracebacks are another area where C code and pure python code can't be identical. This may or may not matter to someone.

The example in the PEP focused on which particular exception, a TypeError or AttributeError, was raised in response to an oddly constructed Spam() class. I don't know that that was foreseeable or that there would have been a reasonable way to eliminate the difference. It does sound like the difference mattered to someone though.

C code tends to use direct internal calls such as Py_SIZE(obj) rather than doing a lookup using obj.__len__(). This is often a detail that matters to people because it prevents them from hooking the call to __len__. The C code has to take this approach in order to protect its internal invariants and not crash. If the pure python code tried to emulate this, then every call to len(self) would need to be replaced by self.__internal_len(), where __internal_len is the real length method and __len__ is made equal to it.

In C to Python translations, do we want locks to be required so that atomicity behaviors are matched? That property likely matters to some users.

ISTM that every person who sets out to translate code from C to Python or vice versa is already trying their best to make them behave as similarly as possible. That is always the goal. However, the PEP seems to be raising the bar by insisting on the code being functionally identical. I think we should make some decisions about what that really means; otherwise, every piece of code will be in violation of the PEP for someone choosing to rely on an implementation detail that isn't the same.

In my opinion, a good outcome of this discussion would be a list of implementation details that we want to guarantee and ones that we explicitly say are allowed to vary. I would also like to see strong guidance on the use of the concrete C API, which can make it impossible for client code to use subclasses of builtin types (issue 10977). That is another area where differences will arise that will matter to some users.

Raymond

P.S. It would be great if the PEP were to supply a complete, real-world example of code that is considered to be identical. A pure python version of map() could serve as a good example, trying to make it model all the C behaviors as exactly as possible (argument handling, choice of exceptions, length estimation and presizing, etc).
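Taking up that closing suggestion, here is a partial sketch of such a class-based pure Python map() (the name pymap and the error message are this sketch's assumptions about matching CPython), whose argument checking happens eagerly at construction time so that pymap(chr, 3) fails immediately, just like the C version:

    class pymap:
        def __init__(self, func, *iterables):
            if not iterables:
                raise TypeError('map() must have at least two arguments.')
            self._func = func
            # iter() is called eagerly here, so non-iterable arguments
            # raise TypeError at construction time, as the C code does.
            self._iters = [iter(it) for it in iterables]
        def __iter__(self):
            return self
        def __next__(self):
            # StopIteration from any exhausted iterator propagates naturally.
            return self._func(*[next(it) for it in self._iters])

    list(pymap(chr, [65, 66]))   # ['A', 'B']
    pymap(chr, 3)                # TypeError raised right away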
On 4/6/2011 1:24 PM, Maciej Fijalkowski wrote:
No worries, it wasn't even my code. Someone donated it. There was a discussion on python-dev and collective agreement to allow it to have semantic differences that would let it run faster. IIRC, the final call was made by Uncle Timmy. ... And, for this module, a decision was made for the C code to support only lists even though the pure python version supports any sequence."
I believe that at the time of that decision, the Python code was only intended for humans, like the Python (near) equivalents in the itertools docs to C-coded itertool functions. Now that we are aiming to have stdlib Python code be a reference implementation for all interpreters, that decision should be revisited. Either the C code should be generalized to sequences or the Python code specialized to lists, making sure the doc matches either way.
-- Terry Jan Reedy
On 4/6/2011 2:54 PM, Terry Reedy wrote:
I believe that at the time of that decision, the Python [heapq] code was only intended for humans, like the Python (near) equivalents in the itertools docs to C-coded itertool functions. Now that we are aiming to have stdlib Python code be a reference implementation for all interpreters, that decision should be revisited.
OK so far.
Either the C code should be generalized to sequences or the Python code specialized to lists, making sure the doc matches either way.
After rereading the heapq doc and .py file and thinking some more, I retract this statement for the following reasons.

1. The heapq doc clearly states that a list is required. It leaves the behavior for other types undefined. Let it be so.

2. Both _heapq.c (or its actual name) and heapq.py meet (I presume) the documented requirements and pass (or would pass) a complete test suite based on using lists as heaps. In that regard, both are conformant and should be considered 'equivalent'.

3. _heapq.c is clearly optimized for speed. It allows a list subclass as input and will heapify such, but it ignores a custom __getitem__. My informal test on the result of random.shuffle(list(range(9999999))) shows that heapify is over 10x as fast as .sort(). Let it be so.

4. When I suggested changing heapq.py, I had forgotten that heapq.py defines several functions rather than a wrapper class with methods. I was thinking of putting a type check in .__init__, where it would be applied once per heap (and possibly bypassed), and could easily be removed. Instead, every function would require a type check for every call. This would be too obnoxious to me. I love duck typing and held my nose a bit when suggesting a one-time type check.

5. Python already has an "extras allowed" principle. In other words, an implementation does not have to bother to enforce documented restrictions. For one example, Python 2 manuals restrict identifiers to ascii letters. CPython 2 (at least in recent versions) actually allows extended ascii letters, as in latin-1. For another, namespaces (globals and attribute namespaces), by their name, only need to map identifiers to objects. However, CPython uses general dicts rather than specialized string dicts with validity checks. People have exploited both loopholes. But those who have should not complain to us if such code fails on a different implementation that adheres to the doc.

I think the Language and Library references should start with something a bit more specific than at present: "The Python x.y Language and Library References define the Python x.y language, its builtin objects, and standard library. Code written to these docs should run on any implementation that includes the features used. Code that exploits or depends on any implementation-specific feature or behavior may not be portable."

_x.c and x.py are separate implementations of module x. I think they should be subject to the same disclaimer. Therefore, I currently think that the only change needed for heapq (assuming both versions pass complete tests as per the doc) is an explanation at the top of heapq.py that goes something like this:

"Heapq.py is a reference implementation of the heapq module for both humans and implementations that do not have an accelerated version. For CPython, most of the functions are replaced by much faster C-coded versions. Heapq is documented to require a python list as input to the heap functions. The C functions enforce this restriction. The Python versions do not, and should work with any mutable random-access sequence. Should you wish to run the Python code with CPython, copy this file, give it a new name, delete the following lines:

    try:
        from _heapq import *
    except ImportError:
        pass

make any other changes you wish, and do not expect the result to be portable."

-- Terry Jan Reedy
On Tue, Apr 5, 2011 at 12:57, Raymond Hettinger wrote:

[Brett]
This PEP requires that in these instances that both the Python and C code must be semantically identical

[Raymond]
Are you talking about the guaranteed semantics promised by the docs or are you talking about every possible implementation detail? ISTM that even with pure python code, we get problems with people relying on implementation specific details.

* Two functions accept a sequence, but one accesses it using __len__ and __getitem__ while the other uses __iter__. (This is like the Spam example in the PEP.)

[Brett]
That's a consistency problem in all of our C code and not unique to Python/C modules.

[Raymond]
* Given pure python library code like "if x < y: ...", I've seen people only implement __lt__ but not __gt__, making it impossible to make even minor adjustments to the code such as "if y > x: ...".

[Brett]
How is that an issue here? Because someone was lazy in the C code but not the Python code? That is an issue, as that is a difference in what methods are provided.

[Raymond]
* We also suffer from inconsistency in choice of exceptions (i.e. overly large sequence indices raising either an IndexError, OverflowError, or ValueError).

[Brett]
Once again, a general issue in our C code and not special to this PEP.

[Raymond]
* Concurrency issues are a common semantic difference. For example, deque.pop() is atomic because the C code holds the GIL, but a pure python equivalent would have to use locks to achieve the same effect (and even then might introduce liveness or deadlock issues).

[Brett]
That's just a CPython-specific issue that will always be tough to work around. Obviously we can do the best we can, but since the other VMs don't necessarily have the same concurrency guarantees per Python expression it is near impossible to define.

[Raymond]
* Heapq is one of the rare examples of purely algorithmic code. Much of the code in CPython accesses libraries (i.e. the math module), interfaces with the OS, accesses binary data structures, links to third-party tools (sqlite3 and Tkinter), or does something else that doesn't have pure python equivalents (at least without using C types).

[Brett]
Those C modules are outside the scope of the PEP.

[snip...]

[Raymond]
* We've had a very difficult time bridging the gulf between python's infinite precision numbers and C's fixed width numbers (for example, it took years to get range() to handle values greater than a word size).

[Brett]
I don't expect that to be an issue, as this is a limitation in CPython that the other VMs never run into. If anything it puts the other VMs at an advantage for us relying on C code.

[snip...]

[Raymond]
* I would like to see a restriction on the use of the concrete C API such that it is *only* used when an exact type match has been found or created (i.e. if someone writes PyList_New(), then it is okay to use PyList_SetItem()). See http://bugs.python.org/issue10977 for a discussion of what can go wrong. The original C code for json was an example of code that used the concrete C API in a way that precluded pure python subclasses of list and dict.

[Brett]
That's a general coding policy that is not special to this PEP.

[Raymond]
* I would like to see better consistency on when to use OverflowError vs ValueError vs IndexError.

[Brett]
Once again, not specific to this PEP.

[Raymond]
* There should also be a discussion of whether the possible exceptions should be a guaranteed part of the API as it is in Java. Because there were no guarantees (i.e. ord(x) can raise this, that, and the other), people tend to run an experiment and then rely on whatever CPython happens to do.

[Brett]
Still not part of this PEP and I am going to stop saying this. =)

[snip...]

[Raymond]
That being said, I would like to see a broader set of examples rather than extrapolating from a single piece of 7+ year-old code. It is purely algorithmic, so it really just represents the simplest case. It would be much more interesting to discuss what should be done with future C implementations for threading, decimal, OrderedDict, or some existing non-trivial C accelerators like those for JSON or XML.

[Brett]
This is a known issue and is a priori something that needs to be worked out. If one of the other VM teams wants to dig up some more examples they can, but I'm not going to put them through that for something that is so obviously something we want written down in a PEP.

[Raymond]
Brett, thanks for bringing the issue up. I've been bugged for a good while about issues like overbroad use of the concrete C API.

[Brett]
Since people are taking my "semantically identical" point too strongly for what I mean (there is a reason I said "except in cases where implementation details of a VM prevents [semantic equivalency] entirely"), how about we change the requirement so that C acceleration code must pass the same test suite (sans C-specific issues such as refcount tests or word size) and adhere to the documented semantics the same? It should get us the same result without ruffling so many feathers. And if the other VMs find an inconsistency they can add a proper test and then we fix the code (as would be the case regardless). And in instances where it is simply not possible because of C limitations, the test won't get written since the test will never pass.
Brett Cannon wrote:
* We also suffer from inconsistency in choice of exceptions (i.e. overly large sequence indices raising either an IndexError, OverflowError, or ValueError).
Once again, a general issue in our C code and not special to this PEP.
Not only in the C code. I get the impression that exceptions are sometimes handled somewhat arbitrarily. Example: decimal.py encodes the rounding mode as strings. For a simple invalid argument we have the following three cases:

# I would prefer a ValueError:
>>> Decimal("1").quantize(Decimal('2'), "this is not a rounding mode")
Decimal('1')

# I would prefer a ValueError:
>>> Decimal("1.11111111111").quantize(Decimal('1e100000'), "this is not a rounding mode")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stefan/pydev/cpython/Lib/decimal.py", line 2494, in quantize
    ans = self._rescale(exp._exp, rounding)
  File "/home/stefan/pydev/cpython/Lib/decimal.py", line 2557, in _rescale
    this_function = getattr(self, self._pick_rounding_function[rounding])
KeyError: 'this is not a rounding mode'

# I would prefer a TypeError:
>>> Decimal("1.23456789").quantize(Decimal('1e-100000'), ROUND_UP, "this is not a context")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stefan/pydev/cpython/Lib/decimal.py", line 2478, in quantize
    if not (context.Etiny() <= exp._exp <= context.Emax):
AttributeError: 'str' object has no attribute 'Etiny'
cdecimal naturally encodes the rounding mode as integers and raises a TypeError in all three cases. The context in cdecimal is a custom type that translates the flag dictionaries to simple C integers. This is extremely fast since the slow dictionaries are only updated on actual accesses. In normal usage, there is no visible difference to the decimal.py semantics, but there is no way that one could use a custom context (why would one anyway?).

I think Raymond is right that these issues need to be addressed. Other C modules will have similar discrepancies to their Python counterparts. A start would be:

1) Module constants (like ROUND_UP) should be treated as opaque. If a user relies on a specific type, he is on his own.

2) If it is not expected that custom types will be used for a certain data structure, then a fixed type can be used.

For cdecimal, the context actually falls under the recently added subset clause of the PEP, but 2) might be interesting for other modules.

Stefan Krah
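To make point 1) concrete, a small sketch of portable versus non-portable use of such constants (the string check works today only because decimal.py happens to use strings; an accelerated version is free to change that):

    from decimal import Decimal, ROUND_UP

    # Portable: treat ROUND_UP as an opaque token and pass it through.
    print(Decimal('1.05').quantize(Decimal('0.1'), rounding=ROUND_UP))

    # Non-portable: relies on an implementation detail of decimal.py.
    assert isinstance(ROUND_UP, str)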
On Apr 6, 2011, at 10:39 AM, Brett Cannon wrote:
Since people are taking my "semantically identical" point too strongly for what I mean (there is a reason I said "except in cases where implementation details of a VM prevents [semantic equivalency] entirely"), how about we change the requirement that C acceleration code must pass the same test suite (sans C specific issues such as refcount tests or word size) and adhere to the documented semantics the same? It should get us the same result without ruffling so many feathers. And if the other VMs find an inconsistency they can add a proper test and then we fix the code (as would be the case regardless). And in instances where it is simply not possible because of C limitations the test won't get written since the test will never pass.
Does the whole PEP just boil down to "if a test is C specific, it should be marked as such"?

Anyone setting out to create equivalent code is already trying to make it as functionally equivalent as possible. At some point, we should help implementers by thinking out what kinds of implementation details are guaranteed.

Raymond

P.S. We also need a PEP 8 entry or somesuch giving specific advice about rich comparisons (i.e. never just supply one ordering method, always implement all six); otherwise, rich comparisons will be a never ending source of headaches.
On Wed, Apr 6, 2011 at 12:45, Raymond Hettinger wrote:
[snip...]

[Raymond]
Does the whole PEP just boil down to "if a test is C specific, it should be marked as such"?

[Brett]
How about the test suite needs to have 100% test coverage (or as close as possible) on the pure Python version? That will guarantee that the C code which passes that level of test detail is as semantically equivalent as possible. It also allows the other VMs to write their own acceleration code without falling into the same trap as CPython can.

There is also the part of the PEP strongly stating that any module that gets added with no pure Python equivalent will be considered CPython-only, and you better have a damned good reason for it to be only in C from this point forward.

[Raymond]
Anyone setting out to create equivalent code is already trying to make it as functionally equivalent as possible. At some point, we should help implementers by thinking out what kinds of implementation details are guaranteed.

[Brett]
I suspect 100% test coverage will be as good of a metric as any without bogging ourselves down with every minute detail of C code that could change as time goes on.

If we want a more thorough definition of what C code should be trying to do to be as compatible with Python practices as possible, that should go in a doc in the devguide rather than this PEP.

[Raymond]
P.S. We also need a PEP 8 entry or somesuch giving specific advice about rich comparisons (i.e. never just supply one ordering method, always implement all six); otherwise, rich comparisons will be a never ending source of headaches.

[Brett]
Fine by me, but I will let you handle that one.
On Wed, 6 Apr 2011 13:22:09 -0700, Brett Cannon wrote:
[snip...]
Does the whole PEP just boil down to "if a test is C specific, it should be marked as such"?
How about the test suite needs to have 100% test coverage (or as close as possible) on the pure Python version?
Let's say "as good coverage as the C code has", instead ;) Regards Antoine.
On 04/06/2011 04:37 PM, Antoine Pitrou wrote:
[snip...]
How about the test suite needs to have 100% test coverage (or as close as possible) on the pure Python version?
Let's say "as good coverage as the C code has", instead ;)
The point is to require that the *Python* version be the "reference implementation", which means that the tests should be fully covering it (especially for any non-grandfathered module).

Tres.

-- Tres Seaver +1 540-429-0999 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com
On Wed, 06 Apr 2011 18:05:57 -0400, Tres Seaver wrote:
On 04/06/2011 04:37 PM, Antoine Pitrou wrote:
On Wed, 6 Apr 2011 13:22:09 -0700, Brett Cannon wrote:
How about the test suite needs to have 100% test coverage (or as close as possible) on the pure Python version?
Let's say "as good coverage as the C code has", instead ;)
The point is to require that the *Python* version be the "reference implementation", which means that the tests should be fully covering it (especially for any non-grandfathered module).
There are two slightly different requirements covered by these two suggested rules. The Python one says "any test the Python package passes the C version should also pass, and let's make sure we test all of the Python code". The C one says "any test that the C code passes the Python code should also pass". These are *almost* the same rule, but not quite.

Brett's point in asking for 100% coverage of the Python code is to make sure the C implementation covers the same ground as the Python code. Antoine's point in asking that the Python tests be at least as good as the C tests is to make sure that the Python code covers the same ground as the C code. The former is most important for modules that are getting new accelerator code, the latter for existing modules that already have accelerators or are newly acquiring Python versions.

The PEP actually already contains the combined rule: both the C and the Python version must pass the *same* test suite (unless there are virtual machine issues that simply can't be worked around). I think the thing that we are talking about adding to the PEP is that there should be no untested features in *either* the Python or the C version, insofar as we can make that happen (that is, we are testing also that the feature sets are the same). And that passing that comprehensive test suite is the definition of compliance with the PEP, not abstract arguments about semantics. (We can argue about the semantics when we discuss individual tests :)

100% branch coverage as measured by coverage.py is one good place to start for building such a comprehensive test suite. Existing tests for C versions getting (or newly testing) Python code is another. Bug reports from alternate VMs will presumably fill out the remainder.

-- R. David Murray http://www.bitdance.com

PS: testing that Python code handles subclasses and duck typing is by no means wasted effort; I've found some bugs in the email package using such tests, and it is pure Python.
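As a concrete starting point, a sketch of measuring branch coverage of the pure Python heapq with the third-party coverage.py package (the API calls below are approximated from the coverage.py 3.x series and should be treated as an assumption):

    import coverage
    from test.support import import_fresh_module

    cov = coverage.coverage(branch=True)   # enable branch coverage
    cov.start()
    py_heapq = import_fresh_module('heapq', blocked=['_heapq'])
    heap = []
    for n in (4, 1, 7):
        py_heapq.heappush(heap, n)
    assert py_heapq.heappop(heap) == 1
    cov.stop()
    cov.report()   # untested lines/branches in heapq.py show up here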
Brett> How about the test suite needs to have 100% test coverage (or as
Brett> close as possible) on the pure Python version?

Works for me, but you will have to define what "100%" is fairly clearly. 100% of the lines get executed? All the branches are taken? Under what circumstances might the 100% rule be relaxed?

Skip
On Apr 6, 2011, at 4:44 PM, skip@pobox.com wrote:
Brett> How about the test suite needs to have 100% test coverage (or as Brett> close as possible) on the pure Python version?
Works for me, but you will have to define what "100%" is fairly clearly. 100% of the lines get executed? All the branches are taken? Under what circumstances might the 100% rule be relaxed?
And... does that include all branches taken within the interpreter too? :) E.g., check whether all possible exceptions are thrown in all possible places an exception could be thrown? (As per the exception-compatibility subthread.) And what about all the possible crazy stuff you could do in callbacks back to user code (e.g., mutating arguments passed to the initial function, or installing a trace hook, or...)? Does use of the function as a class attribute need to be covered? (See the previous discussion on differences in behavior due to descriptors.) Etc., etc.

I'd love it if CPython C modules acted equivalently to Python code, but there is almost an endless supply of differences... 100% test coverage of the behavior seems completely infeasible if interpreted strictly; some explicit subset of all possible behavior needs to be defined for what users cannot reasonably depend on (sys.settrace almost certainly belonging on that list :).

James
On Apr 6, 2011, at 1:22 PM, Brett Cannon wrote:
On Wed, Apr 6, 2011 at 12:45, Raymond Hettinger
wrote: On Apr 6, 2011, at 10:39 AM, Brett Cannon wrote:
Since people are taking my "semantically identical" point too strongly for what I mean (there is a reason I said "except in cases where implementation details of a VM prevent [semantic equivalency] entirely"), how about we change the requirement so that C acceleration code must pass the same test suite (sans C-specific issues such as refcount tests or word size) and adhere to the same documented semantics? It should get us the same result without ruffling so many feathers. And if the other VMs find an inconsistency they can add a proper test and then we fix the code (as would be the case regardless). And in instances where it is simply not possible because of C limitations, the test won't get written since the test will never pass.
Does the whole PEP just boil down to "if a test is C specific, it should be marked as such"?
How about the test suite needs to have 100% test coverage (or as close as possible) on the pure Python version? That will guarantee that the C code which passes that level of test detail is as semantically equivalent as possible. It also allows the other VMs to write their own acceleration code without falling into the same trap as CPython can.
Sounds good.
There is also the part of the PEP strongly stating that any module that gets added with no pure Python equivalent will be considered CPython-only and you better have a damned good reason for it to be only in C from this point forward.
That seems reasonable for purely algorithmic modules. I presume if an xz compressor gets added, there won't be a requirement that it be coded in Python ;-) Also, I'm not sure the current wording of the PEP makes it clear that this is a going-forward requirement. We don't want to set off an avalanche of new devs rewriting all the current C components (struct, math, cmath, bz2, defaultdict, arraymodule, sha1, mersenne twister, etc). For the most part, I expect that people writing algorithmic C modules will start-off by writing a pure python version anyway, so this shouldn't be a big change to their development process.
P.S. We also need a PEP 8 entry or somesuch giving specific advice about rich comparisons (i.e. never just supply one ordering method, always implement all six); otherwise, rich comparisons will be a never ending source of headaches.
Fine by me, but I will let you handle that one.
Done! Raymond
On Apr 6, 2011, at 1:22 PM, Brett Cannon wrote:
How about the test suite needs to have 100% test coverage (or as close as possible) on the pure Python version? That will guarantee that the C code which passes that level of test detail is as semantically equivalent as possible. It also allows the other VMs to write their own acceleration code without falling into the same trap as CPython can.
One other thought: we should probably make a specific exception for pure Python code using generators. It is common for generators to defer argument checking until the next() method is called, while the C equivalent makes the check immediately upon instantiation (e.g., map(chr, 3) raises TypeError immediately in C, but a pure Python generator won't raise until the generator is actually run). Raymond
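A sketch of the difference Raymond describes (py_map is a hypothetical pure Python equivalent of the C built-in):

    def py_map(func, iterable):
        # Pure Python generator: the arguments are not touched until
        # the first next() call, so no checking happens at call time.
        for item in iterable:
            yield func(item)

    gen = py_map(chr, 3)  # no error yet, even though 3 is not iterable
    try:
        next(gen)         # TypeError is only raised here
    except TypeError:
        pass

    try:
        map(chr, 3)       # the C implementation checks immediately
    except TypeError:
        pass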
On Thu, Apr 7, 2011 at 6:22 AM, Brett Cannon
How about the test suite needs to have 100% test coverage (or as close as possible) on the pure Python version? That will guarantee that the C code which passes that level of test detail is as semantically equivalent as possible. It also allows the other VMs to write their own acceleration code without falling into the same trap as CPython can.
Independent of coverage numbers, C acceleration code should really be tested with 3 kinds of arguments:

- builtin types
- subclasses of builtin types
- duck types

Those are (often) 2 or 3 different code paths in accelerated C code, but will usually be a single path in the Python code. (e.g., I'd be willing to bet that it is possible to get the Python version of heapq to 100% coverage without testing the second two cases, since the Python code doesn't special-case list in any way.)

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
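A sketch of the three argument kinds Nick lists, run against the pure Python heapq (ListSubclass and DuckList are assumed names, for illustration only):

    from test.support import import_fresh_module

    py_heapq = import_fresh_module('heapq', blocked=['_heapq'])

    class ListSubclass(list):
        """A subclass of a builtin type."""

    class DuckList:
        """A duck type supporting just enough of the list API."""
        def __init__(self, items):
            self._items = list(items)
        def __len__(self):
            return len(self._items)
        def __getitem__(self, index):
            return self._items[index]
        def __setitem__(self, index, value):
            self._items[index] = value
        def pop(self):
            return self._items.pop()

    # The pure Python code takes one path for all three kinds, so plain
    # lists alone can reach 100% coverage; accelerated C code typically
    # has separate fast paths that the last two cases would exercise.
    for heap in ([2, 1, 3], ListSubclass([2, 1, 3]), DuckList([2, 1, 3])):
        py_heapq.heapify(heap)
        assert py_heapq.heappop(heap) == 1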
Here is the next draft of the PEP. I changed the semantics requirement to
state that 100% branch coverage is required for any Python code that is
being replaced by accelerated C code instead of the broad "must be
semantically equivalent". Also tweaked wording here and there to make
certain things more obvious.
----------------------------------
PEP: 399
Title: Pure Python/C Accelerator Module Compatibility Requirements
Version: $Revision: 88219 $
Last-Modified: $Date: 2011-01-27 13:47:00 -0800 (Thu, 27 Jan 2011) $
Author: Brett Cannon
In the grand python-dev tradition of "silence means acceptance", I consider
this PEP finalized and implicitly accepted.
On Tue, Apr 12, 2011 at 15:07, Brett Cannon
Here is the next draft of the PEP. I changed the semantics requirement to state that 100% branch coverage is required for any Python code that is being replaced by accelerated C code instead of the broad "must be semantically equivalent". Also tweaked wording here and there to make certain things more obvious.
----------------------------------
PEP: 399
Title: Pure Python/C Accelerator Module Compatibility Requirements
Version: $Revision: 88219 $
Last-Modified: $Date: 2011-01-27 13:47:00 -0800 (Thu, 27 Jan 2011) $
Author: Brett Cannon
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 04-Apr-2011
Python-Version: 3.3
Post-History: 04-Apr-2011, 12-Apr-2011

Abstract
========
The Python standard library under CPython contains various instances of modules implemented in both pure Python and C (either entirely or partially). This PEP requires that in these instances the C code *must* pass the test suite used for the pure Python code so as to act as much as a drop-in replacement as possible (C- and VM-specific tests are exempt). It is also required that new C-based modules lacking a pure Python equivalent implementation get special permission to be added to the standard library.
Rationale
=========
Python has grown beyond the CPython virtual machine (VM). IronPython_, Jython_, and PyPy_ are all currently viable alternatives to the CPython VM. The VM ecosystem that has sprung up around the Python programming language has led to Python being used in many different areas where CPython cannot be used, e.g., Jython allowing Python to be used in Java applications.
A problem all of the VMs other than CPython face is handling modules from the standard library that are implemented (to some extent) in C.
Since they do not typically support the entire `C API of Python`_ they are unable to use the code used to create the module. Oftentimes this leads these other VMs to either re-implement the modules in pure Python or in the programming language used to implement the VM (e.g., in C# for IronPython). This duplication of effort between CPython, PyPy, Jython, and IronPython is extremely unfortunate, as implementing a module *at least* in pure Python would help mitigate this duplicate effort.
The purpose of this PEP is to minimize this duplicate effort by mandating that all new modules added to Python's standard library *must* have a pure Python implementation _unless_ special dispensation is given. This makes sure that a module in the stdlib is available to all VMs and not just to CPython (pre-existing modules that do not meet this requirement are exempt, although there is nothing preventing someone from adding in a pure Python implementation retroactively).
Re-implementing parts (or all) of a module in C (in the case of CPython) is still allowed for performance reasons, but any such accelerated code must pass the same test suite (sans VM- or C-specific tests) to verify semantics and prevent divergence. To accomplish this, the test suite for the module must have 100% branch coverage of the pure Python implementation before the acceleration code may be added.
This is to prevent users from accidentally relying on semantics that are specific to the C code and are not reflected in the pure Python implementation that other VMs rely upon. For example, in CPython 3.2.0, ``heapq.heappop()`` does an explicit type check in its accelerated C code while the Python code uses duck typing::
    from test.support import import_fresh_module

    c_heapq = import_fresh_module('heapq', fresh=['_heapq'])
    py_heapq = import_fresh_module('heapq', blocked=['_heapq'])

    class Spam:
        """Tester class which defines no other magic methods
        but __len__()."""
        def __len__(self):
            return 0

    try:
        c_heapq.heappop(Spam())
    except TypeError:
        # Explicit type check failure: "heap argument must be a list"
        pass

    try:
        py_heapq.heappop(Spam())
    except AttributeError:
        # Duck typing failure: "'Spam' object has no attribute 'pop'"
        pass
This kind of divergence is a problem for users as they unwittingly write code that is CPython-specific. This is also an issue for other VM teams as they have to deal with bug reports from users thinking that they incorrectly implemented the module when in fact it was caused by an untested case.
Details
=======
Starting in Python 3.3, any modules added to the standard library must have a pure Python implementation. This rule can only be ignored if the Python development team grants a special exemption for the module. Typically the exemption will be granted only when a module wraps a
specific C-based library (e.g., sqlite3_). In granting an exemption it will be recognized that the module will be considered exclusive to
CPython and not part of Python's standard library that other VMs are expected to support. Usage of ``ctypes`` to provide an API for a C library will continue to be frowned upon as ``ctypes`` lacks compiler guarantees that C code typically relies upon to prevent certain errors from occurring (e.g., API changes).
Even though a pure Python implementation is mandated by this PEP, it does not preclude the use of a companion acceleration module. If an acceleration module is provided, it is to be named the same as the module it is accelerating with an underscore attached as a prefix, e.g., ``_warnings`` for ``warnings``. The common pattern to access the accelerated code from the pure Python implementation is to import it with an ``import *``, e.g., ``from _warnings import *``. This is typically done at the end of the module to allow it to overwrite specific Python objects with their accelerated equivalents. This kind of import can also be done before the end of the module when needed, e.g., when an accelerated base class is provided but is then subclassed by Python code.

This PEP does not mandate that pre-existing modules in the stdlib that lack a pure Python equivalent gain such a module. But if people do volunteer to provide and maintain a pure Python equivalent (e.g., the PyPy team volunteering their pure Python implementation of the ``csv`` module and maintaining it) then such code will be accepted.
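As a minimal sketch of the accelerator naming and import pattern described above, a hypothetical module ``spam`` with accelerator ``_spam`` might look like::

    # Lib/spam.py
    def ham(x):
        """Pure Python implementation, available to every VM."""
        return x * 2

    try:
        from _spam import *  # overwrite with accelerated equivalents
    except ImportError:
        pass  # no accelerator on this VM; the Python code stands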
This requirement does not apply to modules already existing as only C code in the standard library. It is acceptable to retroactively add a pure Python implementation of a module implemented entirely in C, but in those instances the C version is considered the reference implementation in terms of expected semantics.
Any new accelerated code must act as a drop-in replacement as close to the pure Python implementation as reasonable. Technical details of the VM providing the accelerated code are allowed to differ as necessary, e.g., a class being a ``type`` when implemented in C. To verify that the Python and equivalent C code operate as similarly as possible, both code bases must be tested using the same tests which apply to the pure Python code (tests specific to the C code or to any VM do not fall under this requirement). To make sure that the test suite is thorough enough to cover all relevant semantics, the tests must have 100% branch coverage for the Python code being replaced by C code. This will make sure that the new acceleration code will operate as much like a drop-in replacement for the Python code as possible. Testing should still be done for issues that come up when working with C code even if such tests are not explicitly required to meet the coverage requirement, e.g., tests should be aware that C code typically has special code paths for things such as built-in types, subclasses of built-in types, etc.
Acting as a drop-in replacement also dictates that no public API be
provided in accelerated code that does not exist in the pure Python code. Without this requirement people could accidentally come to rely on a detail in the accelerated code which is not made available to
other VMs that use the pure Python implementation. To help verify that the contract of semantic equivalence is being met, a module must be tested both with and without its accelerated code as thoroughly as possible.
As an example, to write tests which exercise both the pure Python and C accelerated versions of a module, a basic idiom can be followed::
    import collections.abc
    from test.support import import_fresh_module, run_unittest
    import unittest

    c_heapq = import_fresh_module('heapq', fresh=['_heapq'])
    py_heapq = import_fresh_module('heapq', blocked=['_heapq'])

    class ExampleTest(unittest.TestCase):

        def test_heappop_exc_for_non_MutableSequence(self):
            # Raise TypeError when heap is not a
            # collections.abc.MutableSequence.
            class Spam:
                """Test class lacking many ABC-required methods
                (e.g., pop())."""
                def __len__(self):
                    return 0

            heap = Spam()
            self.assertFalse(isinstance(heap,
                                        collections.abc.MutableSequence))
            with self.assertRaises(TypeError):
                self.heapq.heappop(heap)

    class AcceleratedExampleTest(ExampleTest):

        """Test using the accelerated code."""

        heapq = c_heapq

    class PyExampleTest(ExampleTest):

        """Test with just the pure Python code."""

        heapq = py_heapq

    def test_main():
        run_unittest(AcceleratedExampleTest, PyExampleTest)

    if __name__ == '__main__':
        test_main()
If this test were to provide 100% branch coverage for ``heapq.heappop()`` in the pure Python implementation then the accelerated C code would be allowed to be added to CPython's standard library. If it did not, then the test suite would need to be updated until 100% branch coverage was provided before the accelerated C code could be added.
Copyright
=========
This document has been placed in the public domain.
.. _IronPython: http://ironpython.net/
.. _Jython: http://www.jython.org/
.. _PyPy: http://pypy.org/
.. _C API of Python: http://docs.python.org/py3k/c-api/index.html
.. _sqlite3: http://docs.python.org/py3k/library/sqlite3.html
Brett Cannon
In the grand python-dev tradition of "silence means acceptance", I consider this PEP finalized and implicitly accepted.
I did not really see an answer to these concerns: http://mail.python.org/pipermail/python-dev/2011-April/110672.html http://mail.python.org/pipermail/python-dev/2011-April/110675.html Stefan Krah
On Sat, Apr 16, 2011 at 14:23, Stefan Krah
Brett Cannon
wrote: In the grand python-dev tradition of "silence means acceptance", I consider this PEP finalized and implicitly accepted.
I did not really see an answer to these concerns:
http://mail.python.org/pipermail/python-dev/2011-April/110672.html
Antoine does not seem sold on the 100% branch coverage requirement and views it as pointless. I disagree. =) As for the exception Stefan is saying may be granted, that is not in the PEP so I consider it unimportant. If we really feel the desire to grant an exception we can (since we can break any of our own rules that we collectively choose to), but I'm assuming we won't.
http://mail.python.org/pipermail/python-dev/2011-April/110675.html
Raymond thinks that having a testing requirement conflates making implementations match vs. making APIs match. Well, as we all know, the stdlib ends up having its implementation details relied upon constantly by people whether they mean to or not, so making sure that this is properly tested helps deal with this known reality.

This is a damned-if-you-do, damned-if-you-don't situation. The first draft of this PEP said to be "semantically equivalent w/ divergence where technically required", but I got pushback for being too wishy-washy w/ lack of concrete details. So I introduce a concrete metric that some are accusing of being inaccurate for the goals of the PEP. I'm screwed or I'm screwed. =) So I am choosing to go with the one that has a side benefit of also increasing test coverage.

Now if people would actually support simply not accepting any more C modules into the Python stdlib (this does not apply to CPython's stdlib), then I'm all for that. I only went with the "accelerator modules are okay" route to help get acceptance for the PEP. But if people are willing to go down a more stringent route and say that any module which uses new C code is considered CPython-specific, and thus any acceptance of such modules will be damn hard to accomplish as it will marginalize the value of the code, that's fine by me.
On Sat, 16 Apr 2011 14:45:52 -0700
Brett Cannon
On Sat, Apr 16, 2011 at 14:23, Stefan Krah
wrote: Brett Cannon
wrote: In the grand python-dev tradition of "silence means acceptance", I consider this PEP finalized and implicitly accepted.
I did not really see an answer to these concerns:
http://mail.python.org/pipermail/python-dev/2011-April/110672.html
Antoine does not seem sold on the 100% branch coverage requirement and views it as pointless.
Not really. I think this is an unreasonable requirement because of the reasons I've stated in my previous messages :) If you rephrase it to remove the "100% coverage" requirement and replace it by something like "comprehensive coverage", then I'm ok.
Now if people would actually support simply not accepting any more C modules into the Python stdlib (this does not apply to CPython's stdlib), then I'm all for that.
Hmm, what's the difference between "the Python stdlib" and "CPython's stdlib"? I'm also not sure how you would enforce that anyway. If it means using ctypes to interface with system C libraries, I'm -10 on it :) Regards Antoine.
On Sat, Apr 16, 2011 at 3:54 PM, Antoine Pitrou
Hmm, what's the difference between "the Python stdlib" and "CPython's stdlib"?
I'm also not sure how you would enforce that anyway. If it means using ctypes to interface with system C libraries, I'm -10 on it :)
Sounds like Brett is talking about the distinction apparently discussed at the language summit ("Standalone Standard Library"): http://blog.python.org/2011/03/2011-language-summit-report.html -eric
Regards
Antoine.
On Apr 16, 2011, at 2:45 PM, Brett Cannon wrote:
On Sat, Apr 16, 2011 at 14:23, Stefan Krah
wrote: Brett Cannon wrote: In the grand python-dev tradition of "silence means acceptance", I consider this PEP finalized and implicitly accepted.
I haven't seen any responses that said, yes, this is a well thought-out proposal that will actually benefit any of the various implementations. Almost none of the concerns that have been raised have been addressed.

Does the PEP only apply to purely algorithmic modules such as heapq or does it apply to anything written in C (like an xz compressor, for example)? Does testing every branch in a given implementation now guarantee every implementation detail, or do we only promise the published API (historically, we've *always* done the latter)? Is there going to be any guidance on the commonly encountered semantic differences between C modules and their Python counterparts (thread-safety, argument handling, tracebacks, all possible exceptions, monkey-patchable pure Python classes versus hard-wired C types, etc.)?

The PEP seems to be predicated on a notion that anything written in C is bad and that all testing is good. AFAICT, it doesn't provide any practical advice to someone pursuing a non-trivial project (such as decimal or threading). The PEP mostly seems to be about discouraging any further work in C. If that's the case, it should just come out and say it rather than tangentially introducing ambiguous testing requirements that don't make a lot of sense.

The PEP also makes some unsupported claims about saving labor. My understanding is that IronPython and Jython tend to re-implement modules using native constructs. Even with PyPy, the usual pure Python idioms aren't necessarily what is best for PyPy, so I expect some rewriting there also. It seems the lion's share of the work in making other implementations has to do with interpreter details and whatnot -- I would be surprised if the likes of bisect or heapq took even one-tenth of one percent of the total development time for any of the other implementations.
I did not really see an answer to these concerns:
http://mail.python.org/pipermail/python-dev/2011-April/110672.html
Antoine does not seem sold on the 100% branch coverage requirement and views it as pointless. I disagree. =)
As for the exception Stefan is saying may be granted, that is not in the PEP so I consider it unimportant. If we really feel the desire to grant an exception we can (since we can break any of our own rules that we collectively choose to), but I'm assuming we won't.
http://mail.python.org/pipermail/python-dev/2011-April/110675.html
Raymond thinks that having a testing requirement conflates making implementations match vs. making APIs match.
That is not an accurate restatement of my post.
Well, as we all know, the stdlib ends up having its implementation details relied upon constantly by people whether they mean to or not, so making sure that this is properly tested helps deal with this known reality.
If you're saying that all implementation details (including internal branching logic) are now guaranteed behaviors, then I think this PEP has completely lost its way. I don't know of any implementors asking for this.
This is a damned-if-you-do-damned-if-you-don't situation. The first draft of this PEP said to be "semantically equivalent w/ divergence where technically required", but I got pushback from being too wishy-washy w/ lack of concrete details. So I introduce a concrete metric that some are accusing of being inaccurate for the goals of the PEP. I'm screwed or I'm screwed. =) So I am choosing to go with the one that has a side benefit of also increasing test coverage.
Maybe that is just an indication that the proposal isn't mature yet. To me, it doesn't seem well thought out and isn't realistic.
Now if people would actually support simply not accepting any more C modules into the Python stdlib (this does not apply to CPython's stdlib), then I'm all for that. I only went with the "accelerator modules are okay" route to help get acceptance for the PEP. But if people are willing to go down a more stringent route and say that any module which uses new C code is considered CPython-specific and thus any acceptance of such modules will be damn hard to accomplish as it will marginalize the value of the code, that's fine by me.
Is that what people want? For example, do we want to accept a C version of decimal? Without it, the decimal module is unusable for people with high volumes of data. Do we want things like an xz compressor to be written in pure Python and only in Python? I don't think this benefits our users.

I'm not really clear on what it is you're trying to get at. For PyPy, IronPython, and Jython to succeed, does the CPython project need to come to a halt? I don't think many people here really believe that to be the case.

Raymond
On Sat, 16 Apr 2011 19:19:32 -0700, Raymond Hettinger
On Apr 16, 2011, at 2:45 PM, Brett Cannon wrote:
On Sat, Apr 16, 2011 at 14:23, Stefan Krah
wrote: Brett Cannon wrote: In the grand python-dev tradition of "silence means acceptance", I consider this PEP finalized and implicitly accepted.
I haven't seen any responses that said, yes this is a well thought-out proposal that will actually benefit any of the various implementations.
In that case it may well be that the silence is because the other implementations think the PEP is OK. They certainly voted in favor of the broad outline of it at the language summit. Perhaps representatives will speak up, or perhaps Brett will need to poll them proactively.
Almost none of the concerns that have been raised have been addressed. Does the PEP only apply to purely algorithmic modules such as heapq or does it apply to anything written in C (like an xz compressor, for example)? Does testing
Anything (new) written in C that can be also written in Python (and usually is first, to at least prototype it). If an XZ compressor is a wrapper around an external library, that would be a different story.
every branch in a given implementation now guarantee every implementation detail or do we only promise the published API (historically, we've *always* done the latter)?
As Brett said, people do come to depend on the details of the implementation. But IMO the PEP should be clarified to say that the tests we are talking about should be tests *of the published API*. That is, blackbox tests, not whitebox tests.
Is there going to be any guidance on the commonly encountered semantic differences between C modules and their Python counterparts (thread-safety, argument handling, tracebacks, all possible exceptions, monkey-patchable pure python classes versus hard-wired C types etc)?
Presumably we will need to develop such guidance.
The PEP seems to be predicated on a notion that anything written in C is bad and that all testing is good. AFAICT, it doesn't provide any practical advice to someone pursuing a non-trivial project (such as decimal or threading). The PEP
Decimal already has a Python implementation with a very comprehensive test suite (no, I don't know if it has 100% coverage). My understanding is that Stefan's code passes the Python test suite. So I'm not sure what the issue is, there. Stefan? Threading is an existing module, so it doesn't seem to me that the PEP particularly applies to it.
The PEP also makes some unsupported claims about saving labor. My understanding is that IronPython and Jython tend to re-implement modules using native constructs. Even with PyPy, the usual pure Python idioms aren't necessarily what is best for PyPy, so I expect some rewriting there also. It seems the lion's share of the work in making other implementations has to do with interpreter details and whatnot -- I would be surprised if the likes of bisect or heapq took even one-tenth of one percent of the total development time for any of the other implementations.
That's an orthogonal issue. Having *working* Python implementations of as much of the stdlib as practical makes it easier to spin up a new Python language implementation: once you get the language working, you've got all the bits of the stdlib that have Python versions. *Then* you can implement accelerators (and if you are CPython, you do that in C...)
If you're saying that all implementation details (including internal branching logic) are now guaranteed behaviors, then I think this PEP has completely lost its way. I don't know of any implementors asking for this.
I don't think the PEP is asking this either (or if it is, I agree it shouldn't be). The way to get full branch coverage (and yes, Exarkun is right, this is about individual branches; see coverage.py --branch) is to provide test cases that exercise the published API such that those branches are taken. If you can't do that, then what is that branch of the Python code for? If you can do that, how is the test case testing an implementation detail? It is testing the behavior of the API.

The 100% branch coverage metric is just a measurable way to improve test coverage. As I've said before, it does not guarantee that all important (API) test cases are covered, but it is one way to improve that coverage that has a measure attached, and measures are helpful.

I personally have no problem with the 100% coverage being made a recommendation in the PEP rather than a requirement. It sounds like that might be acceptable to Antoine. Actually, I would also be fine with saying "comprehensive" instead, with a note that 100% branch coverage is a good way to head toward that goal, since a comprehensive test suite should contain more tests than the minimum set needed to get to 100% branch coverage.

A relevant story: to achieve 100% branch coverage in one of the email modules I had to resort to one test that used the API in a way for which the behavior of the API is *not* documented, and one white-box test. I marked both of these as to their nature, and would not expect a theoretical email C accelerator to pass either of those tests. For the one that requires a white-box test, that code path will probably eventually go away; for the undocumented API use, it will get documented and the test adjusted accordingly... and writing that test revealed the need for said documentation.

Perhaps we need a @python_implementation_detail skip decorator?
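A sketch of what such a decorator could look like (hypothetical, not in test.support; specialised to heapq for concreteness):

    import unittest

    try:
        import _heapq  # present only when the C accelerator is built
        ACCELERATED = True
    except ImportError:
        ACCELERATED = False

    def python_implementation_detail(test):
        """Mark a test as binding only the pure Python implementation."""
        return unittest.skipIf(ACCELERATED,
                               'white-box test of the pure Python code')(test)

A real version would presumably be parameterised by module rather than hard-coded, but the skip mechanics would be the same.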
Is that what people want? For example, do we want to accept a C version of decimal? Without it, the decimal module is unusable for people with high volumes of data. Do we want things like an xz compressor to be written in pure python and only in Python? I don't think this benefits our users.
I'm not really clear what it is you're trying to get at. For PyPy, IronPython, and Jython to succeed, does the CPython project need to come to a halt? I don't think many people here really believe that to be the case.
No, I don't think any of these things are aims. But if/once the Python stdlib is a separate repo, then in *that* repo you'd only have pure Python modules, with the CPython-specific C accelerators living in the CPython repo. (Yes, there are still quite a few details to work out about how this would work! We aren't ready to do it yet; this PEP is just trying to pave the way.) -- R. David Murray http://www.bitdance.com
On 4/17/2011 1:32 AM, R. David Murray wrote:
As Brett said, people do come to depend on the details of the implementation. But IMO the PEP should be clarified to say that the tests we are talking about should be tests *of the published API*. That is, blackbox tests, not whitebox tests.
I think 100% *branch* coverage is barking up the wrong tree. Better to say comprehensive *api* coverage. Bugs on the tracker generally come from not having that. (I am not saying 'all' to allow for bugs that happen from weird interactions or corner cases in spite of what could reasonably be called comprehensive.)
I don't think the PEP is asking this either (or if it is I agree it shouldn't be). The way to get full branch coverage (and yes Exarkun is right, this is about individual branches; see coverage.py --branch) is to provide test cases that exercise the published API such that those branches are taken. If you can't do that, then what is that branch of the Python code for? If you can do that, how is the test case testing an implementation detail? It is testing the behavior of the API.
Right. -- Terry Jan Reedy
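To make the black-box/white-box distinction concrete, here is a sketch using heapq; the white-box test leans on the private _siftup helper, an implementation detail that an accelerated version need not honor:

    import heapq
    import unittest

    class BlackBoxTest(unittest.TestCase):
        def test_heappop_returns_smallest(self):
            # Exercises only documented API behavior; any conforming
            # implementation, C or Python, should pass this.
            heap = [3, 1, 2]
            heapq.heapify(heap)
            self.assertEqual(heapq.heappop(heap), 1)

    class WhiteBoxTest(unittest.TestCase):
        def test_siftup_moves_smallest_to_root(self):
            # Pokes a private helper of the pure Python code; a C
            # accelerator has no obligation to expose _siftup at all.
            heap = [3, 1, 2]
            heapq._siftup(heap, 0)
            self.assertEqual(heap[0], 1)

    if __name__ == '__main__':
        unittest.main()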
In the grand python-dev tradition of "silence means acceptance", I consider this PEP finalized and implicitly accepted.
I haven't seen any responses that said, yes this is a well thought-out proposal that will actually benefit any of the various implementations.
In that case it may well be that the silence is because the other implementations think the PEP is OK. They certainly voted in favor of the broad outline of it at the language summit.
Sounds like it was implicitly accepted even before it was written or any of the details were discussed. The big picture of "let's do something to make life easier for other implementations" is a worthy goal. What that something should be is still a bit ambiguous.
every branch in a given implementation now guarantee every implementation detail or do we only promise the published API (historically, we've *always* done the latter)?
As Brett said, people do come to depend on the details of the implementation. But IMO the PEP should be clarified to say that the tests we are talking about should be tests *of the published API*. That is, blackbox tests, not whitebox tests.
+1 That's an excellent suggestion. Without that change, it seems like the PEP is overreaching.
Is there going to be any guidance on the commonly encountered semantic differences between C modules and their Python counterparts (thread-safety, argument handling, tracebacks, all possible exceptions, monkey-patchable pure python classes versus hard-wired C types etc)?
Presumably we will need to develop such guidance.
+1 That would be very helpful. Right now, the PEP doesn't address any of the commonly encountered differences.
I personally have no problem with the 100% coverage being made a recommendation in the PEP rather than a requirement. It sounds like that might be acceptable to Antoine. Actually, I would also be fine with saying "comprehensive" instead, with a note that 100% branch coverage is a good way to head toward that goal, since a comprehensive test suite should contain more tests than the minimum set needed to get to 100% branch coverage.
+1 better test coverage is always a good thing (IMO). Raymond
On Sun, 17 Apr 2011 00:30:22 -0700, Raymond Hettinger
In the grand python-dev tradition of "silence means acceptance", I consider this PEP finalized and implicitly accepted.
I haven't seen any responses that said, yes this is a well thought-out proposal that will actually benefit any of the various implementations.
In that case it may well be that the silence is because the other implementations think the PEP is OK. They certainly voted in favor of the broad outline of it at the language summit.
Sounds like it was implicitly accepted even before it was written or any of the details were discussed.
No, just the principle that something along these lines would be good. Any final decision of course requires the actual PEP to look at, which was also acknowledged at the summit. My point was that lack of comment from the other implementations *might* indicate they liked how the PEP turned out. But it might also mean they aren't paying attention, which would be bad...
The big picture of "let's do something to make life easier for other implementations" is a worthy goal. What that something should be is still a bit ambiguous.
As I said in another email, I think the something that should be done is to put CPython on equal footing implementation-pain-wise and lets-make-this-work-wise with the other implementations. The end result will be better test coverage and clearer APIs in the stdlib. -- R. David Murray http://www.bitdance.com
I just want to say upfront that my personal life has just gotten very hectic as of late (green card stuff for my wife, who is Canadian) and probably will not let up until June. So if I go quite a while without replying to points being made, I apologize. Luckily there seem to be others here who understand the direction I am coming from, so there is no need to stop talking while I am pre-occupied with the real world. On Sun, Apr 17, 2011 at 00:30, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
In the grand python-dev tradition of "silence means acceptance", I consider this PEP finalized and implicitly accepted.
I haven't seen any responses that said, yes this is a well thought-out proposal that will actually benefit any of the various implementations.
In that case it may well be that the silence is because the other implementations think the PEP is OK. They certainly voted in favor of the broad outline of it at the language summit.
Sounds like it was implicitly accepted even before it was written or any of the details were discussed.
Actually I directly emailed the relevant people from the other VMs to make sure they were happy with what I was aiming for before I approached python-dev with the PEP. So IronPython, Jython, and PyPy lead developers have all told me that they want something along the lines of this PEP to happen.
The big picture of "let's do something to make life easier for other implementations" is a worthy goal. What that something should be is still a bit ambiguous.
every branch in a given implementation now guarantee every implementation detail or do we only promise the published API (historically, we've *always* done the latter)?
As Brett said, people do come to depend on the details of the implementation. But IMO the PEP should be clarified to say that the tests we are talking about should be tests *of the published API*. That is, blackbox tests, not whitebox tests.
+1 That's an excellent suggestion. Without that change, it seems like the PEP is overreaching.
I'm okay with going with this line of thought, including R. David's "100% branch coverage is but one way to achieve extensive testing of the published API".
Is there going to be any guidance on the commonly encountered semantic differences between C modules and their Python counterparts (thread-safety, argument handling, tracebacks, all possible exceptions, monkey-patchable pure python classes versus hard-wired C types etc)?
Presumably we will need to develop such guidance.
+1 That would be very helpful. Right now, the PEP doesn't address any of the commonly encountered differences.
If people are willing to help me (i.e., go ahead and edit the PEP) with this then I am okay with adding some common issues (but I don't expect it to be exhaustive).
I personally have no problem with the 100% coverage being made a recommendation in the PEP rather than a requirement. It sounds like that might be acceptable to Antoine. Actually, I would also be fine with saying "comprehensive" instead, with a note that 100% branch coverage is a good way to head toward that goal, since a comprehensive test suite should contain more tests than the minimum set needed to get to 100% branch coverage.
+1 better test coverage is always a good thing (IMO).
Raymond
R. David Murray
The PEP seems to be predicated on a notion that anything written in C is bad and that all testing is good. AFAICT, it doesn't provide any practical advice to someone pursuing a non-trivial project (such as decimal or threading). The PEP
Decimal already has a Python implementation with a very comprehensive test suite (no, I don't know if it has 100% coverage). My understanding is that Stefan's code passes the Python test suite. So I'm not sure what the issue is, there. Stefan?
test_decimal.py does not have 100% coverage yet. cdecimal passes the tests, but several decimal.py functions would have to perform type checking to get identical exception behavior. The current version of the joint unit tests is here: http://hg.python.org/features/cdecimal/file/b00f8fa70126/Lib/test/decimal_te... cdecimal specific behavior is guarded by HAVE_CDECIMAL, so it is possible to grep for the differences. As an aside, test_decimal.py constitutes at most 1% of the total tests. The important tests (mathematical correctness and conformance to the specification) are in two separate test suites, one of which runs tests against decimal.py and the other against decNumber. These tests can easily take a week to run, so they can't be part of the regression tests. Stefan Krah
On Sun, 17 Apr 2011 01:32:15 -0400
"R. David Murray"
I personally have no problem with the 100% coverage being made a recommendation in the PEP rather than a requirement. It sounds like that might be acceptable to Antoine. Actually, I would also be fine with saying "comprehensive" instead, with a note that 100% branch coverage is a good way to head toward that goal, since a comprehensive test suite should contain more tests than the minimum set needed to get to 100% branch coverage.
If that's a recommendation then it's ok, although I would still prefer we don't advocate such metrics. It's too easy for some people to get obsessed about numeric measurements of "quality", leading them to dubious workarounds and tricks (e.g. when using style-checking tools à la pylint). Regards Antoine.
On 17 April 2011 06:32, R. David Murray
I don't think the PEP is asking this either (or if it is I agree it shouldn't be). The way to get full branch coverage (and yes Exarkun is right, this is about individual branches; see coverage.py --branch)
One thing I'm definitely uncomfortable about is expressing the requirement in a way that depends on a non-stdlib module (coverage.py). Should coverage.py be added to the stdlib if we're going to take test coverage as a measure? Hmm, maybe it goes without saying, but does coverage.py work on Jython, IronPython, etc? (A quick google search actually indicates that there might be some issues still to be resolved...) Paul.
On Mon, 18 Apr 2011 18:34:06 +0200, Éric Araujo
Perhaps we need a @python_implementation_detail skip decorator?

That’s called test.support.cpython_only (see also test.support.check_impl_detail). You’re welcome.
Nope. That's not what I was talking about. I was talking about marking a test as something that we require only the *python* implementation of a module to pass (presumably because it tests an internal implementation detail). Thus a C accelerator would not be expected to pass that test, nor would a C# accelerator, but pypy or any platform without an accelerator (that is, anything *using* the python code) would be expected to pass it. I would hope that such tests would be vanishingly rare (that is, that all needed tests can be expressed as black box tests). -- R. David Murray http://www.bitdance.com
On Sun, Apr 17, 2011 at 4:19 AM, Raymond Hettinger
On Apr 16, 2011, at 2:45 PM, Brett Cannon wrote:
On Sat, Apr 16, 2011 at 14:23, Stefan Krah
wrote: Brett Cannon
wrote: In the grand python-dev tradition of "silence means acceptance", I consider this PEP finalized and implicitly accepted.
I haven't seen any responses that said, yes, this is a well thought-out proposal that will actually benefit any of the various implementations. Almost none of the concerns that have been raised have been addressed. Does the PEP only apply to purely algorithmic modules such as heapq or does it apply to anything written in C (like an xz compressor, for example)?
My understanding is it does apply only to stuff that does not wrap an external library.
Does testing every branch in a given implementation now guarantee every implementation detail or do we only promise the published API (historically, we've *always* done the latter)? Is there going to be any guidance on the commonly encountered semantic differences between C modules and their Python counterparts (thread-safety, argument handling, tracebacks, all possible exceptions, monkey-patchable pure python classes versus hard-wired C types etc)? The PEP seems to be predicated on a notion that anything written in C is bad and that all testing is good.
Sounds about right
AFAICT, it doesn't provide any practical advice to someone pursuing a non-trivial project (such as decimal or threading). The PEP mostly seems to be about discouraging any further work in C. If that's the case, it should just come out and say it rather than tangentially introducing ambiguous testing requirements that don't make a lot of sense. The PEP also makes some unsupported claims about saving labor. My understanding is that IronPython and Jython tend to re-implement modules using native constructs. Even with PyPy, the usual pure Python idioms aren't necessarily what is best for PyPy, so I expect some rewriting there also.
We try very hard to optimize for usual Python idioms. They're very often much better than specific CPython hacks. Unless you mean things like rebinding a global into a default argument as a "pythonic idiom". We had to rewrite places in the standard library which are precisely not very Pythonic.
It seems the lion's share of the work in making other implementations has to do with interpreter details and whatnot -- I would be surprised if the likes of bisect or heapq took even one-tenth of one percent of the total development time for any of the other implementations.
You're wrong. We didn't even write _heapq and _bisect. That's actually a lot of work, and PyPy's team is quite small *and* it has to do all the other stuff as well. heapq and bisect were never a problem (except one case in Twisted), but other stuff where the C version diverged from the Python version was a problem. Hell, we even wrote cPickle, which wraps pickle and provides the correct interface! This is the kind of thing we would rather not spend time on (and yes, it is time consuming).
I did not really see an answer to these concerns:
http://mail.python.org/pipermail/python-dev/2011-April/110672.html
Antoine does not seem sold on the 100% branch coverage requirement and views it as pointless. I disagree. =)
As for the exception Stefan is saying may be granted, that is not in the PEP so I consider it unimportant. If we really feel the desire to grant an exception we can (since we can break any of our own rules that we collectively choose to), but I'm assuming we won't.
http://mail.python.org/pipermail/python-dev/2011-April/110675.html
Raymond thinks that having a testing requirement conflates making implementations match vs. making APIs match.
That is not an accurate restatement of my post.
Well, as we all know, the stdlib ends up having its implementation details relied upon constantly by people whether they mean to or not, so making sure that this is properly tested helps deal with this known reality.
If you're saying that all implementation details (including internal branching logic) are now guaranteed behaviors, then I think this PEP has completely lost its way. I don't know of any implementors asking for this.
This is a damned-if-you-do-damned-if-you-don't situation. The first draft of this PEP said to be "semantically equivalent w/ divergence where technically required", but I got pushback from being too wishy-washy w/ lack of concrete details. So I introduce a concrete metric that some are accusing of being inaccurate for the goals of the PEP. I'm screwed or I'm screwed. =) So I am choosing to go with the one that has a side benefit of also increasing test coverage.
Maybe that is just an indication that the proposal isn't mature yet. To me, it doesn't seem well thought out and isn't realistic.
Now if people would actually support simply not accepting any more C modules into the Python stdlib (this does not apply to CPython's stdlib), then I'm all for that.
I only went with the "accelerator modules are okay" route to help get acceptance for the PEP. But if people are willing to go down a more stringent route and say that any module which uses new C code is considered CPython-specific and thus any acceptance of such modules will be damn hard to accomplish as it will marginalize the value of the code, that's fine by me.
Is that what people want? For example, do we want to accept a C version of decimal? Without it, the decimal module is unusable for people with high volumes of data. Do we want things like an xz compressor to be written in pure python and only in Python? I don't think this benefits our users. I'm not really clear what it is you're trying to get at. For PyPy, IronPython, and Jython to succeed, does the CPython project need to come to a halt? I don't think many people here really believe that to be the case.
Raymond
On 18 April 2011 08:05, Maciej Fijalkowski
On Sun, Apr 17, 2011 at 4:19 AM, Raymond Hettinger
wrote:
Almost none of the concerns that have been raised have been addressed. Does the PEP only apply to purely algorithmic modules such as heapq or does it apply to anything written in C (like an xz compressor, for example)?
My understanding is it does apply only to stuff that does not wrap an external library.
My understanding is that this is most people's understanding, so it should be explicitly documented in the PEP. It would also be worth asking: are there any other reasons for using C code beyond wrapping external libraries and accelerating code that could equally be written in Python? I can't think of any, myself, but OTOH I wonder if the *degree* of acceleration is also relevant - some things (compression algorithms, for example) just can't realistically be coded in Python (CPython, at least).
The PEP seems to be predicated on a notion that anything written in C is bad and that all testing is good.
Sounds about right
I disagree. To me, a Python without libraries such as os, zlib, zipfile, threading, etc wouldn't be much use (except in specialised circumstances). OK, that means that alternative implementations need to do extra work to implement equivalents in their own low-level language, but so be it (sorry!) This PEP has a flavour to me of the old "100% pure Java" ideal, where Java coders expected everything to be reimplemented in Java, avoiding any native code. I didn't like the idea then, and I don't have much more love for it now in Python. (OK, I know this is an exaggeration of the position the PEP is taking, but without more clarity in the PEP's language, I honestly don't know how much of an exaggeration). Maybe the PEP could go through the various C libraries in the stdlib at the moment, and discuss how the PEP would address them? It would be useful to see how much of an impact the PEP would have had if it had been Python policy from the start... Paul.
On Mon, 18 Apr 2011 09:36:20 +0100 Paul Moore
On 18 April 2011 08:05, Maciej Fijalkowski
wrote: On Sun, Apr 17, 2011 at 4:19 AM, Raymond Hettinger
wrote: Almost none of the concerns that have been raised has been addressed. Does the PEP only apply to purely algorithmic modules such as heapq or does it apply to anything written in C (like an xz compressor or for example)?
My understanding is it does apply only to stuff that does not wrap an external library.
My understanding is that this is most people's understanding, so it should be explicitly documented in the PEP.
It would also be worth asking: are there any other reasons for using C code beyond wrapping external libraries and accelerating code that could equally be written in Python?
faulthandler is an example. Very low-level tinkering with threads, signal handlers and possibly corrupt memory simply can't be done in Python. Regards Antoine.
On Mon, 18 Apr 2011 09:36:20 +0100, Paul Moore
On 18 April 2011 08:05, Maciej Fijalkowski
wrote: On Sun, Apr 17, 2011 at 4:19 AM, Raymond Hettinger
wrote: The PEP seems to be predicated on a notion that anything written in C is bad and that all testing is good.
Sounds about right
I disagree. To me, a Python without libraries such as os, zlib, zipfile, threading, etc wouldn't be much use (except in specialised circumstances). OK, that means that alternative implementations need to do extra work to implement equivalents in their own low-level language, but so be it (sorry!)
I think Maciej left out an "only" in that sentence. If you say "only C", then the sentence makes sense, even when applied to modules that *can* only be written in C (for CPython). That is, not having a Python version is bad. Necessary in many cases (or not worth the cost, for external library wrappers), but wouldn't it be nicer if it wasn't necessary?
This PEP has a flavour to me of the old "100% pure Java" ideal, where Java coders expected everything to be reimplemented in Java, avoiding any native code. I didn't like the idea then, and I don't have much more love for it now in Python. (OK, I know this is an exaggeration of the position the PEP is taking, but without more clarity in the PEP's language, I honestly don't know how much of an exaggeration).
The Pythonic ideal contains quite a bit of pragmatism, so yes, that is an exaggeration of the goals of the PEP, certainly. (Although pypy may do it anyway, for pragmatic reasons :)
Maybe the PEP could go through the various C libraries in the stdlib at the moment, and discuss how the PEP would address them? It would be useful to see how much of an impact the PEP would have had if it had been Python policy from the start...
That might indeed be a useful exercise, especially since other implementations (or even perhaps CPython developers) may want to contribute Python-only versions and/or tests for things that would have been affected by the PEP. I don't have time to do it right now, but if I can pry any time loose I'll have it near the top of my list. -- R. David Murray http://www.bitdance.com
R. David Murray, 18.04.2011 14:30:
On Mon, 18 Apr 2011 09:36:20 +0100, Paul Moore wrote:
On 18 April 2011 08:05, Maciej Fijalkowski wrote:
On Sun, Apr 17, 2011 at 4:19 AM, Raymond Hettinger wrote:
The PEP seems to be predicated on a notion that anything written in C is bad and that all testing is good.
Sounds about right
I disagree. To me, a Python without libraries such as os, zlib, zipfile, threading, etc wouldn't be much use (except in specialised circumstances). OK, that means that alternative implementations need to do extra work to implement equivalents in their own low-level language, but so be it (sorry!)
I think Maciej left out an "only" in that sentence. If you say "only C", then the sentence makes sense, even when applied to modules that *can* only be written in C (for CPython). That is, not having a Python version is bad. Necessary in many cases (or not worth the cost, for external library wrappers), but wouldn't it be nicer if it wasn't necessary?
FWIW, there is a proposed GSoC project that aims to implement a Cython backend for PyPy, either using ctypes or PyPy's own FFI. That would basically remove the need to write library wrappers in C for both CPython and PyPy, and eventually for IronPython, which also has a Cython port in the making. Not sure how Jython fits into this, but I wouldn't object to someone writing a JNI backend either. Stefan
Hi,
We try very hard to optimize for usual Python idioms. They're very often much better than specific CPython hacks. Unless you mean things like rebinding a global into a default argument as a "pythonic idiom". We had to rewrite places in the standard library which are precisely not very Pythonic.
If I understand correctly, you’ve made internal changes preserving the official API of the modules. Have you reported those cases to bugs.python.org? I’m sure we’d be glad to incorporate those changes into the stdlib, possibly even in the stable branches if their rationale is strong enough. Regards
On Mon, Apr 18, 2011 at 6:32 PM, Éric Araujo
Hi,
We try very hard to optimize for usual Python idioms. They're very often much better than specific CPython hacks. Unless you mean things like rebinding a global into a default argument as a "pythonic idiom". We had to rewrite places in the standard library which are precisely not very Pythonic.
If I understand correctly, you’ve made internal changes preserving the official API of the modules. Have you reported those cases to bugs.python.org? I’m sure we’d be glad to incorporate those changes into the stdlib, possibly even in the stable branches if their rationale is strong enough.
I think what's relevant was merged by Benjamin. Usually:

* we do revert things that were specifically made to make CPython faster, like def f(_getattr=getattr): ...

* we usually target a CPython version that's already frozen, which makes it pretty inconvenient to post these changes back. An example would be the socket module, where it has changed enough in 3.x that 2.7 changes make no sense.
Regards
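(An aside for context: a minimal sketch of the default-argument rebinding trick Maciej refers to; the function names here are invented for illustration.)::

    # The kind of CPython-specific micro-optimisation PyPy reverts:
    # binding the builtin to a default argument turns each (slower)
    # global/builtin lookup into a (faster) local lookup on CPython.
    def get_name(obj, _getattr=getattr):        # optimised spelling
        return _getattr(obj, '__name__', None)

    def get_name_plain(obj):                    # idiomatic spelling
        return getattr(obj, '__name__', None)

    # The cost: the optimised version grows a spurious parameter (a
    # caller passing a second positional argument silently replaces
    # getattr), and the speed-up is a CPython implementation detail --
    # a JIT such as PyPy's runs the idiomatic spelling just as fast.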
On Apr 18, 2011, at 10:11 AM, Maciej Fijalkowski wrote:
* we usually target a CPython version that's already frozen, which makes it pretty inconvenient to post these changes back. An example would be the socket module, which has changed enough in 3.x that 2.7 changes make no sense.
Do you have any thoughts on the problem with the concrete C API not working well with subclasses of builtin types? I'm thinking that the PEP should specifically ban the practice of using the concrete API unless it is known for sure that an object is an exact type match. It is okay to write PyList_New() followed by PyList_SetItem(), but not okay to use PyList_SetItem() on a user-supplied argument that is known to be a subclass of list. A fast path can be provided for an exact type match, but there would also need to be a slower path that either converts the object to an exact list or that uses PyObject_SetItem(). In the discussions about this topic, there don't seem to be any technical solutions; instead, it will take a social solution such as a PEP and clear warnings in the docs. Raymond
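(A pure Python illustration of the semantic difference Raymond describes; the class below is invented for the example. The concrete PyList_SetItem() bypasses a subclass's __setitem__, whereas the abstract PyObject_SetItem(), like ordinary Python code, respects it.)::

    class AuditedList(list):
        """A list subclass that intercepts item assignment."""
        def __setitem__(self, index, value):
            print("assigning %r to index %d" % (value, index))
            super().__setitem__(index, value)

    items = AuditedList([0, 0, 0])
    items[1] = "x"   # goes through __setitem__ and prints a message

    # C code calling the abstract PyObject_SetItem(items, ...) behaves
    # like the assignment above and invokes the override.  C code
    # calling the concrete PyList_SetItem(items, ...) writes straight
    # into the underlying list storage, silently skipping the override
    # -- which is why the concrete API is only safe when the object is
    # known to be *exactly* a list.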
Raymond Hettinger, 18.04.2011 19:26:
On Apr 18, 2011, at 10:11 AM, Maciej Fijalkowski wrote:
* we usually target a CPython version that's already frozen, which makes it pretty inconvenient to post these changes back. An example would be the socket module, which has changed enough in 3.x that 2.7 changes make no sense.
Do you have any thoughts on the problem with the concrete C API not working well with subclasses of builtin types?
I'm thinking that the PEP should specifically ban the practice of using the concrete API unless it is known for sure that an object is an exact type match.
Absolutely.
It is okay to write PyList_New() followed by PyList_SetItem(), but not okay to use PyList_SetItem() on a user-supplied argument that is known to be a subclass of list. A fast path can be provided for an exact type match, but there would also need to be a slower path that either converts the object to an exact list or that uses PyObject_SetItem().
For what it's worth, Cython generates code that contains optimistic optimisations for common cases, such as iteration, x.append() calls, etc. When it finds such a pattern, it generates separate code paths for the most likely (builtin type) case and a slower fallback for the more unlikely case of a user-provided type. So you get both speed and compatibility for free, just by writing idiomatic code like "for item in some_iterable". Stefan
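(Roughly the shape of what such generated code does, rendered as a hand-written Python approximation rather than actual Cython output; the function names are invented.)::

    def process(seq, handle):
        if type(seq) is list:                # fast path: exactly a list
            for index in range(len(seq)):    # direct indexing
                handle(seq[index])
        else:                                # fallback: full iterator
            for item in seq:                 # protocol, works for
                handle(item)                 # subclasses too

    process([1, 2, 3], print)    # takes the fast path
    process((1, 2, 3), print)    # takes the generic path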
Maciej Fijalkowski, 18.04.2011 19:11:
On Mon, Apr 18, 2011 at 6:32 PM, Éric Araujo wrote:
We try very hard to optimize for usual Python idioms. They're very often much better than specific CPython hacks. Unless you mean things like rebinding a global into a default argument as a "pythonic idiom". We had to rewrite places in the standard library which are precisely not very pythonic.
If I understand correctly, you’ve made internal changes preserving the official API of the modules. Have you reported those cases to bugs.python.org? I’m sure we’d be glad to incorporate those changes into the stdlib, possibly even in the stable branches if their rationale is strong enough.
I think what's relevant was merged by Benjamin. Usually:
* we do revert things that were specifically made to make CPython faster, like
def f(_getattr=getattr): ...
Thanks. Speaking for the Cython project, we are certainly happy to see these micro-optimisations reverted. Makes our life easier and the generated code faster. Stefan
Brett Cannon
Now if people would actually support simply not accepting any more C modules into the Python stdlib (this does not apply to CPython's stdlib), then I'm all for that. I only went with the "accelerator modules are okay" route to help get acceptance for the PEP. But if people are willing to go down a more stringent route and say that any module which uses new C code is considered CPython-specific and thus any acceptance of such modules will be damn hard to accomplish as it will marginalize the value of the code, that's fine by me.
Could you explain why C code marginalizes the value of the code? Most people use CPython and they definitely want fast C modules. Also, many people actually use CPython specifically for its C API. It has been suggested recently that wrapping the ICU library would be desirable for Python. Should all such projects be discouraged because they do not benefit PyPy, Jython, and IronPython? I find these projects very interesting and wish them well, but IMO the reality is that CPython will continue to be the dominant player for at least another 10 years. Stefan Krah
Brett Cannon wrote:
In the grand python-dev tradition of "silence means acceptance", I consider this PEP finalized and implicitly accepted.
How long does that silence have to last?

I didn't notice a definition of what counts as "100% branch coverage". Apologies if I merely failed to notice it, but I think it should be explicitly defined. Presumably it means that any time you have an explicit branch (if...elif...else, try...except...else, for...else, etc.) you need a test that goes down each branch. But it isn't clear to me whether it's sufficient to test each branch in isolation, or whether you need to test all combinations.

That is, if you have five branches, A or B, C or D, E or F, G or H, I or J, within a single code unit (function? something else?), is it sufficient to have at least one test that goes down each of A...J, or do you need to explicitly test each of: A-C-E-G-I, A-C-E-G-J, A-C-E-H-I, A-C-E-H-J, A-C-F-G-I, ... B-D-F-H-J (10 tests versus 32 tests)?

If the latter, this could become impractical *very* fast. But if not, I don't see how we can claim 100% coverage when there are code paths which are never tested. At the very least, I think you need to explicitly define what you mean by "100% branch coverage". Possibly this will assist in the disagreement between you and Antoine re "100%" versus "comprehensive" coverage. -- Steven
On 16 Apr, 11:03 pm, steve@pearwood.info wrote:
Brett Cannon wrote:
In the grand python-dev tradition of "silence means acceptance", I consider this PEP finalized and implicitly accepted.
How long does that silence have to last?
I didn't notice a definition of what counts as "100% branch coverage". Apologies if I merely failed to notice it, but I think it should be explicitly defined.
Presumably it means that any time you have an explicit branch (if...elif...else, try...except...else, for...else, etc.) you need a test that goes down each branch. But it isn't clear to me whether it's sufficient to test each branch in isolation, or whether you need to test all combinations.
That is, if you have five branches, A or B, C or D, E or F, G or H, I or J, within a single code unit (function? something else?), is it sufficient to have at least one test that goes down each of A...J, or do you need to explicitly test each of:
A-C-E-G-I A-C-E-G-J A-C-E-H-I A-C-E-H-J A-C-F-G-I ... B-D-F-H-J
(10 tests versus 32 tests).
If the latter, this could become impractical *very* fast. But if not, I don't see how we can claim 100% coverage when there are code paths which are never tested.
The most commonly used definition of branch coverage is that each outcome of each individual branch is executed, not that all possible combinations of all branches in a unit are executed. I haven't heard anyone in this thread propose the latter, only the former. "100% coverage" by itself is certainly ambiguous.
At the very least, I think you need to explicitly define what you mean by "100% branch coverage". Possibly this will assist in the disagreement between you and Antoine re "100%" versus "comprehensive" coverage.
I suspect that everyone who has said "branch coverage" in this thread has intended the definition given above (and I encourage anyone who meant something else to clarify their position). Jean-Paul
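(A small sketch of the distinction, under the common definition Jean-Paul gives; the function is invented for illustration.)::

    def classify(x, y):
        sign = "pos" if x > 0 else "nonpos"        # branches A / B
        parity = "even" if y % 2 == 0 else "odd"   # branches C / D
        return sign, parity

    # Two tests already give 100% *branch* coverage -- each of A, B, C
    # and D is taken at least once:
    assert classify(1, 2) == ("pos", "even")       # exercises A and C
    assert classify(-1, 3) == ("nonpos", "odd")    # exercises B and D

    # Full *path* coverage would additionally require the A-D and B-C
    # combinations, and grows exponentially with the number of branches:
    assert classify(1, 3) == ("pos", "odd")        # path A-D
    assert classify(-1, 2) == ("nonpos", "even")   # path B-C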
Brett Cannon
Since they do not typically support the entire `C API of Python`_, they are unable to use the code used to create the module. Oftentimes this leads these other VMs to either re-implement the modules in pure Python or in the programming language used to implement the VM (e.g., in C# for IronPython). This duplication of effort between CPython, PyPy, Jython, and IronPython is extremely unfortunate, as implementing a module *at least* in pure Python would help mitigate this duplicate effort.
The purpose of this PEP is to minimize this duplicate effort by mandating that all new modules added to Python's standard library *must* have a pure Python implementation _unless_ special dispensation is given. This makes sure that a module in the stdlib is available to all VMs and not just to CPython (pre-existing modules that do not meet this requirement are exempt, although there is nothing preventing someone from adding in a pure Python implementation retroactively).
I'm not sure that I understand the duplication of effort: If there is a C module without a Python implementation in the stdlib, then the PyPy, Jython, and IronPython developers are free to cooperate and implement a single Python version. I would not consider this a duplication of effort. If, on the other hand, they choose to provide three individual implementations in C#, Java and (?), then that is their own choice and surely not the fault of the C module developer. By contrast, this PEP puts a great burden on the developers of new C modules. If this PEP is accepted, it is the C module developers who will have to do duplicate work. In my view, the PEP should have a clause that *active* participation of PyPy, Jython, and IronPython developers is expected if they want pure compatible Python versions to exist.
Re-implementing parts (or all) of a module in C (in the case of CPython) is still allowed for performance reasons, but any such accelerated code must pass the same test suite (sans VM- or C-specific tests) to verify semantics and prevent divergence. To accomplish this, the test suite for the module must have 100% branch coverage of the pure Python implementation before the acceleration code may be added.
Raymond has pointed out that the PEP seems to discourage C modules. This is one of the examples. Since implementing C modules takes a lot of time, I'd appreciate knowing whether they are just tolerated or actually welcome.
As an example, to write tests which exercise both the pure Python and C accelerated versions of a module, a basic idiom can be followed::
[cut]
    heap = Spam()
    self.assertFalse(isinstance(heap, collections.abc.MutableSequence))
    with self.assertRaises(TypeError):
        self.heapq.heappop(heap)
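(Since the surrounding code was cut from the quote above, here is a reconstructed sketch of the idiom, based on test.support.import_fresh_module; the test itself is a stand-in, not the PEP's exact example.)::

    import unittest
    from test.support import import_fresh_module

    # Import heapq twice: once with the _heapq accelerator forced in,
    # and once with it blocked so only the pure Python code runs.
    c_heapq = import_fresh_module('heapq', fresh=['_heapq'])
    py_heapq = import_fresh_module('heapq', blocked=['_heapq'])

    class HeapTests:
        """Tests written once and run against both implementations."""
        def test_push_pop_roundtrip(self):
            heap = []
            for value in (5, 1, 3):
                self.heapq.heappush(heap, value)
            popped = [self.heapq.heappop(heap) for _ in range(3)]
            self.assertEqual(popped, [1, 3, 5])

    class TestCHeapq(HeapTests, unittest.TestCase):
        heapq = c_heapq

    class TestPyHeapq(HeapTests, unittest.TestCase):
        heapq = py_heapq

    if __name__ == '__main__':
        unittest.main()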
If all possible exceptions must match, then in the case of decimal the PEP should give permission to change the published API of an existing Python module (in this case decimal.py). Otherwise, I see no way of accomplishing this goal. It is possible to give many frivolous examples:
>>> from decimal import *
>>> class C():
...     def __init__(self):
...         self.traps = 'invalid'
...
>>> # No exception
... setcontext(C())

>>> from cdecimal import *
>>> class C():
...     def __init__(self):
...         self.traps = 'invalid'
...
>>> setcontext(C())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument must be a context.
In the case of duck typing, the only solution I see is to lock down the types in decimal.py, thus changing the API. This is one of the things that should be decided *before* the PEP is accepted. Stefan Krah
On Sun, 17 Apr 2011 12:14:51 +0200, Stefan Krah
I'm not sure that I understand the duplication of effort: If there is a C module without a Python implementation in the stdlib, then the PyPy, Jython, and IronPython developers are free to cooperate and implement a single Python version. I would not consider this a duplication of effort.
Yes, that's exactly what we are trying to encourage. If the Python standard library is seen as common property of all Python implementations, then this is *much* more likely to happen.
If, on the other hand, they choose to provide three individual implementations in C#, Java and (?), then that is their own choice and surely not the fault of the C module developer.
Right.
By contrast, this PEP puts a great burden on the developers of new C modules. If this PEP is accepted, it is the C module developers who will have to do duplicate work.
This is true only because of the current "blessed" position of CPython in the Python ecosystem. If a separate Python stdlib is the common property of all Python implementations, then the same double burden would apply to, say, an IronPython developer writing a module in C# and wanting it included in the stdlib.
In my view, the PEP should have a clause that *active* participation of PyPy, Jython, and IronPython developers is expected if they want pure compatible Python versions to exist.
Re-implementing parts (or all) of a module in C (in the case of CPython) is still allowed for performance reasons, but any such accelerated code must pass the same test suite (sans VM- or C-specific tests) to verify semantics and prevent divergence. To accomplish this, the test suite for the module must have 100% branch coverage of the pure Python implementation before the acceleration code may be added.
Raymond has pointed out that the PEP seems to discourage C modules. This is one of the examples. Since implementing C modules takes a lot of time, I'd appreciate knowing whether they are just tolerated or actually welcome.
I believe they are welcome, but that they are a CPython implementation detail, and the PEP is trying to make that distinction clear. One can also imagine a C module getting accepted into the stdlib because everybody agrees that (a) it can't be implemented in Python and (b) every Python implementation should support it. In that case only the test suite will be part of the implementation-independent part of the stdlib. I do think that such modules (and we already have several) should have a higher bar to cross to get into the stdlib than modules that have a pure Python implementation.
If all possible exceptions must match, then in the case of decimal the PEP should give permission to change the published API of an existing Python module (in this case decimal.py). Otherwise, I see no way of accomplishing this goal.
This may well be what needs to be done, both for CPython and for other implementations. When we agree that some test covers something that is an implementation detail, the tests should be so marked. Making changes to the API and tests to accommodate specific Python implementations (including CPython) will be the right thing to do in some cases. Obviously these will have to be considered on a case-by-case basis. The Python stdlib and its tests are already the standard that other implementations need to conform to. The PEP is trying to lay out some rules so that CPython has to conform on equal footing with the other implementations.
It is possible to give many frivolous examples:
>>> from decimal import *
>>> class C():
...     def __init__(self):
...         self.traps = 'invalid'
...
>>> # No exception
... setcontext(C())

>>> from cdecimal import *
>>> class C():
...     def __init__(self):
...         self.traps = 'invalid'
...
>>> setcontext(C())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument must be a context.
In the case of duck typing, the only solution I see is to lock down the types in decimal.py, thus changing the API. This is one of the things that should be decided *before* the PEP is accepted.
Here you perceive the burden we are currently placing on the other implementations. That's the world they live in *now*. The PEP is asking CPython to share this pain equally.

I agree that this is a concrete example that the PEP could address. I myself don't know enough about decimal/cdecimal or the Python C API to know why cdecimal can't duck type here, but it certainly sounds like a good example to use to clarify the requirements being advocated by the PEP. I won't be surprised to find that the issues involved are the same issues that an accelerator module for the other Python implementations would face. -- R. David Murray http://www.bitdance.com
R. David Murray
In the case of duck typing, the only solution I see is to lock down the types in decimal.py, thus changing the API. This is one of the things that should be decided *before* the PEP is accepted.
Here you perceive the burden we are currently placing on the other implementations. That's the world they live in *now*. The PEP is asking CPython to share this pain equally.
I agree that this is a concrete example that the PEP could address. I myself don't know enough about decimal/cdecimal or the Python C API to know why cdecimal can't duck type here, but it certainly sounds like a good example to use to clarify the requirements being advocated by the PEP. I won't be surprised to find that the issues involved are the same issues that an accelerator module for the other Python implementations would face.
The technical reason is that the context is a speed-critical data structure, so I'm doing some tricks to emulate the context flags and traps dictionaries. But I actually prefer that the context is locked down. The context settings are absolutely crucial for the correctness of the result. Here is a mistake that I've made multiple times while trying something out with decimal.py:
>>> from decimal import *
>>> c = getcontext()
>>> # Meaning c.Emax and c.Emin:
... c.emax = 99
>>> c.emin = -99
>>> # The operation silently uses the unchanged context:
... Decimal(2)**99999
Decimal('4.995010465071922539720163822E+30102')
cdecimal raises an AttributeError:
>>> from cdecimal import *
>>> c = getcontext()
>>> c.emax = 99
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'cdecimal.Context' object has no attribute 'emax'
So, if one of the goals of the PEP is to clean up various APIs, I'm all for it. My concern, though, is that the process will be very slow due to lack of time and general reluctance to change APIs. And this is where I see a potentially negative effect: Is it worth stalling development over relatively minor issues? Will these differences actually affect someone in practice? Will the four Python implementations block each other? Stefan Krah
On Sun, 17 Apr 2011 19:17:11 +0200, Stefan Krah
R. David Murray
wrote: [snip a lot] Thank you, this cleared up many things.
Heh. Keep in mind that this is my viewpoint. I *think* Brett agrees with me. I'm sure he'll speak up if he doesn't.
The technical reason is that the context is a speed-critical data structure, so I'm doing some tricks to emulate the context flags and traps dictionaries.
[snip] Thanks, your explanation seems to me to make a good case for making the decimal.py implementation less permissive.
So, if one of the goals of the PEP is to clean up various APIs, I'm all for it. My concern, though, is that the process will be very slow due to lack of time and general reluctance to change APIs. And this is where I see a potentially negative effect:
Well, the general reluctance to change APIs is certainly an issue. But since you are advocating cdecimal changing the API *anyway*, if it is going to go in to CPython this would have to be addressed regardless. So I don't see that the PEP affects the speed of that part of the process from CPython's point of view.
Is it worth stalling development over relatively minor issues? Will these differences actually affect someone in practice? Will the four Python implementations block each other?
In my vision it wouldn't stall development in any place it shouldn't be stalled by our normal backward compatibility rules. It would be a bug in the bug tracker saying "the API of module X has some undesirable characteristics that get in the way of implementing accelerators, can we change it?" Again, I don't see this as changing what the current procedure should be anyway, just clarifying it and making it more likely that we will *notice* the changes and deal with them proactively rather than finding out about them after the accelerator is in the field, having introduced a backward-incompatible change unintentionally. (Note: I'm sure that we will still accidentally do this anyway, I'm just hoping to reduce the frequency of such occurrences). -- R. David Murray http://www.bitdance.com
On Mon, Apr 18, 2011 at 3:50 AM, R. David Murray
Thanks, your explanation seems to me to make a good case for making the decimal.py implementation less permissive.
Indeed. Since the current handling of Context in decimal.py violates "Errors should never pass silently, unless explicitly silenced", I would personally support a proposal to lock down its __setattr__ to a predefined set of attributes, have its __delattr__ always raise an exception, and introduce a parent ABC that is used for an isinstance() check in setcontext(). (The ABC could include an attribute check, so only objects that failed to provide all the appropriate methods and attributes would raise the TypeError). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
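(A rough Python sketch of the lock-down Nick proposes; the attribute set, defaults, and class names here are all invented for illustration, not decimal's actual API.)::

    from abc import ABC

    class AbstractContext(ABC):
        """Hypothetical parent ABC for contexts (name assumed)."""

        _fields = frozenset({'prec', 'rounding', 'traps', 'flags',
                             'Emin', 'Emax', 'capitals'})

        @classmethod
        def __subclasshook__(cls, C):
            # Duck-typed contexts still pass isinstance() as long as
            # their class defines all the expected attributes.
            if cls is AbstractContext:
                if all(any(name in B.__dict__ for B in C.__mro__)
                       for name in cls._fields):
                    return True
            return NotImplemented

    class Context(AbstractContext):
        prec, rounding, capitals = 28, 'ROUND_HALF_EVEN', 1
        traps, flags = {}, {}
        Emin, Emax = -999999, 999999

        def __setattr__(self, name, value):
            # Only the predefined set of attributes may be (re)bound.
            if name not in AbstractContext._fields:
                raise AttributeError('%r is not a valid context '
                                     'attribute' % name)
            object.__setattr__(self, name, value)

        def __delattr__(self, name):
            raise AttributeError('context attributes cannot be deleted')

    def setcontext(context):
        if not isinstance(context, AbstractContext):
            raise TypeError('argument must be a context')
        # ... install the context for the current thread ...

With something along these lines, the frivolous class C() from earlier in the thread would fail the isinstance() check and raise TypeError, and the c.emax typo would raise AttributeError in the pure Python version too.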
Brett Cannon, 05.04.2011 01:46:
At both the VM and language summits at PyCon this year, the issue of compatibility of the stdlib amongst the various VMs came up. Two issues came about in regards to modules that use C code. One is that code that comes in only as C code sucks for all other VMs that are not CPython since they all end up having to re-implement that module themselves. Two is that modules that have an accelerator module (e.g., heapq, warnings, etc.) can end up with compatibility options (sorry, Raymond, for picking on heapq, but is was what bit the PyPy people most recently =).
In lieu of all of this, here is a draft PEP to more clearly state the policy for the stdlib when it comes to C code. Since this has come up before and this was discussed so much at the summits I have gone ahead and checked this in so that even if this PEP gets rejected there can be a written record as to why.
And before anyone asks, I have already run this past the lead devs of PyPy, Jython, and IronPython and they all support what this PEP proposes. And with the devs of the other VMs gaining push privileges there shouldn't be an added developer burden on everyone to make this PEP happen.
This PEP has received a lengthy discussion by now, so here's why I think it's being fought so heavily by several CPython core developers, specifically those who have traditionally carried a large part of the optimisation load in the project.

I think the whole point of this PEP is that, having agreed that a shared standard library for all Python implementations is a good thing, the amount of shareable code should be maximised. I doubt that anyone will argue against this goal. But that obviously includes all sides. If other implementations are free to cherry-pick the targets of their own effort, geared towards optimising their own implementation, and leave the whole burden of compatibility and code reusability on CPython, in addition to the CPython efforts of improving and optimising its own core code base and its own stdlib version, it's not an equal matter.

That's what makes the PEP feel so unfair to CPython developers, because they are the ones who carry most of the burden of maintaining the stdlib in the first place, and who will most likely continue to carry it, because other implementations will continue to be occupied with their own core development for another while or two. It is nice to read that other implementations are contributing back patches that simplify their own reuse of the stdlib code. However, that does not yet make them equal contributors to the development and the maintenance of the stdlib, and is of very little worth to the CPython project. It often even runs counter to the interest of CPython itself.

I think this social problem of the PEP can only be solved if the CPython project stops doing the major share of the stdlib maintenance, thus freeing its own developer capacities to focus on CPython related improvements and optimisations, just like the other implementations currently do. I'm not sure we want that at this point. Stefan
On Tue, Apr 19, 2011 at 3:06 PM, Stefan Behnel
I think this social problem of the PEP can only be solved if the CPython project stops doing the major share of the stdlib maintenance, thus freeing its own developer capacities to focus on CPython related improvements and optimisations, just like the other implementations currently do. I'm not sure we want that at this point.
We've made a start on that aspect by granting CPython access to several of the core developers on the other VMs. The idea being that they can update the pure Python versions of modules directly rather than having to wait for one of us to do it on their behalf. Of course, as Maciej pointed out, that is currently hindered by the fact that the other VMs aren't targeting 3.3 yet, and that's where the main CPython development is happening. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan, 19.04.2011 10:57:
On Tue, Apr 19, 2011 at 3:06 PM, Stefan Behnel wrote:
I think this social problem of the PEP can only be solved if the CPython project stops doing the major share of the stdlib maintenance, thus freeing its own developer capacities to focus on CPython related improvements and optimisations, just like the other implementations currently do. I'm not sure we want that at this point.
We've made a start on that aspect by granting CPython access to several of the core developers on the other VMs. The idea being that they can update the pure Python versions of modules directly rather than having to wait for one of us to do it on their behalf.
Of course, as Maciej pointed out, that is currently hindered by the fact that the other VMs aren't targeting 3.3 yet, and that's where the main CPython development is happening.
A related question is: when other Python VM projects try to port a given C module, would they actually invest the time to write a pure Python version that may or may not run within acceptable performance bounds for them, or would they prefer saving time by writing only a native implementation directly for their VM for performance reasons? Maybe both, maybe not.

If they end up writing a native version after prototyping in Python, is the prototype worth including in the shared stdlib, even if its performance is completely unacceptable for everyone? Or, if they write a partial module and implement another part of it natively, would the incomplete implementation qualify as a valid addition to the shared stdlib?

Implementing a 100% compatible and "fast enough" Python version of a module is actually a rather time-consuming task. I think we are expecting some altruism here that is easily sacrificed for time constraints, in any of the Python VM projects. CPython is just in the unlucky position of representing the status quo. Stefan
On Tue, Apr 19, 2011 at 12:01 PM, Stefan Behnel
Nick Coghlan, 19.04.2011 10:57:
On Tue, Apr 19, 2011 at 3:06 PM, Stefan Behnel wrote:
I think this social problem of the PEP can only be solved if the CPython project stops doing the major share of the stdlib maintenance, thus freeing its own developer capacities to focus on CPython related improvements and optimisations, just like the other implementations currently do. I'm not sure we want that at this point.
We've made a start on that aspect by granting CPython access to several of the core developers on the other VMs. The idea being that they can update the pure Python versions of modules directly rather than having to wait for one of us to do it on their behalf.
Of course, as Maciej pointed out, that is currently hindered by the fact that the other VMs aren't targeting 3.3 yet, and that's where the main CPython development is happening.
A related question is: when other Python VM projects try to port a given C module, would they actually invest the time to write a pure Python version that may or may not run within acceptable performance bounds for them, or would they prefer saving time by writing only a native implementation directly for their VM for performance reasons? Maybe both, maybe not. If they end up writing a native version after prototyping in Python, is the prototype worth including in the shared stdlib, even if its performance is completely unacceptable for everyone? Or, if they write a partial module and implement another part of it natively, would the incomplete implementation qualify as a valid addition to the shared stdlib?
At least from our (PyPy's) side, we do use pure Python versions a lot. Their performance varies, but sometimes you don't care; you just want the module to work. Contrary to popular belief, not all code in the standard library is performance critical. We got quite far without even looking. Later on we usually look there, but for us rewriting it in RPython most of the time makes no sense, since pure Python code might even behave better than RPython code, especially if there are loops, which get JITted more efficiently if they're in pure Python.
Implementing a 100% compatible and "fast enough" Python version of a module is actually a rather time-consuming task. I think we are expecting some altruism here that is easily sacrificed for time constraints, in any of the Python VM projects. CPython is just in the unlucky position of representing the status quo.
I think 100% compatible, with whatever performance, is already a lot for us. We can improve the performance later on. For example, we never touched the heapq module and it works just fine as it is.
Stefan
On Tue, 19 Apr 2011 12:01:44 +0200, Stefan Behnel
A related question is: when other Python VM projects try to port a given C module, would they actually invest the time to write a pure Python version that may or may not run within acceptable performance bounds for them, or would they prefer saving time by writing only a native implementation directly for their VM for performance reasons? Maybe both, maybe not. If they end up writing a native version after prototyping in Python, is the prototype worth including in the shared stdlib, even if its performance is completely unacceptable for everyone? Or, if they write a partial module and implement another part of it natively, would the incomplete implementation qualify as a valid addition to the shared stdlib?
I would say yes, it is worth including. And even more worth including are any additional tests they develop to validate their implementation.
Implementing a 100% compatible and "fast enough" Python version of a module is actually a rather time-consuming task. I think we are expecting some altruism here that is easily sacrificed for time constraints, in any of the Python VM projects. CPython is just in the unlucky position of representing the status quo.
Well, I don't think we are really expecting altruism. We're trying to leverage the work the community is doing, by drawing as much as possible of the Python code and validation tests that get created into a common stdlib. If a module in the wild is being considered for inclusion in the stdlib, it will need to have a Python version if practical. Since we accept so few modules anyway (for good reason), I really don't see this as a big deal. And, there's always the practicality beats purity argument: if the PEP turns out to really get in the way of something everyone wants, then we can agree to an exception. -- R. David Murray http://www.bitdance.com
On Tue, Apr 19, 2011 at 10:57 AM, Nick Coghlan
On Tue, Apr 19, 2011 at 3:06 PM, Stefan Behnel
wrote: I think this social problem of the PEP can only be solved if the CPython project stops doing the major share of the stdlib maintenance, thus freeing its own developer capacities to focus on CPython related improvements and optimisations, just like the other implementations currently do. I'm not sure we want that at this point.
We've made a start on that aspect by granting CPython access to several of the core developers on the other VMs. The idea being that they can update the pure Python versions of modules directly rather than having to wait for one of us to do it on their behalf.
Of course, as Maciej pointed out, that is currently hindered by the fact that the other VMs aren't targeting 3.3 yet, and that's where the main CPython development is happening.
We're also slightly hindered by the fact that not all of us got privileges so far (Antonio Cuni in particular).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Apr 19, 2011 at 8:17 AM, Maciej Fijalkowski
On Tue, Apr 19, 2011 at 10:57 AM, Nick Coghlan
wrote: On Tue, Apr 19, 2011 at 3:06 PM, Stefan Behnel
wrote: I think this social problem of the PEP can only be solved if the CPython project stops doing the major share of the stdlib maintenance, thus freeing its own developer capacities to focus on CPython related improvements and optimisations, just like the other implementations currently do. I'm not sure we want that at this point.
We've made a start on that aspect by granting CPython access to several of the core developers on the other VMs. The idea being that they can update the pure Python versions of modules directly rather than having to wait for one of us to do it on their behalf.
Of course, as Maciej pointed out, that is currently hindered by the fact that the other VMs aren't targeting 3.3 yet, and that's where the main CPython development is happening.
We're also slightly hindered by the fact that not all of us got privileges so far (Antonio Cuni in particular).
Yeah, I emailed him this morning; I dropped the ball on his commit bit post-PyCon due to email overload. I'm resolving it today.
On Tue, Apr 19, 2011 at 1:06 AM, Stefan Behnel
This PEP has received a lengthy discussion by now, so here's why I think it's being fought so heavily by several CPython core developers, specifically those who have traditionally carried a large part of the optimisation load in the project.
I think the whole point of this PEP is that, having agreed that a shared standard library for all Python implementations is a good thing, the amount of shareable code should be maximised. I doubt that anyone will argue against this goal.
But that obviously includes all sides. If other implementations are free to cherry-pick the targets of their own effort, geared towards optimising their own implementation, and leave the whole burden of compatibility and code reusability on CPython, in addition to the CPython efforts of improving and optimising its own core code base and its own stdlib version, it's not an equal matter.
I am going to go out on a limb here and state that once the stdlib is shared, it is every VM's responsibility to help maintain it, meaning that PEP 399 is binding on all of the VMs. If Jython wants to write an accelerator module in Java for something in the stdlib, they have to follow the same guidelines; the same applies to PyPy, etc. I think this is an equal matter, and if needed, we should make note of it in the PEP. The goal here is to make it easier to share the code base of the stdlib and not pull the rug out from under other implementations by having a stdlib module written only in highly optimized C with no Python fallback, leaving them with the unsavory duty of reimplementing it in Python|Java|C#, etc. Pure Python is the coin of the realm.
That's what makes the PEP feel so unfair to CPython developers, because they are the ones who carry most of the burden of maintaining the stdlib in the first place, and who will most likely continue to carry it, because other implementations will continue to be occupied with their own core development for another while or two. It is nice to read that other implementations are contributing back patches that simplify their own reuse of the stdlib code. However, that does not yet make them equal contributors to the development and the maintenance of the stdlib, and is of very little worth to the CPython project. It often even runs counter to the interest of CPython itself.
Sure, at first glance this seems to place an unfair burden on CPython - because we're just as guilty of being "closed" to other implementations as the other implementations are to us. We're trying to change that, and someone (us, as the reference implementation) needs to take the first responsible step. Once this move is made/accepted, I would expect the other implementations to rapidly move away from their custom implementations of the stdlib and contribute to the shared code base and documentation. Yes, this places a burden on CPython, but in the long term it benefits *all* of the projects equally by simply having more active contributors.

We have over 200 stdlib modules, and far, far fewer than that in active developers focused on or working on the stdlib. Making it shared property (in theory) means that the other VMs have a shared interest in that property. We're effectively spreading the load.
I think this social problem of the PEP can only be solved if the CPython project stops doing the major share of the stdlib maintenance, thus freeing its own developer capacities to focus on CPython related improvements and optimisations, just like the other implementations currently do. I'm not sure we want that at this point.
That's not going to happen. CPython will continue to do the bulk of the maintenance until we break it out, and the other implementations have time to adapt and pull in the shared code base. I don't see this as being as large a burden as you seem to make it out to be: CPython is the reference implementation, and our stdlib is the reference stdlib. We can break out the stdlib and share it amongst the implementations, therefore making it more than the reference stdlib - we can make it the de facto stdlib for the language as a whole. We also, long term, want to spread the maintenance load beyond CPython, but right now we are the primary caretakers, so yes - this adds load to us in the short term, but benefits us in the long term. jesse
Once this move is made/accepted, I would expect the other implementations to rapidly move away from their custom implementations of the stdlib and contribute to the shared code base and documentation. Yes, this places a burden on CPython, but in the long term it benefits *all* of the projects equally by simply having more active contributors.
I would also like to point out that some valuable contributions have already been made by other implementations. When talking about the stdlib, it's mostly in the area of the test suite, and not only in terms of "skip those tests" but also in improving test coverage and even fixing bugs. Unicode fixes were prototyped on PyPy first, and some PyPy optimizations were ported to CPython (the original method cache patch came from Armin Rigo, as far as I remember). So it's not completely "CPython's burden" only. Cheers, fijal
On Tue, 19 Apr 2011 14:29:24 +0200, Maciej Fijalkowski
Once this move is made/accepted, I would expect the other implementations to rapidly move away from their custom implementations of the stdlib and contribute to the shared code base and documentation. Yes, this places a burden on CPython, but in the long term it benefits *all* of the projects equally by simply having more active contributors.
I would also like to point out that some valuable contributions have already been made by other implementations. When talking about the stdlib, it's mostly in the area of the test suite, and not only in terms of "skip those tests" but also in improving test coverage and even fixing bugs. Unicode fixes were prototyped on PyPy first, and some PyPy optimizations were ported to CPython (the original method cache patch came from Armin Rigo, as far as I remember). So it's not completely "CPython's burden" only.
Yes, and you also need to keep in mind that several developers wear multiple hats, and contribute to CPython on a regular or semi-regular basis. It is also enlightening to look at the output of hg churn. The number of active CPython developers over the past year is not huge, and very few of them have spoken up in this thread. -- R. David Murray http://www.bitdance.com
On Tue, 19 Apr 2011 07:06:09 +0200, Stefan Behnel
That's what makes the PEP feel so unfair to CPython developers, because they are the ones who carry most of the burden of maintaining the stdlib in the first place, and who will most likely continue to carry it, because other implementations will continue to be occupied with their own core development for another while or two. It is nice to read that other implementations are contributing back patches that simplify their own reuse of the stdlib code. However, that does not yet make them equal contributors to the development and the maintenance of the stdlib, and is of very little worth to the CPython project. It often even runs counter to the interest of CPython itself.
So, the PEP makes the burden worse in that it requires that someone who works on a module with a C accelerator must make sure that any existing Python version and the C version stay in sync, and that *anyone* who wants to introduce a new module into the stdlib must make sure it has a Python version if that is practical. IMO both of these are policies that make sense for CPython even aside from the existence of other implementations: Python is easier to read and understand, so where practical we should provide a Python version of any module in the stdlib, for the benefit of CPython users.

It doesn't sound like a great burden to me, but I'm not really qualified to judge, since I don't generally work on C code.

Also, could you expand on "It often even runs counter to the interest of CPython itself"? I'm not seeing that, unless you are talking about the parameter-binding micro-optimization, which I think we discourage these days anyway.
I think this social problem of the PEP can only be solved if the CPython project stops doing the major share of the stdlib maintenance, thus freeing its own developer capacities to focus on CPython related improvements and optimisations, just like the other implementations currently do. I'm not sure we want that at this point.
Personally, I consider myself a stdlib maintainer: I only occasionally dabble in C code when fixing bugs that annoy me for some reason. I suppose that's why I'm one of the people backing this PEP. I think there are other CPython developers who might say the same thing. -- R. David Murray http://www.bitdance.com
On Tue, 19 Apr 2011 10:37:41 -0400
"R. David Murray"
On Tue, 19 Apr 2011 07:06:09 +0200, Stefan Behnel
wrote: That's what makes the PEP feel so unfair to CPython developers, because they are the ones who carry most of the burden of maintaining the stdlib in the first place, and who will most likely continue to carry it, because other implementations will continue to be occupied with their own core development for another while or two. It is nice to read that other implementations are contributing back patches that simplify their own reuse of the stdlib code. However, that does not yet make them equal contributors to the development and the maintenance of the stdlib, and is of very little worth to the CPython project. It often even runs counter to the interest of CPython itself.
So, the PEP makes the burden worse in that it requires that someone who works on a module with a C accelerator must make sure that any existing Python version and the C version stay in sync, and that *anyone* who wants to introduce a new module into the stdlib must make sure it has a Python version if that is practical. IMO both of these are policies that make sense for CPython even aside from the existence of other implementations: Python is easier to read and understand, so where practical we should provide a Python version of any module in the stdlib, for the benefit of CPython users.
It doesn't sound like a great burden to me, but I'm not really qualified to judge, since I don't generally work on C code.
I think it's OK. Our experience with the io module proves, I think, that it's indeed useful to have a pure Python (pseudocode-like) implementation. Regards Antoine.
participants (19)

- Antoine Pitrou
- Brett Cannon
- Eric Snow
- exarkun@twistedmatrix.com
- James Y Knight
- Jesse Noller
- Maciej Fijalkowski
- Michael Foord
- Nick Coghlan
- Paul Moore
- R. David Murray
- Raymond Hettinger
- skip@pobox.com
- Stefan Behnel
- Stefan Krah
- Steven D'Aprano
- Terry Reedy
- Tres Seaver
- Éric Araujo