I think I know the answer to this, but I'm going to ask it anyway...
I know that there is a general policy of trying to write code in the standard library that does not disadvantage other implementations. How far does that go the other way? Should the standard library accept slower code because it will be much faster in other implementations?
Briefly, I have a choice of algorithm for the median function in the statistics module. If I target CPython, I will use a naive but simple O(N log N) implementation based on sorting the list and returning the middle item. (That's what the module currently does.) But if I target PyPy, I will use an O(N) algorithm which knocks the socks off the naive version even for smaller lists. In CPython that's typically 2-5 times slower; in PyPy it's typically 3-8 times faster, and the bigger the data set the more the advantage.
For the specific details, see http://bugs.python.org/issue21592
My feeling is that the CPython standard library should be written for CPython, that is, it should stick to the current naive implementation of median, and if PyPy wants to speed the function up, they can provide their own version of the module. I should *not* complicate the implementation by trying to detect which Python the code is running under and changing algorithms accordingly. However, I should put a comment in the module pointing at the tracker issue. Does this sound right to others?
Thanks,
Steven D'Aprano, 01.06.2014 10:11:
Briefly, I have a choice of algorithm for the median function in the statistics module. If I target CPython, I will use a naive but simple O(N log N) implementation based on sorting the list and returning the middle item. (That's what the module currently does.) But if I target PyPy, I will use an O(N) algorithm which knocks the socks off the naive version even for smaller lists. In CPython that's typically 2-5 times slower; in PyPy it's typically 3-8 times faster, and the bigger the data set the more the advantage.
For the specific details, see http://bugs.python.org/issue21592
My feeling is that the CPython standard library should be written for CPython, that is, it should stick to the current naive implementation of median, and if PyPy wants to speed the function up, they can provide their own version of the module.
Note that if you compile the module with Cython, CPython heavily benefits from the new implementation, too, by a factor of 2-5x. So there isn't really a reason to choose between two implementations because of the two runtimes, just use the new one for both and compile it for CPython. I added the necessary bits to the ticket.
Stefan
On 1 Jun 2014 18:13, "Steven D'Aprano" <steve@pearwood.info> wrote:

My feeling is that the CPython standard library should be written for CPython, that is, it should stick to the current naive implementation of median, and if PyPy wants to speed the function up, they can provide their own version of the module. I should *not* complicate the implementation by trying to detect which Python the code is running under and changing algorithms accordingly. However, I should put a comment in the module pointing at the tracker issue. Does this sound right to others?
One option is to set the pure Python module up to be paired with an accelerator module (and update the test suite accordingly), even if we *don't provide* an accelerator in CPython. That just inverts the more common case (where we have an accelerator written in C, but another implementation either doesn't need one, or just doesn't have one yet).
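The pairing Nick describes can be sketched as follows. The accelerator module name `_statistics` is hypothetical here (CPython provides no such accelerator for `median`), and the pure-Python body is only illustrative:

```python
# statistics.py -- pure Python implementation (illustrative sketch)

def median(data):
    """Return the median (middle value) of data."""
    s = sorted(data)
    n = len(s)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

# At the bottom of the module, optionally shadow the pure Python
# version with an accelerated one if it exists (hypothetical module):
try:
    from _statistics import median  # noqa: F811
except ImportError:
    pass  # fall back to the pure Python definition above
```

With this layout, the test suite can be parameterised to run against both the pure-Python and the accelerated version when the latter is present, which is exactly the inversion Nick mentions: CPython simply never ships the accelerator, while PyPy could.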
Cheers, Nick.
Thanks,

-- Steve

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Le 01/06/2014 10:11, Steven D'Aprano a écrit :
My feeling is that the CPython standard library should be written for CPython, that is, it should stick to the current naive implementation of median, and if PyPy wants to speed the function up, they can provide their own version of the module. I should *not* complicate the implementation by trying to detect which Python the code is running under and changing algorithms accordingly. However, I should put a comment in the module pointing at the tracker issue. Does this sound right to others?
It sounds ok to me.
Regards
Antoine.
On Jun 1, 2014, at 9:17 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Le 01/06/2014 10:11, Steven D'Aprano a écrit :
My feeling is that the CPython standard library should be written for CPython, that is, it should stick to the current naive implementation of median, and if PyPy wants to speed the function up, they can provide their own version of the module. I should *not* complicate the implementation by trying to detect which Python the code is running under and changing algorithms accordingly. However, I should put a comment in the module pointing at the tracker issue. Does this sound right to others?
It sounds ok to me.
That makes sense.
Raymond
2014-06-01 10:11 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:
My feeling is that the CPython standard library should be written for CPython,
Right. PyPy, Jython and IronPython already have their "own" standard library when they need a different implementation.
PyPy: "lib_pypy" directory (lib-python is the CPython stdlib): https://bitbucket.org/pypy/pypy/src/ac52eb7bbbb059d0b8d001a2103774917cf7396f...
Jython: "Lib" directory (lib-python is the CPython stdlib): https://bitbucket.org/jython/jython/src/9cd9ab75eadea898e2e74af82ae414925d6a...
IronPython: "IronPython.Modules" directory: http://ironpython.codeplex.com/SourceControl/latest#IronPython_Main/Language...
See for example the _fsum.py module of Jython: https://bitbucket.org/jython/jython/src/9cd9ab75eadea898e2e74af82ae414925d6a...
Victor
On Mon, Jun 2, 2014 at 10:43 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
2014-06-01 10:11 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:
My feeling is that the CPython standard library should be written for CPython,
Right. PyPy, Jython and IronPython already have their "own" standard library when they need a different implementation.
PyPy: "lib_pypy" directory (lib-python is the CPython stdlib): https://bitbucket.org/pypy/pypy/src/ac52eb7bbbb059d0b8d001a2103774917cf7396f...
It's for stuff that's implemented in C in CPython, not a reimplementation of Python stuff. We patched the most obvious CPython-specific hacks, but it's a losing battle; you guys will go way out of your way to squeeze an extra 2% by doing very obscure hacks.
Jython: "Lib" directory (lib-python is the CPython stdlib): https://bitbucket.org/jython/jython/src/9cd9ab75eadea898e2e74af82ae414925d6a...
IronPython: "IronPython.Modules" directory: http://ironpython.codeplex.com/SourceControl/latest#IronPython_Main/Language...
See for example the _fsum.py module of Jython: https://bitbucket.org/jython/jython/src/9cd9ab75eadea898e2e74af82ae414925d6a...
Victor
Maciej Fijalkowski, 02.06.2014 10:48:
On Mon, Jun 2, 2014 at 10:43 AM, Victor Stinner wrote:
2014-06-01 10:11 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:
My feeling is that the CPython standard library should be written for CPython,
Right. PyPy, Jython and IronPython already have their "own" standard library when they need a different implementation.
PyPy: "lib_pypy" directory (lib-python is the CPython stdlib): https://bitbucket.org/pypy/pypy/src/ac52eb7bbbb059d0b8d001a2103774917cf7396f...
It's for stuff that's implemented in C in CPython, not a reimplementation of Python stuff. We patched the most obvious CPython-specific hacks, but it's a losing battle; you guys will go way out of your way to squeeze an extra 2% by doing very obscure hacks.
Thus my proposal to compile the modules in CPython with Cython, rather than duplicating their code or making/keeping them CPython specific. I think reducing the urge to reimplement something in C is a good thing.
Stefan
Stefan Behnel <stefan_ml@behnel.de> wrote:
Thus my proposal to compile the modules in CPython with Cython, rather than duplicating their code or making/keeping them CPython specific. I think reducing the urge to reimplement something in C is a good thing.
For algorithmic and numerical code, Numba has already proven that Python can be JIT compiled to performance comparable to -O2 in C. For non-algorithmic code, the speed determinants are usually outside Python (e.g. the network connection). Numba is becoming what the "dead swallow" should have been. The question is rather: should the standard library use a JIT compiler like Numba? Cython is great for writing C extensions while avoiding all the details of the Python C API. But for speeding up algorithmic code, Numba is easier to use.
Sturla
Sturla Molden, 03.06.2014 17:13:
Stefan Behnel wrote:
Thus my proposal to compile the modules in CPython with Cython, rather than duplicating their code or making/keeping them CPython specific. I think reducing the urge to reimplement something in C is a good thing.
For algorithmic and numerical code, Numba has already proven that Python can be JIT compiled to performance comparable to -O2 in C. For non-algorithmic code, the speed determinants are usually outside Python (e.g. the network connection). Numba is becoming what the "dead swallow" should have been. The question is rather: should the standard library use a JIT compiler like Numba? Cython is great for writing C extensions while avoiding all the details of the Python C API. But for speeding up algorithmic code, Numba is easier to use.
I certainly agree that a JIT compiler can do much better optimisations on Python code than a static compiler, especially data driven optimisations. However, Numba comes with major dependencies, even runtime dependencies.
From previous discussions on this list, I gathered that there are major objections against adding such a large dependency to CPython, since it can also just be installed as an external package if users want to have it.
Static compilation, on the other hand, is a build time thing that adds no dependencies that CPython doesn't have already. Distributions can even package up the compiled .so files separately from the original .py/.pyc files, if they feel like it, to make them selectively installable. So the argument in favour is mostly a pragmatic one. If you can have 2-5x faster code essentially for free, why not just go for it?
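The build-time compilation Stefan proposes can be sketched as a conventional setup script. This is only an illustrative build configuration, not actual CPython build machinery; the package name is made up, and Cython is needed at build time only, not at run time:

```python
# setup.py -- illustrative sketch: compile a pure Python module
# with Cython so CPython gets a native .so alongside the .py file.
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="statistics-accelerated",  # hypothetical package name
    # cythonize() accepts plain .py files; no source changes needed.
    ext_modules=cythonize("statistics.py"),
)
```

A distribution could then ship the compiled extension as a separate, optional package, leaving the .py/.pyc files as the fallback for other interpreters.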
Stefan
Stefan Behnel <stefan_ml@behnel.de> wrote:
So the argument in favour is mostly a pragmatic one. If you can have 2-5x faster code essentially for free, why not just go for it?
It would be easier if the GIL, or Cython's use of it, were redesigned. Cython just grabs the GIL and holds on to it until it is manually released. The standard library cannot have packages that hold the GIL forever, as a Cython-compiled module would do. Cython has to start sharing access to the GIL like the interpreter does.
Sturla
Sturla Molden, 03.06.2014 22:51:
Stefan Behnel wrote:
So the argument in favour is mostly a pragmatic one. If you can have 2-5x faster code essentially for free, why not just go for it?
It would be easier if the GIL, or Cython's use of it, were redesigned. Cython just grabs the GIL and holds on to it until it is manually released. The standard library cannot have packages that hold the GIL forever, as a Cython-compiled module would do. Cython has to start sharing access to the GIL like the interpreter does.
Granted. This shouldn't be all that difficult to add as a special case when compiling .py (not .pyx) files. Properly tuning it (i.e. avoiding to inject the GIL release-acquire cycle in the wrong spots) may take a while, but that can be improved over time.
(It's not required in .pyx files because users should rather explicitly write "with nogil: pass" there to manually enable thread switches in safe and desirable places.)
Stefan
On Sun, Jun 01, 2014 at 06:11:39PM +1000, Steven D'Aprano wrote:
I think I know the answer to this, but I'm going to ask it anyway...
I know that there is a general policy of trying to write code in the standard library that does not disadvantage other implementations. How far does that go the other way? Should the standard library accept slower code because it will be much faster in other implementations?
[...]
Thanks to everyone who replied! I just wanted to make a brief note to say that although I haven't been very chatty in this thread, I have been reading it, so thanks for the advice, it is appreciated.