[Python-Dev] Should standard library modules optimize for CPython?
Steven D'Aprano
steve at pearwood.info
Sun Jun 1 10:11:39 CEST 2014
I think I know the answer to this, but I'm going to ask it anyway...
I know that there is a general policy of trying to write code in the
standard library that does not disadvantage other implementations. How
far does that go the other way? Should the standard library accept
slower code because it will be much faster in other implementations?
Briefly, I have a choice of algorithm for the median function in the
statistics module. If I target CPython, I will use a naive but simple
O(N log N) implementation based on sorting the list and returning the
middle item. (That's what the module currently does.) But if I target
PyPy, I will use an O(N) algorithm which knocks the socks off the naive
version even for small lists. In CPython that's typically 2-5 times
slower; in PyPy it's typically 3-8 times faster, and the bigger the data
set, the greater the advantage.
For the specific details, see http://bugs.python.org/issue21592
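To make the trade-off concrete, here is a rough sketch of the two approaches: the current sort-based median, and an average-case O(N) median built on quickselect. This is only an illustration of the general technique, not the actual patch from the tracker issue; the function names and details here are my own.

```python
import random

def median_sorted(data):
    """Naive O(N log N) median: sort and take the middle item(s)."""
    data = sorted(data)
    n = len(data)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

def _select(data, k):
    """Return the k-th smallest element (0-based), average O(N),
    by partitioning around a random pivot and recursing one side."""
    while True:
        pivot = random.choice(data)
        lows = [x for x in data if x < pivot]
        pivots = [x for x in data if x == pivot]
        if k < len(lows):
            data = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            k -= len(lows) + len(pivots)
            data = [x for x in data if x > pivot]

def median_select(data):
    """Average-case O(N) median using selection instead of a full sort."""
    data = list(data)
    n = len(data)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        return _select(data, mid)
    return (_select(data, mid - 1) + _select(data, mid)) / 2
```

The selection version avoids sorting the whole list, but the Python-level partitioning loop is exactly the kind of code that CPython interprets slowly and PyPy's JIT compiles well, whereas `sorted()` runs in C under CPython - which is the crux of the question above.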
My feeling is that the CPython standard library should be written for
CPython, that is, it should stick to the current naive implementation of
median, and if PyPy wants to speed the function up, they can provide
their own version of the module. I should *not* complicate the
implementation by trying to detect which Python the code is running
under and changing algorithms accordingly. However, I should put a
comment in the module pointing at the tracker issue. Does this sound
right to others?
Thanks,
--
Steve