python simply not scaleable enough for google?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Thu Nov 12 21:50:33 EST 2009


On Thu, 12 Nov 2009 21:02:11 +0100, Alf P. Steinbach wrote:

> Specifically, I reacted to the statement that <<it is sheer nonsense to
> talk about "the" speed of an implementation>>, made in response to
> someone upthread, in the context of Google finding CPython overall too
> slow.
> 
> It is quite slow. ;-)

Quite slow to do what? Quite slow compared to what?

I think you'll find using CPython to sort a list of ten million integers 
will be quite a bit faster than using bubblesort written in C, no matter 
how efficient the C compiler.

And why are we limiting ourselves to integers representable by the native 
C int? What if the items in the list were of the order of 2**100000? Or 
if they were mixed integers, fractions, fixed-point decimals, and 
floating-point binaries? How fast is your C code going to be now? That's 
going to depend on the C library you use, isn't it? In other words, it is 
an *implementation* issue, not a *language* issue.
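
To make that concrete, here is a minimal sketch of CPython sorting such a 
mixed list. The sample values are my own, chosen only for illustration, 
and it assumes Python 3.2 or later, where Decimal compares directly 
against floats and Fractions:

```python
from decimal import Decimal
from fractions import Fraction

# Illustrative values only; none of these come from a real workload.
items = [
    2**100000,          # arbitrary-precision integer, far beyond a C int
    Fraction(1, 3),     # exact rational
    Decimal("0.25"),    # fixed-point decimal
    0.5,                # binary floating point
    7,                  # ordinary small integer
]

# The same built-in sort handles every numeric type in the list.
ordered = sorted(items)

print([type(x).__name__ for x in ordered])
# → ['Decimal', 'Fraction', 'float', 'int', 'int']
```

A C version would need a bignum library, a rational type and a decimal 
type before it could even attempt the same job, and its speed would 
depend on which ones were picked.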

Okay, let's keep it simple. Stick to numbers representable by native C 
ints. Around this point, people start complaining that it's not fair, I'm 
not comparing apples with apples. Why am I comparing a highly-optimized, 
incredibly fast sort method in CPython with a lousy O(N**2) algorithm in 
C? To make meaningful comparisons, you have to make sure the algorithms 
are the same, so the two language implementations do the same amount of 
work. (Funnily enough, it's "unfair" to play to Python's strengths, and 
"fair" to play to C's strengths.)

Then people invariably try to compare (say) something in C involving low-
level bit-twiddling or pointer arithmetic with something in CPython 
involving high-level object-oriented programming. Of course CPython is 
"slow" if you use it to do hundreds of times more work in every operation 
-- that's comparing apples with oranges again, but somehow people think 
that's okay when your intention is to prove "Python is slow".

An apples-to-apples comparison would be to use a framework in C which 
offered the same features as Python: readable syntax ("executable 
pseudo-code"), memory management, garbage collection, high-level objects, 
message passing, exception handling, dynamic strong typing, and no core 
dumps ever.

If you did that, you'd get something that runs much closer to the speed 
of CPython, because that's exactly what CPython is: a framework written 
in C that provides all those extra features.

(That's not to say that Python-like high-level languages can't, in 
theory, be significantly faster than CPython, or that they can't have JIT 
compilers that emit highly efficient -- in space or time -- machine code. 
That's what Psyco does, now, and that's the aim of PyPy.)

However, there is one sense in which Python *the language* is slower than 
(say) C the language. Python requires that an implementation treat the 
built-in function (say) int as an object subject to modification by the 
caller, while C requires that it is a reserved word. So when a C compiler 
sees "int", it can optimize the call to a known low-level routine, while 
a Python compiler can't make this optimization. It *must* search the 
entire scope looking for the first object called 'int' it finds, then 
search the object's scope for a method called '__call__', then execute 
that. Those are the rules for Python, and an implementation that does 
something else isn't Python. Even though the searching is highly 
optimized, if you call int() one million times, any Python implementation 
*must* perform that search one million times, which adds up. For N calls, 
merely identifying what function to call costs O(N) lookups at runtime 
for Python and O(1) at compile time for C.
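
A small sketch of why the lookup cannot be skipped. The names parse and 
fake_int are hypothetical, invented for this example. Rebinding the name 
'int' in the function's global namespace changes what an already-defined 
function does the next time it is called:

```python
def parse(text):
    # The name 'int' is resolved every time this line runs:
    # first in the function's globals, then in the builtins.
    return int(text)

before = parse("42")

def fake_int(text):
    # Hypothetical replacement, only here to make the rebinding visible.
    return "not a number: " + text

parse.__globals__["int"] = fake_int   # shadow the builtin
during = parse("42")

del parse.__globals__["int"]          # remove the shadow again
after = parse("42")

print(before, during, after)
# → 42 not a number: 42 42
```

Since parse itself was never touched, the only way it can pick up the 
new object is by looking the name up again on each call.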

Note though that JIT compilers like Psyco can often take shortcuts and 
speed up code by a factor of 2, or up to 100 in the best cases, which 
brings the combination of CPython + Psyco within shouting distance of the 
speed of the machine code generated by good optimizing C compilers. Or 
you can pass the work on to an optimized library or function call that 
avoids the extra work. Like I said, there is no reason for Python 
*applications* to be slow.
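
As a rough sketch of passing the work on to a built-in (the function 
names and the input list are hypothetical, and I have not timed them 
here), compare a hand-written loop with a single call to map():

```python
# Hypothetical input: one thousand numeric strings.
strings = [str(n) for n in range(1000)]

def convert_loop(values):
    result = []
    for text in values:
        # The name 'int' is resolved again on every iteration.
        result.append(int(text))
    return result

def convert_map(values):
    # The name 'int' is resolved once, when map() is called;
    # map() then drives the loop inside the interpreter's C code.
    return list(map(int, values))

assert convert_loop(strings) == convert_map(strings)
```

Both give the same answer; the second simply does less name searching per 
item. How much that matters depends on the implementation and the data.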


-- 
Steven


