python simply not scaleable enough for google?

Vincent Manis vmanis at telus.net
Fri Nov 13 22:15:09 EST 2009


On 2009-11-13, at 18:02, Robert Brown wrote:

> Common Lisp and Scheme were designed by people who wanted to write complicated
> systems on machines with a tiny fraction of the horsepower of current
> workstations.  They were carefully designed to be compiled efficiently, which
> is not the case with Python.  There really is a difference here.  Python the
> language has features that make fast implementations extremely difficult.

Not true. Common Lisp was designed primarily by throwing together all of the 
features in every Lisp implementation the design committee was interested in. 
Although the committee members were familiar with high-performance compilation, 
the primary impetus was to achieve a standardized language that would be acceptable
to the Lisp community. At the time that Common Lisp was started, there was still
some sentiment that Lisp machines were the way to go for performance.  

As for Scheme, it was designed primarily to satisfy an aesthetic of minimalism. Guy
Steele's thesis project, Rabbit, was a Scheme compiler, but its point was that
relatively simple compilation techniques could produce quite reasonable object
programs. Chez Scheme was indeed first run on machines that we would nowadays
consider tiny, but so too was C++. Oh, wait, so was Python!

I would agree that features such as exec and eval hurt the speed of Python programs,
but the same features exact the same cost in CL and in Scheme. There is a mystique
around method dispatch, but again, the Smalltalk literature dealt with that issue
long ago.
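
To make that concrete: here is a toy sketch (entirely my own, purely illustrative;
a real version would live inside the VM, not in user code) of a monomorphic
call-site cache, the basic Smalltalk trick for cheap dispatch:

  # Illustrative only: a per-call-site method cache, written in Python itself.
  class CallSiteCache:
      def __init__(self, name):
          self.name = name       # method name this call site invokes
          self.klass = None      # receiver class seen on the previous call
          self.method = None     # method found for that class

      def call(self, receiver, *args):
          klass = type(receiver)
          if klass is not self.klass:           # miss: do the slow lookup once
              self.method = getattr(klass, self.name)
              self.klass = klass
          return self.method(receiver, *args)   # hit: no dictionary search

As long as a call site keeps seeing receivers of one class, the expensive lookup
happens exactly once.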

Using Python 3 annotations, one can imagine a Python compiler that does the appropriate
thing (shown in the comments) with the following code. 

  import my_module                    # static linking

  __private_functions__ = ['my_fn']   # my_fn doesn't appear in the module dictionary.

  def my_fn(x: python.int32):         # Keeps x in a register
    def inner(z):                     # Lambda-lifts the function, no nonlocal vars
      return z // 2                   #   does not construct a closure
    y = x + 17                        # Via flow analysis, concludes that y can be registerized;
    return inner(2 * y)               # Uses inline integer arithmetic instructions. 

  def blarf(a: python.int32):
    return my_fn(a // 2)              # Because my_fn isn't exported, it can be inlined. 
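
The raw material for such a compiler is already sitting there, by the way: Python 3
stores annotations where any tool, an offline compiler included, can inspect them.
(python.int32 above is imaginary; plain int serves for a demonstration.)

  def scaled(x: int) -> int:
      return x * 2

  print(scaled.__annotations__)   # {'x': <class 'int'>, 'return': <class 'int'>}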

A new pragma statement (which I am EXPLICITLY not proposing; I respect and support
the moratorium) might be useful in telling the implementation that you don't mind
integer overflow. 
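
Purely to fix the idea (again, not a proposal, and the syntax here is invented on
the spot), such a pragma might read:

  pragma wrap_int32                     # HYPOTHETICAL syntax: 32-bit wraparound is OK
  def checksum(a: python.int32, b: python.int32):
      return a + b                      # compiler may emit a bare ADD, no overflow check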

Similarly, new library classes might be created to hold arrays of int32s or doubles. 
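
The standard library's array module already points in this direction: it stores its
elements unboxed, rather than as one PyObject apiece, and a clever implementation
could treat such homogeneous containers specially:

  from array import array

  samples = array('d', [0.5, 1.5, 2.5])   # contiguous C doubles
  counts  = array('i', [1, 2, 3])          # contiguous C ints
  samples.append(3.5)                      # stays unboxed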

Obviously, no Python system does any of these things today. But there really is
nothing stopping a Python system from doing them, and the technology is well
understood from implementations of other languages.

I am not claiming that this is _better_ than JIT. I prefer JIT and other runtime
techniques, such as method caches, because you don't have to know very much about
the implementation in order to take advantage of them. But there may be some benefit
in allowing programmers concerned with speed to relax some of Python's dynamism
without ruining it for the people who need a truly dynamic language.
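
Python already offers one small instance of that bargain: __slots__, which trades
away dynamic attribute creation on instances for a fixed layout:

  class Point:
      __slots__ = ('x', 'y')    # fixed layout; instances carry no __dict__

      def __init__(self, x, y):
          self.x = x
          self.y = y

  p = Point(1, 2)
  # p.z = 3 would raise AttributeError: a little dynamism given up,
  # a per-instance dictionary saved.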

If I want to think about scalability seriously, I'm more concerned about problems that
Python shares with almost every modern language: when many processors access a large
shared memory, GC efficiency degrades as the processor count goes up. On the other
hand, if you have a lot of processors, each with some private memory, sharing a
common bus (think the Cell processor), how do we build an efficient implementation
of ANY language for that kind of environment?

Somehow, the particular issues of Python seem largely orthogonal to performance
scalability.

-- v