python simply not scaleable enough for google?

David Cournapeau cournape at gmail.com
Tue Nov 17 19:45:28 EST 2009


On Wed, Nov 18, 2009 at 8:31 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> David Cournapeau wrote:
>>
>> On Tue, Nov 17, 2009 at 10:48 PM, Aaron Watters <aaron.watters at gmail.com>
>> wrote:
>>>>
>>>> I don't think Python and Go address the same set of programmer
>>>> desires.  For example, Go has a static type system.  Some programmers
>>>> find static type systems to be useless or undesirable.  Others find
>>>> them extremely helpful and want to use them.  If you're a
>>>> programmer who wants a static type system, you'll probably prefer Go
>>>> to Python, and vice versa.  That has nothing to do with implementation
>>>> speed or development expenditures.  If Google spent a million dollars
>>>> adding static types to Python, it wouldn't be Python any more.
>>>
>>> ... and I still have an issue with the whole "Python is slow"
>>> meme.  The reason NASA doesn't build a faster Python is because
>>> Python *when augmented with FORTRAN libraries that have been
>>> tested and optimized for decades and are worth billions of dollars
>>> and don't need to be rewritten* is very fast.
>>
>> It is a bit odd to dismiss "python is slow" by saying that you can
>> extend it with fortran.
>
> I find it a bit odd that people are so resistant to evaluating Python as it
> was designed to be. As Guido designed the language, he designed the
> implementation to be open and easily extended by assembler, Fortran, and C.

I am well aware of that fact - that's one of the major reasons why I
decided to go the python route a few years ago instead of matlab,
because matlab's C API is so limited.

> No one carps about the fact that dictionary key lookup, say, is written in
> (optimized) C rather than pretty Python. Why should Basic Linear Algebra
> Subroutines (BLAS) be any different?

BLAS/LAPACK explicitly contain stuff that can easily be factored out
into a library. Linear algebra in general works well because the basic
data structures are well understood. You can deal with those as black
boxes most of the time (I, for example, have no idea how most LAPACK
algorithms work, except for the simple ones). But that's not always
the case for numerical computations. Sometimes, you need to be able to
go inside the black box, and that's where python is sometimes limited
for me because of its speed cost.

To be more concrete, one of my areas is speech processing/speech
recognition. Most current engines are based on Hidden Markov
Models, and there are a few well known libraries to deal with those,
most of the time written in C/C++. You can wrap those in python (and
people do), but you cannot really use those unless you deal with them
at a high level. If you want to change some core algorithms (to deal
with a new topology, etc.), you cannot do it without going into C. It
would be great to write my own HMM library in python, but I cannot do
it because it would be way too slow. There is no easy black box which
I could wrap so that I keep enough flexibility without sacrificing too
much speed.
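
As a rough illustration of the kind of core routine meant here, below
is a minimal sketch of the HMM forward algorithm in pure python. It is
not taken from any existing library: the function name, argument names
and the discrete-emission model are all assumptions made for the
example, and the observation sequence is assumed to be non-empty. The
point is only that the work sits in nested interpreted loops (time
steps x states x states), which is where the cost comes from.

```python
# Hypothetical sketch: forward algorithm for a discrete HMM, pure Python.
# All names (forward, init, trans, emit) are illustrative assumptions.

def forward(init, trans, emit, obs):
    """Return the probability of the observation sequence `obs`.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of emitting symbol k while in state i
    obs         : non-empty sequence of observed symbol indices
    """
    n = len(init)
    # alpha[i] = P(observations so far, current state == i)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:            # loop over time steps
        new_alpha = []
        for j in range(n):            # loop over destination states
            total = 0.0
            for i in range(n):        # loop over source states
                total += alpha[i] * trans[i][j]
            new_alpha.append(total * emit[j][symbol])
        alpha = new_alpha
    return sum(alpha)
```

Changing the topology or the recursion itself means editing exactly
these inner loops, which is easy in python and painful in a wrapped
C/C++ library.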

Put differently, I would be willing to be one order of magnitude
slower than, say, C, but not the 2 to 3 orders of magnitude that
python currently costs when you cannot leverage existing libraries.
When the code can be vectorized, numpy and scipy give me this.
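
To make "vectorized" concrete, here is a small sketch using numpy on
the forward recursion of a discrete HMM. The function and argument
names are hypothetical, chosen for the example, and the observation
sequence is assumed to be non-empty. The sum over source states
becomes one vector-matrix product per time step, so that part runs in
compiled code; the loop over time steps stays in python because each
step depends on the previous one.

```python
import numpy as np

# Hypothetical sketch: forward recursion for a discrete HMM with numpy.
# Names (forward_vectorized, init, trans, emit) are illustrative assumptions.

def forward_vectorized(init, trans, emit, obs):
    """Return the probability of the non-empty observation sequence `obs`.

    init  : shape (n,)   initial state probabilities
    trans : shape (n, n) trans[i, j] = P(next state j | current state i)
    emit  : shape (n, m) emit[i, k]  = P(symbol k | state i)
    obs   : sequence of observed symbol indices
    """
    init = np.asarray(init, dtype=float)
    trans = np.asarray(trans, dtype=float)
    emit = np.asarray(emit, dtype=float)

    # alpha[i] = P(observations so far, current state == i)
    alpha = init * emit[:, obs[0]]
    for symbol in obs[1:]:
        # np.dot(alpha, trans)[j] == sum over i of alpha[i] * trans[i, j]
        alpha = np.dot(alpha, trans) * emit[:, symbol]
    return float(alpha.sum())
```

This only helps as long as the per-step update can be written as whole
array operations; once the recursion needs irregular, per-state logic,
the work falls back into interpreted loops.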

>> Relying on a lot of compiled libraries goes against it.
>
> On the contrary, Python could be optimized for human readability because it
> was expected that heavy computation would be delegated to other code. There
> is no need for scientists to read the optimized code in BLAS, LINPACK, and
> FFTPACK, in assembler, Fortran, and/or C, which are incorporated in Numpy.

I know all that (I am one of the main numpy developers nowadays), and
indeed, writing blas/lapack in python does not make much sense. I am
talking about libraries *I* would write. Scipy, for example, contains
more fortran and C code than python, without counting the libraries we
wrap, and a lot of it is because of speed/memory concerns.

David


