Google and Python

Alex Martelli aleax at mac.com
Mon Sep 24 17:40:04 CEST 2007


Bryan Olson <fakeaddress at nowhere.org> wrote:
   ...
> > YouTube (one of Google's most valuable properties) is essentially
> > all-Python (except for open-source infrastructure components such as
> > lighttpd).  Also, at Google I'm specifically "Uber Tech Lead, Production
> > Systems": while I can't discuss details, my main responsibilities relate
> > to various software projects that are part of our "deep infrastructure",
> > and our general philosophy there is "Python where we can, C++ where we
> > must". 
> 
> Good motto. So is most of Google's code base now in
> Python? About what is the ratio of Python code to C++
> code? Of course lines of code is kind of a bogus measure.
> Of all those cycles Google executes, about what portion
> are executed by a Python interpreter?

I don't have those numbers at hand, and if I did they would be
confidential: you know that Google doesn't release many numbers at all
about its operations, most particularly not about our production
infrastructure (not even, say, how many servers we have, in how many data
centers, with what bandwidth, and so on).

Still, I wouldn't say that "most" of our codebase is in Python: there's
a lot of Java, a lot of C++, a lot of Python, a lot of Javascript (which
may not correspond to all that many "cycles Google executes" since the
main point of coding in Javascript is having it execute in the user's
browser, of course, but it's still code that gets developed, debugged,
deployed, maintained), and a lot of other languages including ones that
Google developed in-house such as
<http://labs.google.com/papers/sawzall.html> .


> > Python is definitely not "just a tiny little piece" nor (by a
> > long shot) used only for "scripting" tasks; 
> 
> Ah, sorry. I meant the choice of scripting language was
> a tiny little piece of Google's method of operation.

In the same sense in which other such technology choices (C++, Java,
what operating systems, what relational databases, what http servers,
and so on) are similarly "tiny pieces", maybe.  Considering the number
of technology choices that must be made, plus the number of other
choices that aren't directly about technology but, say, about
methodology (style guides for each language in use, mandatory code
reviews before committing to the shared codebase, release-engineering
practices, standards for unit-tests and other kinds of tests, and so on,
and so forth), one could defensibly make a case that each and every such
choice must of necessity be "but a tiny little piece" of the whole.

> "Scripting language" means languages such as Python,
> Perl, and Ruby.

A widespread terminology, but nevertheless a fundamentally bankrupt one:
when a language is used to develop an application, it's very misleading
to call it a "scripting language", as it implies that it's instead used
only to "script" something else.  When it comes time to decide which mix
of languages to use to develop a new application, it's important to
avoid being biased by having tagged some languages as "scripting" ones,
some (say Java) as "application" ones, others yet (say C++) as "system"
ones -- the natural subconscious process would be to say "well I'm
developing an X, I should use an X language, not a Y language or a Z
language", which is most likely to lead to wrong choices.


> > if the mutant space-eating
> > nanovirus should instantly stop the execution of all Python code, the
> > powerful infrastructure that has been often described as "Google's
> > secret weapon" would seize up.
> 
> And the essence of the Google way is to employ a lot of
> smart programmers to build their own software to run on
> Google's infrastructure. Choice of language is trivia.

No, it's far from trivial, just like the choice of operating system, and
so on.  Google is a technology company: exactly which technologies to
use and/or develop for the various necessary tasks, far from being
trivial, is the very HEART of its operation.

Your ludicrous claim is similar to saying that the essence of a certain
hedge fund is to employ smart traders to make a lot of money by
sophisticated trades (so far so reasonable) and (here comes the idiocy)
"choice of currencies and financial instruments is trivia" (?!?!?!) --
it's the HEART of such a fund, to pick and choose which positions to
build, unwind, or sell-on, and which (e.g.) currencies should be
involved in such positions is obviously *crucial*, one of the many
important decisions those "smart traders" make every day, and far from
the least important of the many.  And similarly, OF COURSE, for choices
of technologies (programming languages very important among those) for a
technology company, just like, say, what horticultural techniques and
chemicals to employ would be for a company whose "essence" was
cultivating artichokes for sale on the market, and so on.


> I think both Python and Google are great. What I find
> ludicrous is the idea that the bits one hears about how
> Google builds its software make a case for how others
> should build theirs.

To each his own, I guess: what I find ludicrous is your claim about
"trivia", as I explained above.  To me, on the contrary, it seems
self-evident that if a company X enjoys great success employing
technique Y, this *DOES* make something of a case for another company Z
to seriously consider and probably try out Y, when attempting tasks
analogous to those X has had success with, to see if some of the success
could not be replicable in Z's own similar tasks.  This is the heart of
"benchmarking" and "industry best practices" -- and why many companies
in the role of X aren't all that forthcoming about publicizing all the
details of their Y's, just in case Z's endeavours should put Z in
competition with X (this always needs to be balanced with the many
_advantages_ connected to publicizing some of those Y's, of course).

Such empirical support, while of course far from infallible (one will
always have to take into consideration many details, and the devil is in
the details), tends to perform vastly better in supporting decision
making than purely abstract considerations bereft of any such empirical
underpinnings.

> Google is kind of secretive, and
> their ways are very much their own. Google's software
> is much more Googley than Pythonic.

Nevertheless, if "Python has been an important part of Google since the
beginning" (as my colleague Peter Norvig said well before I joined
Google, then Guido did, etc etc), then clearly being Pythonic can be *an
important part* (NOT "trivia"!!!) of being Googley, and it would be
seriously stupid to choose to ignore this crucial data point.  YouTube's
choice of Python, done well before anybody had even conceived of their
becoming part of Google one day, does seem to have served them
particularly well too (they gave lots of details in their talk on
the subject at OSCON; some materials are at
<http://www.scribd.com/doc/244443/Supersising-YouTube-with-Python> and
you can search the web and blogs for more), etc, etc.

One delightful part of working at Google is that top management is *NOT*
made up of pointy-haired beancounters who consider such issues as choice
of technologies "trivia" -- Eric Schmidt (the CEO) started his career by
coding "lex" (the lexical analyzer part of the yacc/lex combination),
Stu Feldman started his by writing "make" (still the best-known
semi-automated software-build approach), Urs Hölzle pioneered
just-in-time compilers, Udi Manber wrote a great book on algorithms
(using a somewhat Pascal-like pseudocode which however used indentation
to denote blocks;-), etc, etc.  They KNOW how important ("trivia"
indeed...!-) such choices are: that's part of what makes them great
leaders of passionate engineers -- they haven't and never will forget
their own engineering roots.


Alex


