web crawler in python or C?

Alex Martelli aleaxit at yahoo.com
Fri Feb 17 10:11:55 EST 2006


Ravi Teja <webraviteja at gmail.com> wrote:
   ...
> The rule of thumb for all your Python Vs C questions is ...
> 1.) Choose Python by default.

+1 QOTW!-)


> 2.) If your program is slow, it's your algorithm that you need to check

Seriously: yes, and (often even more importantly) data structure.

However, often the most important tip, particularly for large-scale
systems, is to consider your program's _architecture_ (algorithms are
about details of computation; architecture is about partitioning systems
into components, locating their deployment, and so forth). At a generic and
lowish level: are you for example creating a lot of threads each for a
small amount of work? Then consider reusing threads from a "worker
threads" pool. Or maybe you could avoid threads and use event-driven
programming; or, at the other extreme, have multiple processes
communicating by TCP/IP so you can scale up your system to tens or
hundreds of processors -- in the latter case, inter-process
communication may become the bottleneck, so partitioning your system
appropriately to minimize it is crucial. Consider UDP when you can
afford missing a packet once in a
while -- sometimes it may let you reduce overheads compared to TCP
connections.
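
To illustrate the "worker threads" pool idea, here is a minimal sketch
using concurrent.futures from the standard library of current Python
versions (it did not exist when this message was written). The function
fetch_length and the URL list are hypothetical stand-ins for real work;
nothing is actually downloaded.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def fetch_length(url):
    # Hypothetical stand-in for real work (e.g. downloading a page):
    # it only returns the string length and the name of the worker thread.
    return len(url), threading.current_thread().name


# Made-up inputs, purely for illustration.
urls = ["http://example.com/page%d" % i for i in range(20)]

# Four worker threads are created once and reused for all 20 tasks,
# instead of starting a fresh thread for each small piece of work.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch_length, urls))

lengths = [n for n, _ in results]
workers_used = {name for _, name in results}
print("%d tasks handled by %d threads" % (len(lengths), len(workers_used)))
```

The pool size of 4 is an arbitrary choice for the sketch; a sensible
value depends on the workload and is best settled by measurement.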

Database connections, and less importantly database cursors, are well
worth reusing. What are you "caching", and what instead is getting
recomputed over and over?  It's possible to undercache (needless
repeated computation) but also to overcache (tying up memory and causing
paging). Are you making lots of system calls that you might be able to
avoid? Each system call costs a switch into kernel mode and back, after
all...
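
One way to steer between undercaching and overcaching is a cache with a
size bound. The sketch below uses functools.lru_cache from the standard
library of current Python versions; the function normalize and the
maxsize of 1024 are assumptions made for the example, not anything from
the original discussion.

```python
from functools import lru_cache

calls = 0  # counts how often the "expensive" function body really runs


@lru_cache(maxsize=1024)  # bounded: least recently used entries get evicted
def normalize(url):
    # Hypothetical stand-in for an expensive computation.
    global calls
    calls += 1
    return url.strip().lower().rstrip("/")


for _ in range(3):
    normalize("HTTP://Example.com/")  # computed once, then served from cache

info = normalize.cache_info()
print(calls, info.hits, info.misses)
```

The hit and miss counters from cache_info() give a quick empirical check
on whether the cache is earning the memory it ties up.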

Any or all of these hints may be irrelevant to a specific category of
applications, but then, so may the hint about algorithms. One cool
thing about Python is that it makes it easy and fast for you to try out
different approaches (particularly to architecture, but to algorithms as
well), even drastically different ones, when simple reasoning about the
issues leaves you undecided and you need to settle them empirically.
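
As a small sketch of settling such a question empirically, the standard
library's timeit module can compare two candidate data structures for
the same operation. The sizes, the names, and the choice of membership
testing are assumptions picked for the example; actual timings will vary
by machine.

```python
import timeit

items = list(range(10000))
as_list = items
as_set = set(items)
needle = 9999  # worst case for the list: the last element

# Time 1000 membership tests against each data structure.
t_list = timeit.timeit(lambda: needle in as_list, number=1000)
t_set = timeit.timeit(lambda: needle in as_set, number=1000)

print("list: %.6fs  set: %.6fs" % (t_list, t_set))
```

The list test scans elements one by one while the set test is a hash
lookup, so the set should come out far ahead here; the point is that a
few lines are enough to check rather than guess.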

 
> Remember Donald Knuth's quote.
> "Premature optimization is the root of all evil in programming".

I believe Knuth himself said he was quoting Tony Hoare, and indeed
referred to this as "Hoare's dictum".


Alex


