Web Crawler - Python or Perl?
nick at craig-wood.com
Mon Jun 9 22:30:49 CEST 2008
disappearedng at gmail.com <disappearedng at gmail.com> wrote:
> I am currently planning to write my own web crawler. I know Python but
> not Perl, and I am interested in knowing which of these two are a
> better choice given the following scenario:
> 1) I/O issues: my biggest constraint in terms of resources will be
> the bandwidth bottleneck.
> 2) Efficiency issues: The crawlers have to be fast, robust and as
> "memory efficient" as possible. I am running all of my crawlers on
> cheap PCs with about 500 MB RAM and P3 to P4 processors.
> 3) Compatibility issues: Most of these crawlers will run on Unix
> (FreeBSD), so there should exist a pretty good compiler that can
> optimize my code under these environments.
> What are your opinions?
Use python with twisted.
With a friend I wrote a crawler. Our first attempt was standard
python. Our second attempt was with twisted. Twisted absolutely blew
the socks off our first attempt - mainly because you can fetch 100s or
1000s of pages simultaneously, without threads.
Python with twisted will satisfy 1-3. You'll have to get your head
around its asynchronous nature, but once you do you'll be writing a
killer crawler ;-)
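To illustrate the idea (Twisted's deferred-based API looks different in
detail), here's a minimal sketch of single-threaded concurrent fetching
using the stdlib asyncio module instead of Twisted; the URLs are made up
and asyncio.sleep stands in for the real network I/O:

```python
import asyncio
import time

async def fetch(url):
    # Simulated fetch: a real crawler would issue an HTTP request here
    # (Twisted's getPage, or an async HTTP client); asyncio.sleep
    # stands in for waiting on the network.
    await asyncio.sleep(0.1)
    return url, 200

async def crawl(urls):
    # All fetches run concurrently on one thread, no threads needed;
    # the event loop switches between them while each waits on I/O.
    return await asyncio.gather(*(fetch(u) for u in urls))

# Hypothetical URL list for demonstration.
urls = ["http://example.com/page/%d" % i for i in range(100)]
start = time.monotonic()
results = asyncio.run(crawl(urls))
elapsed = time.monotonic() - start
print(len(results), elapsed)
```

Because the fetches overlap, 100 of them finish in roughly the time of
one, which is the same property that made the Twisted version blow the
socks off the sequential one.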
As for Perl - once upon a time I would have done this with perl, but I
wouldn't go back now!
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick