Web Crawler - Python or Perl?

subeen tamim.shahriar at gmail.com
Mon Jun 9 14:07:38 EDT 2008


On Jun 9, 11:48 pm, disappeare... at gmail.com wrote:
> Hi all,
> I am currently planning to write my own web crawler. I know Python but
> not Perl, and I am interested in knowing which of these two are a
> better choice given the following scenario:
>
> 1) I/O issues: my biggest constraint in terms of resources will be
> the bandwidth bottleneck.
> 2) Efficiency issues: the crawlers have to be fast, robust, and as
> memory-efficient as possible. I am running all of my crawlers on
> cheap PCs with about 500 MB of RAM and P3 to P4 processors.
> 3) Compatibility issues: most of these crawlers will run on Unix
> (FreeBSD), so there should be a pretty good compiler/interpreter that
> can optimize my code under that environment.
>
> What are your opinions?

It really doesn't matter whether you use Perl or Python for writing
web crawlers; I have used both. The scenarios you mention (I/O,
efficiency, compatibility) don't differ too much between the two
languages. Both have fast I/O. In Python you can use the urllib2
module and/or Beautiful Soup to build a crawler; in Perl you can use
the WWW::Mechanize or LWP modules. Both languages have good support
for regular expressions. I have heard that Perl is slightly faster,
though I haven't noticed the difference myself. Both run fine on
*nix. For writing a good crawler, the language is not what matters;
it's the technique.
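
Here is a rough, untested sketch of what a simple crawler can look
like in Python 2 with urllib2 and BeautifulSoup (the seed URL and the
page limit are just placeholders):

import urllib2
import urlparse
from BeautifulSoup import BeautifulSoup

def crawl(seed_url, max_pages=50):
    to_visit = [seed_url]   # FIFO queue of URLs still to fetch
    visited = set()         # URLs we have already fetched

    while to_visit and len(visited) < max_pages:
        url = to_visit.pop(0)
        if url in visited:
            continue
        try:
            html = urllib2.urlopen(url).read()
        except Exception, e:    # network errors, 404s, bad URLs, ...
            print "failed:", url, e
            continue
        visited.add(url)

        # pull out the links and queue absolute http URLs for later
        soup = BeautifulSoup(html)
        for tag in soup.findAll('a', href=True):
            link = urlparse.urljoin(url, tag['href'])
            if link.startswith('http') and link not in visited:
                to_visit.append(link)
    return visited

if __name__ == '__main__':
    pages = crawl('http://example.com/')    # placeholder seed URL
    print "fetched %d pages" % len(pages)

A Perl version built on LWP::UserAgent (or WWW::Mechanize) with
HTML::LinkExtor would follow the same fetch/parse/queue loop.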

regards,
Subeen.
http://love-python.blogspot.com/


