ANN: HarvestMan 1.3.9

Anand Pillai
15 Jun 2004 01:55:48 -0700

HarvestMan is a multithreaded, highly customizable web crawler (offline
browser) written in Python. It features thread control, download
control using multiple rules, support for the robot exclusion protocol,
multiple 'fetch levels', URL filters, etc. HarvestMan has a modular,
object-oriented architecture.
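This announcement doesn't show how HarvestMan itself handles robot exclusion. As a rough illustration of what honouring the protocol involves, here is a minimal sketch using Python's standard `urllib.robotparser` module. It is not HarvestMan code. The robots.txt content, the user-agent string, and the helper names `make_checker` and `allowed` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an example site (not fetched over the network).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

def make_checker(robots_txt):
    """Return a RobotFileParser loaded from robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def allowed(parser, url, agent="HarvestMan"):
    """True if `agent` may fetch `url` under the parsed rules (agent name is illustrative)."""
    return parser.can_fetch(agent, url)

checker = make_checker(ROBOTS_TXT)
for url in ("http://www.example.com/index.html",
            "http://www.example.com/private/data.html"):
    print(url, "->", "fetch" if allowed(checker, url) else "skip")
```

A crawler would run a check like this before queuing each URL, skipping anything the site's robots.txt disallows.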

HarvestMan is hosted at, an interactive
Zope-based web site that also provides a bug tracker.

HarvestMan 1.3.9 is the latest release of HarvestMan. It adds the
following features:

1. URL and web site priorities, customizable by the user.
2. Support for HTML Tidy to clean up web pages and prevent
   parser errors, so that web sites whose HTML pages contain
   errors can still be downloaded.
3. Reusable download thread groups.
4. Mixed intranet/Internet downloads in the same project.
5. A modified URL caching algorithm based on the last-modification
   time of the URL's file.
6. URL generations, and priorities based on them.
7. Many bug fixes.
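The release notes don't describe the caching algorithm in feature 5 in detail. As a rough illustration of the general idea only (not HarvestMan's actual implementation), the sketch below remembers the last-modification time recorded for each URL and re-downloads a file only when the server reports a newer time. The `UrlCache` class and the assumption that the time arrives as an HTTP `Last-Modified` date string are hypothetical.

```python
from email.utils import parsedate_to_datetime

class UrlCache:
    """Hypothetical cache mapping each URL to the last-modification time of its file."""

    def __init__(self):
        self._times = {}

    def needs_download(self, url, last_modified_header):
        """Decide whether `url` must be fetched again.

        `last_modified_header` is assumed to be an HTTP-date string such as a
        server's Last-Modified value, or None if the server sent none.
        """
        if last_modified_header is None:
            # No modification time available: always download.
            return True
        remote = parsedate_to_datetime(last_modified_header)
        cached = self._times.get(url)
        # Download if never seen before, or if the remote copy is newer.
        return cached is None or remote > cached

    def record(self, url, last_modified_header):
        """Remember the modification time after a successful download."""
        if last_modified_header is not None:
            self._times[url] = parsedate_to_datetime(last_modified_header)

cache = UrlCache()
url = "http://www.example.com/index.html"
print(cache.needs_download(url, "Tue, 15 Jun 2004 08:55:48 GMT"))  # True: never fetched
cache.record(url, "Tue, 15 Jun 2004 08:55:48 GMT")
print(cache.needs_download(url, "Tue, 15 Jun 2004 08:55:48 GMT"))  # False: unchanged
print(cache.needs_download(url, "Wed, 16 Jun 2004 10:00:00 GMT"))  # True: newer copy
```

Comparing timestamps rather than file contents lets a crawler skip unchanged pages without downloading them again, as long as the server reports reliable modification times.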

HarvestMan is free to use and is released under the Open Software

The latest source code can be obtained from
http://harvestman/ .

A comprehensive list of changes is at .

FAQ: .

Direct Link (for the impatient):

Thank You!

-Anand B Pillai