[BangPypers] Harvestman error
Anand Balachandran Pillai
abpillai at gmail.com
Mon May 31 11:46:35 CEST 2010
On Sun, May 30, 2010 at 9:56 PM, JAGANADH G <jaganadhg at gmail.com> wrote:
> Dear All, I was trying to run Harvestman (a Python tool for web harvesting).
> I got the following error:
> http://pastebin.com/uPzUs0Xw
>
> My configuration file is http://pastebin.com/dfhiy2Q6
>
> Can anybody help me with this?
>
> I was trying to harvest my blog with a word filter 'Python'
>
There is no word filter anymore. You have hit a bug: the old
word-filter code still seems to be applied :)
To filter on words or regular expressions in the page content,
you can implement a custom crawler. It is pretty easy, and a sample
already exists: look for "searchingcrawler.py" inside the apps/samples
folder and modify the code to suit the keyword(s) you want to filter.
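
As a rough illustration of the matching step only, here is a minimal
sketch of a keyword/regex check on page content. It is plain Python and
does not use the HarvestMan API; the names compile_filters and
page_matches are hypothetical, and how the check is hooked into a custom
crawler should be taken from the sample itself.

```python
import re


def compile_filters(keywords):
    """Compile each keyword or regular expression, ignoring case."""
    return [re.compile(k, re.IGNORECASE) for k in keywords]


def page_matches(content, patterns):
    """Return True if any of the compiled patterns occurs in the content."""
    return any(p.search(content) for p in patterns)


if __name__ == "__main__":
    # \b restricts the match to the whole word "python"
    filters = compile_filters([r"\bpython\b"])
    print(page_matches("<p>Notes on Python web harvesting</p>", filters))
    print(page_matches("<p>Notes on gardening</p>", filters))
```

A custom crawler would presumably call something like page_matches on
each fetched page and keep or skip the page depending on the result.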
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog
--
--Anand