searching string url
mwm at mired.org
Thu Jul 28 05:38:54 CEST 2005
googlinggoogler at hotmail.com writes:
> Anyway, to the original replier - I wish it was homework ;-), that
> would mean I wouldn't be trying to find myself a job as a recent
> graduate... I decided to crawl something similar to the yellow pages
> (do you have them in the US?) for my select area and then find all
> pages corresponding to my ideal field of work, and grab their details
> into a txt file.
I'm actually working on a general framework for doing this kind of
thing. It's designed specifically for walking through a collection of
pages from a web-based search engine, applying extra criteria to the
results, and then running a bit of code on any that pass that check.
It works for one site, but trying it on a second site turned up a
fundamental flaw. My first site used full URLs for everything, so I
happily passed soup between various methods. The second site used
relative URLs for everything, and it all broke.
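
For what it's worth, here is a rough sketch of one way around the
relative-URL problem. It is not the framework's actual code. The idea is
to keep the URL of the page alongside its HTML and resolve every link
against it before handing it on. It uses only the standard library
(urljoin lives in urllib.parse in Python 3; in Python 2 it was in the
urlparse module). LinkCollector, absolute_links and the example.com
addresses are made-up names for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect every <a href> on a page as an absolute URL (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL the HTML was fetched from
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin leaves absolute URLs untouched and resolves
                # relative ones against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def absolute_links(page_url, html):
    """Return all links in html, resolved against page_url."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    sample = (
        '<a href="/listing/42">rooted</a> '
        '<a href="detail.html">relative</a> '
        '<a href="http://other.example/x">absolute</a>'
    )
    for link in absolute_links("http://www.example.com/search/results.html", sample):
        print(link)
```

Because the links are made absolute at the point where the page is
parsed, the methods further down the line never need to know which page
a link came from.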
> Trouble is I keep thinking of cool new bits to add, Python truly is a
> beautiful language. Ideally would like to somehow write all the
> information into a word mail merge - but I think that requires more
Given a working scrape, the only extra work is getting the results into
a mail merge. How you do that depends on your platform and the software
you're using to send the mail. It shouldn't be all that hard.
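
As one possible route, and only as a sketch: most mail-merge tools,
Word included as far as I know, can take a delimited text file with a
header row as the data source, so writing the scraped records out as
CSV may be all that's needed. The function name, the field names and
the sample record below are invented for illustration.

```python
import csv


def write_merge_source(records, path, fieldnames):
    """Write scraped records (dicts) to a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=fieldnames,
            extrasaction="ignore",  # drop keys that are not merge fields
        )
        writer.writeheader()
        # Keys missing from a record are written as empty cells.
        writer.writerows(records)


if __name__ == "__main__":
    # Hypothetical scraped data; a real scrape would supply these dicts.
    companies = [
        {"name": "Example Ltd", "address": "1 High Street", "phone": "01234 567890"},
    ]
    write_merge_source(companies, "companies.csv", ["name", "address", "phone"])
```

The header row gives the merge software its field names, so the column
names chosen here are the ones that would show up as merge fields.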
Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.