Beginner: Portable Python, BeautifulSoup & ScrapeNFeed

Brian Brian.Mingus at colorado.edu
Sat Apr 18 08:56:04 CEST 2009


Was this code a complete waste of my time?
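
In case it helps anyone trying the filtering step on its own: below is a minimal sketch that uses only the Python 3 standard library (html.parser and re) instead of BeautifulSoup, so it runs without installing anything. It collects the text of every <strong> element and keeps the entries containing 'Research Analyst', ignoring case. The names StrongTextCollector, matching_titles and SAMPLE_HTML are made up for illustration, and the sample markup is an assumption, not the real structure of the exams page.

```python
import re
from html.parser import HTMLParser


class StrongTextCollector(HTMLParser):
    """Collect the text found inside <strong> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # greater than 0 while inside a <strong> element
        self._buffer = []  # text pieces of the <strong> currently being read
        self.texts = []    # finished, stripped <strong> texts

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "strong" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.texts.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def matching_titles(html, phrase="Research Analyst"):
    """Return the <strong> texts containing `phrase`, ignoring case."""
    collector = StrongTextCollector()
    collector.feed(html)
    collector.close()
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return [text for text in collector.texts if pattern.search(text)]


# Invented sample markup; the real page layout may differ.
SAMPLE_HTML = """
<ul>
  <li><strong>RESEARCH ANALYST I (GENERAL)</strong></li>
  <li><strong>Staff Services Manager</strong></li>
  <li><strong>Research Analyst II (Economics)</strong></li>
  <li><strong>RESEARCH PROGRAM SPECIALIST</strong></li>
</ul>
"""

if __name__ == "__main__":
    for title in matching_titles(SAMPLE_HTML):
        print(title)
```

Matching on the whole phrase with re.IGNORECASE keeps 'RESEARCH ANALYST' and 'Research Analyst' alike while skipping other titles that merely contain 'RESEARCH', which the broader '.*RESEARCH.*' pattern would also pick up.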

On Wed, Apr 15, 2009 at 1:09 AM, Brian <Brian.Mingus at colorado.edu> wrote:

> On Ubuntu:
>
> sudo apt-get install python-pyrss2gen python-beautifulsoup
> # then download ScrapeNFeed separately
>
> Python:
> Not sure what's wrong with this but it's most of the code you'll need:
> -----------
> from urllib import urlopen
> from BeautifulSoup import BeautifulSoup
> from PyRSS2Gen import RSSItem, Guid
> import ScrapeNFeed
> import re
>
> url='http://jobs.spb.ca.gov/exams_title.cfm'
> job_html = urlopen(url).read()
> job_soup = BeautifulSoup(job_html)
> jobs = job_soup.findAll('strong', text=re.compile('.*RESEARCH.*'))
>
> class JobFeed(ScrapeNFeed.ScrapedFeed):
>     def HTML2RSS(self, headers, body):
>         items = [RSSItem(title=job,
>                          link=url,
>                          description=job_soup.h2.string.strip())
>                  for job in jobs]
>
>         # Pass the RSSItem objects built above, not the raw matches.
>         self.addRSSItems(items)
>
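> JobFeed.load(job_soup.title.string.strip(),
>              url,
>              # As far as I recall, load() expects a feed description
>              # before the RSS and pickle file names.
>              'Research job listings',
>              'jobs.rss',
>              'jobs.pickle',
>              managingEditor='',
>              )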
>
> On Tue, Apr 14, 2009 at 4:17 PM, Joe Larson <joe at joelarson.com> wrote:
>
>> Hello list!
>>
>> I am a Python Beginner. I thought a good beginning project would be to use
>> the Portable Python environment http://www.portablepython.com/ with
>> Beautiful Soup http://www.crummy.com/software/BeautifulSoup/ and Scrape
>> 'N' Feed http://www.crummy.com/software/ScrapeNFeed/ to create an RSS
>> feed of this page http://jobs.spb.ca.gov/exams_title.cfm - ideally
>> filtering just for positions with the string 'Research Analyst'.
>>
>> In my day job I work on the Windows OS (hence the Portable Python) - at
>> home I use Ubuntu but also carry Portable Ubuntu as well.
>>
>> I just wanted to shoot this to the list - see if anyone had any
>> suggestions or tips. I'm reading O'Reilly's Learning Python and The Python
>> Tutorial, but it's still very challenging as this is my first programming
>> language. Thanks all! Sincerely ~ joelar
>>  --
>> http://mail.python.org/mailman/listinfo/python-list
>>
>
>