Beginner: Portable Python, BeautifulSoup & ScrapeNFeed

Brian Brian.Mingus at colorado.edu
Wed Apr 15 03:09:19 EDT 2009


On Ubuntu:

sudo apt-get install python-pyrss2gen python-beautifulsoup
# then download ScrapeNFeed separately

Python:
Not sure what's wrong with this but it's most of the code you'll need:
-----------
from urllib import urlopen
from BeautifulSoup import BeautifulSoup
from PyRSS2Gen import RSSItem, Guid
import ScrapeNFeed
import re

url='http://jobs.spb.ca.gov/exams_title.cfm'
job_html = urlopen(url).read()
job_soup = BeautifulSoup(job_html)
jobs = job_soup.findAll('strong', text=re.compile('.*RESEARCH.*'))

class JobFeed(ScrapeNFeed.ScrapedFeed):
    def HTML2RSS(self, headers, body):
        items = [RSSItem(title=job,
                         link=url,
                         description=job_soup.h2.string.strip())
                 for job in jobs]

        # Pass the RSSItem objects built above, not the raw matched strings.
        self.addRSSItems(items)

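# Assuming load() takes (title, url, description, rssFile, pickleFile),
# a description string is needed between the URL and the file names.
JobFeed.load(job_soup.title.string.strip(),
             url,
             job_soup.title.string.strip(),
             'jobs.rss',
             'jobs.pickle',
             managingEditor='',
             )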
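
If you want to try the filter-then-build-items idea without installing anything, here is a rough standard-library sketch in Python 3. It is not ScrapeNFeed and does not remember which items it has already seen. The SAMPLE markup is made up for illustration (I am assuming the job titles sit in <strong> tags, as in the code above), and the names StrongText, matching_titles and build_rss are my own, not part of any library.

```python
import re
from html.parser import HTMLParser
from xml.etree import ElementTree as ET


class StrongText(HTMLParser):
    """Collect the text found inside <strong> tags."""

    def __init__(self):
        super().__init__()
        self.in_strong = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.in_strong = True
            self.found.append("")  # start a new title buffer

    def handle_endtag(self, tag):
        if tag == "strong":
            self.in_strong = False

    def handle_data(self, data):
        if self.in_strong:
            self.found[-1] += data


def matching_titles(html, pattern):
    """Return the <strong> texts that match pattern, ignoring case."""
    parser = StrongText()
    parser.feed(html)
    parser.close()
    regex = re.compile(pattern, re.IGNORECASE)
    return [text.strip() for text in parser.found if regex.search(text)]


def build_rss(feed_title, link, titles):
    """Build a minimal RSS 2.0 document with one item per title."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = feed_title
    for title in titles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


# Made-up sample markup; the real page layout may differ.
SAMPLE = """<html><body>
<strong>RESEARCH ANALYST I</strong>
<strong>OFFICE TECHNICIAN</strong>
<strong>Research Analyst II (Economics)</strong>
</body></html>"""

titles = matching_titles(SAMPLE, r"research analyst")
feed_xml = build_rss("Exam listings",
                     "http://jobs.spb.ca.gov/exams_title.cfm",
                     titles)
print(titles)
```

Matching case-insensitively means 'Research Analyst' catches the titles however the page capitalizes them. To run it against the live page you would fetch the HTML first and pass it in place of SAMPLE.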

On Tue, Apr 14, 2009 at 4:17 PM, Joe Larson <joe at joelarson.com> wrote:

> Hello list!
>
> I am a Python Beginner. I thought a good beginning project would be to use
> the Portable Python environment http://www.portablepython.com/ with
> Beautiful Soup http://www.crummy.com/software/BeautifulSoup/ and Scrape
> 'N' Feed http://www.crummy.com/software/ScrapeNFeed/ to create an RSS
> feed of this page http://jobs.spb.ca.gov/exams_title.cfm - ideally
> filtering just for positions with the string 'Research Analyst'.
>
> In my day job I work on the Windows OS (hence the Portable Python) - at
> home I use Ubuntu but also carry Portable Ubuntu as well.
>
> I just wanted to shoot this to the list - see if anyone had any suggestions
> or tips. I'm reading O'Reilly's Learning Python and The Python Tutorial, but
> it's still very challenging as this is my first programming language. Thanks
> all! Sincerely ~ joelar