[Chicago] Parsing Metra's Online Schedule

Feihong Hsu hsu.feihong at yahoo.com
Sat Apr 12 00:02:11 CEST 2008


I just want to clarify that the lxml dependency is a rather weak one.
All of the schedule data is contained within <PRE> tags, with minimal
markup. So the majority of the actual parsing uses regular
expressions. I started the parser in lxml because I thought I was
going to have to use a ton of XPath, which turned out not to be the
case.
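The approach described above can be sketched without lxml. This is a minimal sketch, not the actual parser. It assumes a hypothetical row layout: a station name, two or more spaces, then a run of H:MM times. The real Metra pages may be laid out differently. It swaps in the standard library's html.parser, which is pure Python, to pull out the <PRE> text, and then does the real work with regular expressions. The names PreTextExtractor, parse_schedule, and SAMPLE_HTML are hypothetical, and the sample data is made up.

```python
import re
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the raw text of every <pre> element (hypothetical helper).

    html.parser lowercases tag names, so <PRE> and <pre> both match.
    """

    def __init__(self):
        super().__init__()
        self._in_pre = False
        self.blocks = []  # one string per <pre> element

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        # Text inside inline markup such as <b> within <pre> is kept too.
        if self._in_pre:
            self.blocks[-1] += data


# Assumed row layout: station name, 2+ spaces, then times like 5:30 or 12:05.
ROW_RE = re.compile(r"^\s*(?P<station>\S.*?)\s{2,}(?P<times>\d.*)$")
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}\b")


def parse_schedule(html):
    """Return {station: [time, ...]} from all <pre> blocks in the page."""
    extractor = PreTextExtractor()
    extractor.feed(html)
    extractor.close()
    schedule = {}
    for block in extractor.blocks:
        for line in block.splitlines():
            match = ROW_RE.match(line)
            if match:  # header and blank lines don't match and are skipped
                schedule[match.group("station")] = TIME_RE.findall(match.group("times"))
    return schedule


# Made-up sample page, not real Metra data.
SAMPLE_HTML = """<html><body><h1>Inbound</h1>
<PRE>
Station          Train 1    Train 2
Oak Street       5:30       6:15
Elm Avenue       <b>6:05</b>       6:50
</PRE></body></html>"""

if __name__ == "__main__":
    print(parse_schedule(SAMPLE_HTML))
```

Because html.parser only needs to locate the <PRE> blocks, XPath support isn't required. The regular expressions carry the format-specific logic, so they are what would need to change if the layout assumption is wrong.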

But yeah, Metra hardly ever changes their schedules.
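Because the schedules change so rarely, parsing can be a one-off offline step whose output the app simply loads. The following is a minimal sketch. It assumes JSON as the storage format and uses the hypothetical helper names save_schedule and load_schedule. Nothing in this thread specifies how the data actually gets into the app.

```python
import json
from pathlib import Path


def save_schedule(schedule, path):
    """Offline step: write parsed schedule data to a JSON file.

    Hypothetical helper; rerun only when the published schedule changes.
    """
    Path(path).write_text(json.dumps(schedule, indent=2, sort_keys=True), encoding="utf-8")


def load_schedule(path):
    """App-side step: read the pre-parsed data; no HTML parser needed at runtime."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    import tempfile

    data = {"Oak Street": ["5:30", "6:15"]}  # made-up example data
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "schedule.json"
        save_schedule(data, target)
        print(load_schedule(target) == data)  # True
```

With this split, only the offline script depends on a parsing library. The deployed app needs nothing beyond the standard json module.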

--- Cosmin Stejerean <cstejerean at gmail.com> wrote:

> > >
> > > P.S. You need lxml to run the code.
> > >
> >
> > If there is any intention of running this on App Engine, you won't be able
> > to do that. One thing people are a little disappointed about with App
> > Engine is that your Python libraries have to be pure Python and can't have
> > C extensions, so this might fail (unless you get lucky and lxml is
> > installed on the Linux distribution they're using for the App Engine
> > servers, but my guess is that it isn't).
> >
> > Fortunately this looks like it'd be easy to convert to use BeautifulSoup,
> > which is pure Python.
> >
> 
> Metra updates their schedule once a year, if that. We don't need to
> have the code on App Engine parse the Metra data live. We just need to
> parse it and load it into the app, and keep the scripts around so we
> can parse it again when it changes. Good point about the pure-Python
> implementation, though.
> 
> -- 
> Cosmin Stejerean
> http://blog.offbytwo.com

