[Chicago] Parsing Metra's Online Schedule

Massimo Di Pierro mdipierro at cs.depaul.edu
Sat Apr 12 00:18:33 CEST 2008


I just used text.find('<pre><h4>'), which is invalid HTML but is what
the Metra pages contain.
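
A minimal sketch of that approach, assuming the schedule text runs from
the '<pre><h4>' marker to the next '</pre>'. The function name
extract_schedule_block and the sample page are made up for illustration;
they are not taken from the real Metra markup.

```python
def extract_schedule_block(text, marker='<pre><h4>', closer='</pre>'):
    """Return the text between marker and closer, or None if marker is absent.

    Assumption: the schedule sits between the first '<pre><h4>' and the
    next '</pre>'. Adjust both strings if the real pages differ.
    """
    start = text.find(marker)
    if start == -1:
        # str.find returns -1 when the marker is not in the page
        return None
    end = text.find(closer, start)
    if end == -1:
        # No closing tag found: take everything to the end of the page
        end = len(text)
    return text[start + len(marker):end]


# Hypothetical sample page, not real Metra data
sample = (
    "<html><body><pre><h4>Weekday Inbound</h4>\n"
    "101  5:05  5:12\n"
    "</pre></body></html>"
)
block = extract_schedule_block(sample)
print(block)
```

Note that str.find is case-sensitive, so a page that writes the tags as
<PRE><H4> would need a different marker or a lowercased copy of the text.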

Massimo


On Apr 11, 2008, at 5:02 PM, Feihong Hsu wrote:

> I just want to clarify that the lxml dependency is a rather weak one.
> All of the schedule data is contained within <PRE> tags, with minimal
> markup. So the majority of the actual parsing uses regular
> expressions. I started the parser in lxml because I thought I was
> going to have to use a ton of XPath, which turned out not to be the
> case.
>
> But yeah, Metra hardly ever changes their schedules.
>
> --- Cosmin Stejerean <cstejerean at gmail.com> wrote:
>
>>>>
>>>> P.S. You need lxml to run the code.
>>>>
>>>
>>> If there is any intention of running this on App Engine, you won't be
>>> able to do that. One thing that people are a little disappointed about
>>> App Engine is your python libraries have to be pure python and can't
>>> have C extensions, so this might fail (unless you get lucky and lxml is
>>> installed on the linux distribution that they're using for the App
>>> Engine servers, but my guess is probably not).
>>>
>>> Fortunately this looks like it'd be easy to convert to use
>>> BeautifulSoup, which is pure Python.
>>>
>>
>> Metra updates their schedule once a year, if that. We don't need to
>> have the code on App Engine parse the metra data live. We just need to
>> parse it and load it into the app. And keep the scripts around so we
>> can parse it again when it changes. Good point about the pure python
>> implementation though.
>>
>> --
>> Cosmin Stejerean
>> http://blog.offbytwo.com
>> _______________________________________________
>> Chicago mailing list
>> Chicago at python.org
>> http://mail.python.org/mailman/listinfo/chicago
>>
>
>


