Parsing/Crawler Questions..

John Nagle nagle at
Thu Mar 5 07:22:42 CET 2009

bruce wrote:
> hi phillip...
> thanks for taking a sec to reply...
> i'm solid on the test app i've created.. but as an example.. i have a parse
> for usc (southern cal) and it extracts the courselist/class schedule... my
> issue was that i realized the multiple runs of the app were giving different
> results... in my case, the class schedule isn't static.. (actually, none of
> the class/course lists need be static.. they could easily change).
> so i don't have a priori knowledge of what the actual class/course list site
> would look like, unless i physically examined the site, each time i run the
> app...
> i'm inclined to think i might need to run the parser a number of times
> within a given time frame, and then take a union/join of the output of the
> different runs.. this would in theory, give me a high probability that i'd
> get 100% of the class list...
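
     The union-of-runs idea quoted above could be sketched roughly as
follows. This is only an illustration: the function name and the course
IDs are made up, and it assumes each run of the parser yields a list of
course identifiers.

```python
def union_of_runs(runs):
    """Merge the course IDs from several parser runs into one sorted list."""
    seen = set()
    for run in runs:
        # A set drops duplicates, so a course found in several runs
        # appears only once in the merged result.
        seen.update(run)
    return sorted(seen)


# Hypothetical output of three runs; none of them sees every course.
runs = [
    ["CSCI-101", "CSCI-102"],
    ["CSCI-102", "MATH-125"],
    ["CSCI-101", "MATH-125", "PHYS-151"],
]
merged = union_of_runs(runs)
print(merged)
```

     Note that a union only ever grows, so a class that is dropped from
the schedule between runs would still show up in the merged list.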

     I think I see the problem.  I took a look at the USC class list, and
it's been made "Web 2.0".  When you read the page, you don't get the
class list; you get a Javascript thing that builds a class list on
demand, using JSON, no less.

     See "".

     I'm not sure how you're handling this.  The JavaScript actually
has to be run before you get anything.
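
     One possible way around running the JavaScript, assuming the JSON
the page loads can be fetched directly, is to parse that JSON yourself.
The sketch below is not based on the actual USC data: the payload
layout and the "courses" and "id" field names are assumptions made up
for illustration, and the sample string stands in for whatever the
server really returns.

```python
import json

# Hypothetical payload; the real field names and nesting are unknown.
SAMPLE_PAYLOAD = """
{"courses": [
    {"id": "CSCI-101", "title": "Intro to Programming"},
    {"id": "MATH-125", "title": "Calculus I"}
]}
"""


def extract_course_ids(payload_text):
    """Return the course IDs found in a JSON payload of the assumed shape."""
    data = json.loads(payload_text)
    # .get() with a default keeps this from failing when the key is absent.
    return [course["id"] for course in data.get("courses", [])]


course_ids = extract_course_ids(SAMPLE_PAYLOAD)
print(course_ids)
```

     If the data really does arrive as JSON, this skips the HTML parsing
step entirely, but the request the page's script makes would have to be
identified first.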

				John Nagle
