Parsing/Crawler Questions..

John Nagle nagle at animats.com
Thu Mar 5 01:22:42 EST 2009


bruce wrote:
> hi phillip...
> 
> thanks for taking a sec to reply...
> 
> i'm solid on the test app i've created.. but as an example.. i have a parser
> for usc (southern cal) and it extracts the courselist/class schedule... my
> issue was that i realized that multiple runs of the app were giving different
> results... in my case, the class schedule isn't static.. (actually, none of
> the class/course lists need be static.. they could easily change).
> 
> so i don't have a priori knowledge of what the actual class/course list site
> would look like, unless i physically examined the site each time i run the
> app...
> 
> i'm inclined to think i might need to run the parser a number of times
> within a given time frame, and then take a union/join of the output of the
> different runs.. this would, in theory, give me a high probability that i'd
> get 100% of the class list...
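
     The union-of-runs idea can be sketched in a few lines of Python.
This is only a minimal sketch, assuming each run of the parser yields a
collection of course identifiers; the function name merge_runs and the
sample IDs are made up for illustration.

```python
def merge_runs(runs):
    """Return the union of the course IDs seen across several parser runs."""
    merged = set()
    for run in runs:
        # set.update() adds every ID from this run, ignoring duplicates
        merged.update(run)
    return merged


# Hypothetical output of three separate runs of the parser
run_a = {"CSCI-101", "CSCI-102"}
run_b = {"CSCI-101", "MATH-125"}
run_c = {"CSCI-102", "MATH-125", "PHYS-151"}

all_courses = merge_runs([run_a, run_b, run_c])
print(sorted(all_courses))
```

     A union only ever grows, so a course that was dropped from the
schedule between runs stays in the merged set.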

     I think I see the problem.  I took a look at the USC class list, and
it's been made "Web 2.0".  When you read the page, you don't get the
class list; you get a Javascript thing that builds a class list on
demand, using JSON, no less.

     See "http://web-app.usc.edu/soc/term_20091.html".

     I'm not sure how you're handling this.  The Javascript actually
has to be run before you get anything.
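
     One possible way around this, assuming the JSON the Javascript
loads can be requested directly, is to skip the HTML and parse the JSON
itself.  A rough sketch follows; the JSON layout, the field names
("courses", "id"), and both function names are hypothetical and are not
the actual USC format.

```python
import json
from urllib.request import urlopen


def fetch_json_text(url):
    """Download the raw JSON text from a URL the page's script would request."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_course_ids(json_text):
    """Pull the course IDs out of a JSON document.

    Assumes a made-up layout: {"courses": [{"id": ...}, ...]}.
    """
    data = json.loads(json_text)
    return [course["id"] for course in data["courses"]]


# Stand-in for what fetch_json_text() might return (invented sample data)
sample = (
    '{"courses": ['
    '{"id": "CSCI-101", "title": "Intro"}, '
    '{"id": "MATH-125", "title": "Calculus I"}'
    ']}'
)
print(extract_course_ids(sample))
```

     The real URL and field names would have to be found by watching
what the browser requests when the page loads.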

				John Nagle


