scraping a tumblr.com archive page

Jabba Laci jabba.laci at gmail.com
Sun Nov 20 21:09:46 EST 2011


Hi,

Thanks for the answer. I finally found an API for this task:
http://www.tumblr.com/docs/en/api/v2#posts . It returns the required
data in JSON format.
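A minimal Python 3 sketch of using that endpoint follows. Several details are assumptions taken from my reading of the v2 docs, so check them against the page linked above: the base URL https://api.tumblr.com/v2/blog/{blog}/posts, the api_key/offset/limit query parameters, and the response["posts"] layout of the returned JSON. The helper names (posts_url, parse_posts, fetch_posts) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the Tumblr API v2 "posts" endpoint; {blog} is e.g. "example.tumblr.com".
API_BASE = "https://api.tumblr.com/v2/blog/{blog}/posts"


def posts_url(blog, api_key, offset=0, limit=20):
    """Build the request URL for one page of posts (parameter names assumed from the v2 docs)."""
    query = urlencode({"api_key": api_key, "offset": offset, "limit": limit})
    return API_BASE.format(blog=blog) + "?" + query


def parse_posts(payload):
    """Return the list of posts from a JSON response body (assumes a response -> posts layout)."""
    data = json.loads(payload)
    return data["response"]["posts"]


def fetch_posts(blog, api_key, offset=0, limit=20):
    """Download and parse one page of posts; needs network access and a valid API key."""
    with urlopen(posts_url(blog, api_key, offset, limit)) as resp:
        return parse_posts(resp.read().decode("utf-8"))
```

Keeping URL building and JSON parsing separate from the download makes both easy to test offline; to walk the whole archive, call fetch_posts repeatedly, increasing offset by limit until an empty list comes back.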

Laszlo

> The page isn't really that dynamic; HTTP doesn't allow for that.
> Scrolling down the page triggers some Javascript. That Javascript
> sends some HTTP requests to the server, and the server returns more
> HTML, which gets inserted into the middle of the page. If you take the
> time to monitor your network traffic using a tool like Firebug, you
> should be able to figure out the pattern in the requests for more
> content. Just send those same requests yourself and parse the results.
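The approach in the quoted reply (replay the requests the page's Javascript makes, then parse the returned HTML) can be sketched in Python 3 as below. The page-URL pattern and the fetch function are hypothetical placeholders: substitute whatever request Firebug actually shows. The sketch also assumes each response is an HTML fragment containing links, and that a page with no links means the archive is exhausted.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Parse one HTML fragment and return the links it contains, in order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def http_fetch(url):
    """Fetch a URL and return the body as text (assumes a UTF-8 response)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def scrape_all(fetch, page_url, max_pages=100):
    """Replay the paginated requests seen in the network monitor.

    fetch: callable taking a URL and returning the response body as a str.
    page_url: callable mapping a page number to a request URL (hypothetical pattern).
    Stops at the first page that yields no links, or after max_pages pages.
    """
    results = []
    for page in range(1, max_pages + 1):
        links = extract_links(fetch(page_url(page)))
        if not links:
            break
        results.extend(links)
    return results
```

For example, if the monitored requests looked like /page/2, /page/3 and so on (again, an assumption), you might call scrape_all(http_fetch, lambda n: "http://example.tumblr.com/page/%d" % n). Passing fetch in as a parameter keeps the loop testable without touching the network.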


