python and web pages

Gerhard Häring gh at ghaering.de
Thu Nov 19 17:19:14 CET 2009


Daniel Dalton wrote:
> Hi,
> 
> Here is my situation:
> I'm using the command line, as in, I'm not starting gnome or kde (I'm on
> linux.)
> I have a string of text attached to a variable. So I need to use one of
> the browsers on linux, that run under the command line, eg. lynx,
> elinks, links, links2 and do the following.

No, you don't need any text mode browser. It's much easier to do with
modules from the Python standard library.

> 1. Open a certain web page and find the first text box on the page, and
> put this text into the form.

I assume you know HTML and HTTP.

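You can use urllib or urllib2 to submit the form: look up the form's
action URL and the text box's name attribute in the page's HTML, then
send your text under that name as a GET or POST request.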
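
For illustration, here is a minimal sketch of the submission step. It is
written for Python 3, where the relevant pieces of urllib and urllib2 live
in urllib.request and urllib.parse. The URL and the field name "q" are
hypothetical placeholders, and the sketch assumes the form is sent with
POST; the real values come from the form's action, method and input name
in the page source.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_request(action_url, field_name, text):
    """Build the POST request a browser would send for a one-field form."""
    # urlencode() produces an application/x-www-form-urlencoded body.
    body = urlencode({field_name: text}).encode("utf-8")
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(action_url, data=body)


def submit_form(action_url, field_name, text):
    """Send the form and return the result page as text."""
    request = build_form_request(action_url, field_name, text)
    with urlopen(request) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Hypothetical URL and field name, for illustration only.
    html = submit_form("http://example.com/search", "q", "my text")
    print(len(html))
```

If the form uses GET instead, the encoded string would be appended to the
URL after a "?" rather than passed as data.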

> 2. Press the submit button, and wait for the result page to load.
> 3. Click on the 15th link down the page.

The call that submits the form returns the HTML of the result page, and
you will have to parse it for the link you're interested in. There are
several ways to do that: manual parsing, regular expressions, or one of
the XML parsers in the standard library (etree is the easiest, though it
only accepts well-formed markup).
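
As a rough sketch of the link-finding step, the code below collects every
<a href=...> on a page and returns the n-th one. Note that it uses
html.parser from the Python 3 standard library rather than etree, since
real-world HTML is often not well-formed XML. The names LinkCollector and
nth_link are made up for this example, and "15th link" is assumed to mean
the 15th anchor tag with an href, in document order.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def nth_link(html, base_url, n):
    """Return the n-th link (1-based) found in the HTML text."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    if n < 1 or n > len(collector.links):
        raise IndexError("page has only %d links" % len(collector.links))
    return collector.links[n - 1]
```

Calling nth_link(html, page_url, 15) on the result page would give an
absolute URL that can then be fetched like any other page.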

> So, how do I do this, can anyone point me to some docs or modules that
> may help out here?

While all of this may seem overwhelming at first, I'd guess the whole
thing can be written in 20-30 lines of code.

-- Gerhard



