[Tutor] CGI-version not working while similar command-line script is fine

Kent Johnson kent_johnson at skillsoft.com
Sun Sep 5 15:19:00 CEST 2004


At 02:44 AM 9/5/2004 +0100, Gerhard Venter wrote:
>I now think that the URL doesn't get passed to lynx - this seems to be a 
>shortcoming of the Windows shell or whatever is being used to expand the 
>os.system command.

Passing a URL to lynx through os.system should work. My guess is you have an 
error in your script, maybe a problem with quoting or string substitution. 
Can you show us the script that is having trouble?
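
As an illustration of the kind of quoting problem I mean (a sketch only, 
since I haven't seen your script): if the URL contains characters such as 
"&", the Windows shell will treat them specially when the whole command is 
passed as one string. Passing the arguments as a list avoids shell parsing 
altogether. The helper names below are made up for the example, and I am 
assuming lynx is on the PATH and that its -dump option is what you are using.

```python
import subprocess


def build_lynx_command(url):
    # Hypothetical helper: build an argument list instead of one command
    # string, so that no shell gets a chance to reinterpret the URL.
    return ["lynx", "-dump", url]


def dump_page(url):
    # Assumes lynx is installed and on the PATH.
    # Returns the rendered text that lynx writes to standard output.
    result = subprocess.run(
        build_lynx_command(url),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With this shape, a URL like http://example.com/?a=1&b=2 reaches lynx as a 
single argument, whatever shell the CGI environment happens to use.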

>I would like to do it natively in Python, without lynx.  I have experience 
>of urllib, but don't want raw html.  Is there a way I can get interpreted 
>html as the output?

I don't know of any Python package that will give you rendered HTML the way 
lynx does. There are several packages that will parse HTML and give easy 
access to the contents. They are designed for screen-scraping applications, 
where you are accessing a specific part of the HTML, but you could probably 
walk the tree they generate and pull out the text. The packages are:
Beautiful Soup: http://www.crummy.com/software/BeautifulSoup/
Scraper: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/286269
ElementTree Tidy: http://effbot.org/zone/element-tidylib.htm

One thing these add to the built-in HTML parsers is better tolerance for 
badly-formed HTML.
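
To give a rough idea of what "pull out the text" could look like, here is a 
minimal sketch. Note that it uses the standard library's html.parser rather 
than any of the packages above, so it does not get their tolerance for bad 
HTML, and it is nowhere near what lynx renders (no layout, no links, no 
tables). The names TextExtractor and html_to_text are made up for the 
example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of a page, ignoring script and style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Entering an element whose contents are not meant to be displayed.
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        # One line per text fragment; a crude stand-in for real rendering.
        return "\n".join(self._chunks)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

You would feed it the raw HTML you already get from urllib, for example 
html_to_text(page_source), and print the result from the CGI script.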

Kent


