[Tutor] CGI-version not working while similar command-line script is fine
Kent Johnson
kent_johnson at skillsoft.com
Sun Sep 5 15:19:00 CEST 2004
At 02:44 AM 9/5/2004 +0100, Gerhard Venter wrote:
>I now think that the URL doesn't get passed to lynx - this seems to be a
>shortcoming of the Windows shell or whatever is being used to expand the
>os.system command.
This should work. My guess is you have an error in your script, maybe a
problem with quoting or string substitution. Can you show us the script
that is having trouble?
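
Without seeing the script this is only a guess, but one common cause is a URL containing characters such as & that the shell interprets itself when the command is passed as a single string. A minimal sketch that sidesteps the shell by passing the arguments as a list through the subprocess module (Python 3); the helper names build_lynx_command and dump_page are made up for illustration, and it assumes lynx is on the PATH:

```python
import subprocess


def build_lynx_command(url, lynx_path="lynx"):
    """Return the argument list for 'lynx -dump <url>' (hypothetical helper).

    Each argument is a separate list item, so no shell ever parses the URL
    and characters like & or ? are passed through untouched.
    """
    return [lynx_path, "-dump", url]


def dump_page(url, lynx_path="lynx"):
    """Run lynx and return its rendered text output (assumes lynx is installed)."""
    result = subprocess.run(
        build_lynx_command(url, lynx_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

In a CGI script you would print the return value of dump_page(url) after the Content-Type header. If the problem goes away with this form, quoting in the os.system string was the likely culprit.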
>I would like to do it natively in Python, without lynx. I have experience
>of urllib, but don't want raw html. Is there a way I can get interpreted
>html as the output?
I don't know of any Python package that will give you rendered HTML the way
lynx does. There are several packages that will parse HTML and give easy
access to the contents. They are designed for screen-scraping applications
where you are accessing a specific part of the HTML, but you could probably
walk the tree they generate and pull out the text. The packages are:
Beautiful Soup: http://www.crummy.com/software/BeautifulSoup/
Scraper: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/286269
ElementTree Tidy: http://effbot.org/zone/element-tidylib.htm
One thing these add to the built-in HTML parsers is better tolerance for
badly-formed HTML.
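
To illustrate the "pull out the text" idea, here is a rough sketch that uses only the standard library's html.parser module (Python 3), not any of the packages listed above. The names TextExtractor and html_to_text are made up for this example, and the output is just the text nodes one per line, nowhere near lynx's layout:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of a page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def get_text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()
```

You could feed it the string you get back from urllib. For really messy pages one of the packages above will probably cope better.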
Kent