Parsing

Mike C. Fletcher mcfletch at rogers.com
Wed Jul 3 20:42:54 EDT 2002


For your task, look at:

	re (low-level text processing)
	htmllib or sgmllib (actually parse HTML/SGML and let you define
	callbacks to handle content)
	xml (same basic idea as previous, but more committed :) ).

in the standard Python library, or search Google Groups for "strip 
HTML" in the Python newsgroup to find examples using those libraries; 
they get posted every few months or so :) .  For example:

	http://groups.google.com/groups?th=fbebe304ebf2c36e&rnum=1
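
As a rough illustration of the two stdlib approaches listed above, here is a minimal sketch. It makes some assumptions: htmllib and sgmllib were removed in Python 3, so it uses html.parser.HTMLParser, their standard-library successor, for the callback style. The names TextExtractor, strip_html and strip_html_re are made up for this example, and the sample string is invented.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text content, ignores the tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; for us.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Callback the parser invokes for each run of text between tags.
        self.chunks.append(data)


def strip_html(markup):
    """Parser-based approach: feed the markup, join the collected text."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered in the parser
    return "".join(parser.chunks)


def strip_html_re(markup):
    """Low-level re approach: delete anything that looks like a tag.

    Crude by design: entities stay encoded, and comments or a '>'
    inside an attribute value will confuse it.
    """
    return re.sub(r"<[^>]*>", "", markup)


sample = "<p>Have <b>fun</b> &amp; enjoy</p>"
print(strip_html(sample))     # Have fun & enjoy
print(strip_html_re(sample))  # Have fun &amp; enjoy
```

The parser version is a bit longer but copes with entities and odd markup; the regular expression is fine for a quick hack on a page whose layout you already know. Note that this sketch also keeps the text inside script and style elements, which you may want to filter out.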


For actual generalised parsing solutions (which are a pretty big hammer 
for such a simple task), Google about for:

	PLY
	SPARK
	SimpleParse
	PyLR
	Yapps
	kwParsing
	Plex

or even just search for "Python parsing".

Have fun,
Mike


Thomas Berglund wrote:
> Hello all
> 
> I'm new to this group and new to python =).
> 
> I was thinking, as a first project, to have a program that would
> download a random quote from a homepage that gives such, parse all the
> html out of it and print it. Should be simple, no?
...
> but, that's not really working out. I was just wondering if there was
> some kind of standard parsing library for python to help me get rid of
> those nasty html tags. Any pointers would be much appreciated.
> 
> Have a nice day, and thanks.
> 
> /Thomas





