html parsing? Or just simple regex'ing?

Thu Nov 11 15:05:27 EST 2004

In article <pan.2004.11.10.01.37.41.879705 at dcs.nac.uci.edu>,
 Dan Stromberg <strombrg at dcs.nac.uci.edu> wrote:

> I'm working on writing a program that will synchronize one database with
> another.  For the source database, we can just use the python sybase API;
> that's nice and normal.
> 
> [...]
> 
> 1) Would I be better off just regex'ing the html I'm getting back?  (I
> suppose this depends on the complexity of the html received, eh?)
> 
> 2) Would I be better off feeding the HTML into an HTML parser, and then
> traversing that datastructure (is that really how it works?)?

I recommend you look at BeautifulSoup:

  http://www.crummy.com/software/BeautifulSoup/

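It is very forgiving of the malformed markup HTML authors typically
produce (unclosed tags, bad nesting, and the like).  To your second
question: yes, it hands you a parse tree that you can search and
traverse.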
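
To give a feel for the parser route, here is a rough sketch.  It does
not use BeautifulSoup's own API; it uses the html.parser module from the
Python standard library so that it runs with nothing extra installed.
I am assuming, purely for illustration, that the data you want sits in
table cells.  The names CellCollector, extract_rows and SLOPPY are made
up for this example.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the open cell, or None

    def _close_cell(self):
        # Finish the open cell, if any, even when </td> was omitted.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        # Finish the open row, if any, even when </tr> was omitted.
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TD> and <td> look the same.
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag == "td":
            self._close_cell()
            if self._row is None:  # a <td> with no enclosing <tr>
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._close_row()  # flush whatever is still open at end of input


def extract_rows(html):
    """Return a list of rows, each a list of stripped cell strings."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Mixed-case tags, an unquoted attribute, and missing </td> and </tr>.
SLOPPY = """
<TABLE>
<tr><td>alice<td>42
<TR><TD class=x>bob</td><td>  17
</table>
"""

print(extract_rows(SLOPPY))
```

On that input it prints [['alice', '42'], ['bob', '17']].  A regex can
be made to cope with one of those quirks, but each new one tends to cost
another special case, whereas the parser takes them in stride.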

-M

-- 
Michael J. Fromberger             | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/  | Dartmouth College, Hanover, NH, USA