[Tutor] parse text for paragraphs/sections

Alan Gauld alan.gauld at btinternet.com
Mon Apr 20 19:31:01 CEST 2009


"Dinesh B Vadhia" <dineshbvadhia at hotmail.com> wrote in

> ... pick up all text between the tags <pre> and </pre> and replace with
> another piece of text.

> How do you do this with re?

Preferably you don't; use an HTML parser instead. It is both easier and
more reliable, since it is almost impossible to handle all HTML
constructs reliably using regex.

BeautifulSoup is a good HTML parser, although you can use the ones
in the standard library too.
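
As a rough illustration of the standard library route, here is a minimal
sketch using html.parser.HTMLParser. The names PreReplacer and replace_pre
are made up for this example, and it assumes the goal is to keep the
<pre> tags themselves while swapping out everything between them. It
echoes the rest of the markup back as it was read, so treat it as a
starting point rather than a complete HTML rewriter.

```python
from html.parser import HTMLParser


class PreReplacer(HTMLParser):
    """Hypothetical helper: copy HTML through, replacing <pre> contents."""

    def __init__(self, replacement):
        # convert_charrefs=False so entities such as &amp; are reported
        # separately and can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.replacement = replacement
        self.out = []
        self.depth = 0  # greater than 0 while inside a <pre> element

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            self.out.append(self.get_starttag_text())
        if tag == "pre":
            if self.depth == 0:
                # Emit the new text once, right after the opening tag.
                self.out.append(self.replacement)
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are echoed as written.
        if self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "pre" and self.depth > 0:
            self.depth -= 1
        if self.depth == 0:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        if self.depth == 0:
            self.out.append("<!--%s-->" % data)


def replace_pre(html, replacement):
    """Return html with the contents of every <pre> block replaced."""
    parser = PreReplacer(replacement)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = "<p>a</p><pre>old\ncode</pre><p>b &amp; c</p>"
print(replace_pre(page, "NEW"))
```

The depth counter is there so that any markup nested inside the <pre>
block is dropped along with the old text.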

Of course, it may not be an HTML file, in which case regex may or
may not be appropriate. There are several other parsing modules
around; a quick search of the recent list archives will reveal several
discussions.
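
If the file really is plain text that just happens to use <pre> and
</pre> as simple, un-nested markers, a regex along these lines might be
enough. This is only a sketch under that assumption; the function name
replace_between_markers is invented for the example, and it will go wrong
on nested or malformed tags, which is exactly why a parser is preferred
for real HTML.

```python
import re

# Non-greedy match so each <pre>...</pre> pair is handled separately;
# DOTALL lets "." span newlines, IGNORECASE also accepts <PRE>.
PRE_BLOCK = re.compile(r"(<pre>).*?(</pre>)", re.DOTALL | re.IGNORECASE)


def replace_between_markers(text, replacement):
    """Replace the text between each <pre> and </pre> marker pair."""
    # A function is used as the replacement so that backslashes in
    # `replacement` are inserted literally, not treated as escapes.
    return PRE_BLOCK.sub(
        lambda m: m.group(1) + replacement + m.group(2), text
    )


sample = "x <pre>a\nb</pre> y <pre>c</pre>"
print(replace_between_markers(sample, "NEW"))
```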

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/ 



