What's the best way to write this regular expression?

Ian Kelly ian.g.kelly at gmail.com
Tue Mar 6 18:35:32 EST 2012


On Tue, Mar 6, 2012 at 4:05 PM, John Salerno <johnjsal at gmail.com> wrote:
>> Anything that allows me NOT to use REs is welcome news, so I look forward to learning about something new! :)
>
> I should ask though...are there alternatives already bundled with Python that I could use? Now that you mention it, I remember something called HTMLParser (or something like that) and I have no idea why I never looked into that before I messed with REs.

HTMLParser is pretty basic, although it may be sufficient for your
needs.  It just converts an HTML document into a stream of start tags,
end tags, and text, with no guarantee that the tags are properly
nested or even matched.  lxml (a third-party package, not bundled
with Python) can instead build an actual hierarchical tree that may
be easier to manipulate and extract data from.
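
To make the "stream of tags and text" point concrete, here is a minimal
sketch of the event-based approach.  It assumes Python 3, where the
module lives at html.parser (in Python 2 it was the top-level HTMLParser
module).  The names EventCollector and collect_events are made up for
this illustration and are not part of the standard library.

```python
from html.parser import HTMLParser  # Python 3 location of the module


class EventCollector(HTMLParser):
    """Hypothetical helper: record each parser callback as a (kind, value) tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs so the event list stays readable.
        text = data.strip()
        if text:
            self.events.append(("text", text))


def collect_events(markup):
    """Feed markup to the parser and return the recorded events in order."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# Deliberately mismatched markup: <b> is never closed, </i> was never opened.
events = collect_events("<p>Hello <b>world</i></p>")
for kind, value in events:
    print(kind, value)
```

The parser reports the stray </i> as an ordinary end-tag event and never
complains that <b> was left open, so any pairing of start and end tags
is left to your own code.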

Cheers,
Ian


