What's the best way to write this regular expression?

John Salerno johnjsal at gmail.com
Wed Mar 7 02:02:51 EST 2012


After a bit of reading, I've decided to use Beautiful Soup 4, with
lxml as the parser. I considered simply using lxml to do all the work,
but I just got lost in the documentation and tutorials. I couldn't
find a clear explanation of how to parse an HTML file and then
navigate its structure.

The Beautiful Soup 4 documentation was very clear, and BS4 itself is
simple and Pythonic. Best of all, version 4 no longer does the parsing
itself, so you can choose your own parser. It works with lxml, so I'll
still be using lxml, but with a nice, clean overlay for navigating the
tree structure.
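
For anyone reading along, here is a minimal sketch of that setup, not
a definitive recipe. The sample markup, the helper name make_soup, and
the fallback to the standard library's html.parser when lxml is not
installed are all assumptions made for illustration; none of them come
from the thread.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample document, used only for illustration.
SAMPLE_HTML = """\
<html><head><title>Example page</title></head>
<body>
<div id="content">
<p class="intro">Hello, <b>world</b>.</p>
<a href="/one">First</a>
<a href="/two">Second</a>
</div>
</body></html>
"""


def make_soup(markup):
    """Parse markup with lxml if available, else with html.parser.

    The fallback is an assumption of this sketch, so that it still
    runs on a machine without lxml installed.
    """
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")


if __name__ == "__main__":
    soup = make_soup(SAMPLE_HTML)

    # Attribute access walks down to the first matching tag.
    print(soup.title.string)

    # find_all() returns every matching tag in document order.
    for link in soup.find_all("a"):
        print(link["href"], link.get_text())

    # find() with class_ filters on the class attribute; .parent
    # moves one level back up the tree.
    intro = soup.find("p", class_="intro")
    print(intro.get_text())
    print(intro.parent["id"])
```

The second argument to BeautifulSoup() is where the parser is chosen.
The navigation calls stay the same whichever parser built the tree.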

Thanks for the advice!


