What's the best way to write this regular expression?
johnjsal at gmail.com
Wed Mar 7 08:02:51 CET 2012
After a bit of reading, I've decided to use Beautiful Soup 4, with
lxml as the parser. I considered simply using lxml to do all the work,
but I just got lost in the documentation and tutorials. I couldn't
find a clear explanation of how to parse an HTML file and then
navigate its structure.
The Beautiful Soup 4 documentation was very clear, and BS4 itself is
simple and Pythonic. Best of all, version 4 no longer does the parsing
itself, so you can choose your own parser. It works with lxml, so I'll
still be using lxml, but with a nice, clean layer on top for
navigating the tree structure.
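For anyone following along, here is a minimal sketch of that setup. It
assumes the bs4 package is installed and falls back to the standard
library's html.parser if lxml is missing. The sample HTML and the
make_soup helper are made up for illustration, not taken from any real
project.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample document, made up for illustration.
HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p class="intro">Hello, <b>world</b>!</p>
    <a href="http://example.com/one">One</a>
    <a href="http://example.com/two">Two</a>
  </body>
</html>
"""


def make_soup(markup):
    """Parse markup with lxml if it is installed, else the stdlib parser."""
    try:
        # The second argument names the parser Beautiful Soup should use.
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised by bs4 when the requested parser is not available.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup(HTML)

# Navigate the tree: attribute access finds the first matching tag.
title = soup.title.string

# find_all() returns every matching tag; tags act like dicts for attributes.
links = [a["href"] for a in soup.find_all("a")]

# get_text() flattens a tag and its children into plain text.
intro_text = soup.find("p", class_="intro").get_text()

print(title)
print(links)
print(intro_text)
```

The parser name is just a string passed to the constructor, so switching
parsers later should not require touching the navigation code.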
Thanks for the advice!