What's the best way to write this regular expression?

John Salerno johnjsal at gmail.com
Wed Mar 7 00:39:42 CET 2012


Thanks. I'm thinking the choice might be between lxml and Beautiful
Soup, but since BS can use lxml as its parser, I'm trying to figure out
how the two actually differ. I don't necessarily need the simplest
option (html.parser), but I want one that is simple enough yet
powerful enough that I won't have to learn another tool later.

On Tue, Mar 6, 2012 at 5:35 PM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
> On Tue, Mar 6, 2012 at 4:05 PM, John Salerno <johnjsal at gmail.com> wrote:
>>> Anything that allows me NOT to use REs is welcome news, so I look forward to learning about something new! :)
>>
>> I should ask though...are there alternatives already bundled with Python that I could use? Now that you mention it, I remember something called HTMLParser (or something like that) and I have no idea why I never looked into that before I messed with REs.
>
> HTMLParser is pretty basic, although it may be sufficient for your
> needs.  It just converts an html document into a stream of start tags,
> end tags, and text, with no guarantee that the tags will actually
> correspond in any meaningful way.  lxml can be used to output an
> actual hierarchical structure that may be easier to manipulate and
> extract data from.
>
> Cheers,
> Ian
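
To make the HTMLParser behavior described above concrete, here is a minimal sketch using the standard-library html.parser module (the Python 3 name; in Python 2 the same class lives in the HTMLParser module). The subclass name EventLogger and the sample markup are made up for illustration. It records each callback as a tuple, so you can see the flat stream of start tags, end tags, and text. Note that the unclosed <b> passes through without any complaint or repair.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: record the flat event stream HTMLParser emits."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


# Deliberately sloppy markup: the <b> element is never closed.
sample = '<p class="intro">Hello <b>world</p>'

logger = EventLogger()
logger.feed(sample)
logger.close()

for event in logger.events:
    print(event)
```

Nothing in that output ties the "end p" event back to its start tag or notices the missing "</b>"; any nesting has to be reconstructed by your own code. A tree-building parser such as lxml.html would instead return nested element objects you can navigate, which is the trade-off Ian describes.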
