Beginner Q. interrogate html object OR file search?

Stephen Hansen apt.shansen at gmail.com
Thu Dec 3 00:35:59 EST 2009


On Wed, Dec 2, 2009 at 7:24 PM, Mark G <markgrahamnz at gmail.com> wrote:

> Hi all,
>
> I am new to Python and don't yet know the libraries well. What would
> be the best way to approach this problem? I have an HTML file parsing
> script; the file sits on my hard drive. I want to extract the date
> modified from the metadata. Should I read through the lines of the file
> doing a string.find to look for the character patterns of the meta
> tag, or should I use a DOM-type library to retrieve the HTML element I
> want? Which is best practice? Which takes the least code?
>
>
You can probably do some string.find calls and they might work almost
always, but HTML is funky and quite often coded badly by bad people.
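
To make the fragility concrete, here is a minimal sketch of the string.find
approach. The meta name "date-modified" and the helper find_meta_naive are
made up for illustration (the original question doesn't say what the tag
looks like), and the lookup assumes one exact spelling of the tag:

```python
def find_meta_naive(html_text, meta_name):
    """Naive lookup: assumes lowercase tag, double quotes, name before content."""
    marker = '<meta name="%s" content="' % meta_name
    start = html_text.find(marker)
    if start == -1:
        return None  # the exact pattern was not found
    start += len(marker)
    end = html_text.find('"', start)
    if end == -1:
        return None  # no closing quote, give up
    return html_text[start:end]


# Hypothetical inputs: the same information written two valid ways.
tidy = '<meta name="date-modified" content="2009-12-01">'
messy = "<META content='2009-12-01' NAME='date-modified'>"

print(find_meta_naive(tidy, "date-modified"))   # 2009-12-01
print(find_meta_naive(messy, "date-modified"))  # None
```

Both tags are legal HTML, but only the first matches the hard-coded pattern.
Covering every variation of case, quoting and attribute order by hand is
roughly where a real parser starts to pay off.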

And I would never personally suggest anyone go anywhere near a DOM library;
their life will never be happy again :)

I'd get lxml -- even though you're not directly using XML. It has an HTML
package in it too; it's fast, astoundingly easy to use, and fantastically
featureful for future growth :)

http://codespeak.net/lxml/lxmlhtml.html
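
A rough sketch of what that could look like with lxml.html, assuming lxml is
installed and assuming the date lives in a tag like
<meta name="date-modified" content="...">. That name and the helper
meta_content are guesses for illustration, so adjust them to whatever the
file actually contains:

```python
import lxml.html


def meta_content(html_text, meta_name):
    """Return the content of the first <meta> whose name matches, else None."""
    doc = lxml.html.fromstring(html_text)
    wanted = meta_name.lower()
    for meta in doc.iter("meta"):
        # Compare names case-insensitively; a missing name becomes "".
        if (meta.get("name") or "").lower() == wanted:
            return meta.get("content")
    return None


# Hypothetical sample document; a real script would read it from disk.
sample = """<html><head>
<title>Example</title>
<meta content="2009-12-01" name="Date-Modified">
</head><body><p>Hello</p></body></html>"""

print(meta_content(sample, "date-modified"))
```

For a file on disk, lxml.html.parse(path).getroot() returns the same kind of
element, so the loop itself would not need to change.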

--S