On Wed, Dec 2, 2009 at 7:24 PM, Mark G <span dir="ltr"><<a href="mailto:markgrahamnz@gmail.com">markgrahamnz@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


Hi all,<br>

<br>

I am new to python and don't yet know the libraries well. What would<br>

be the best way to approach this problem: I have a html file parsing<br>

script - the file sits on my harddrive. I want to extract the date<br>

modified from the meta-data. Should I read through lines of the file<br>

doing a string.find to look for the character patterns of the meta-<br>

tag, or should I use a DOM type library to retrieve the html element I<br>

want? Which is best practice? which occupies least code?<br>

<br></blockquote><div><br></div><div>You can probably do some string.find's and it might work almost always, HTML is funky and quite often coded badly by bad people.</div><div><br></div><div>And I would never personally suggest anyone go anywhere near a DOM library, their life will never be happy again :)</div>


<div><br></div><div>I'd get lxml -- even though you're not directly using xml. It has a html package in it too, its fast and astoundingly easy to use and fantastically featureful for future growth :)</div><div><br>


</div><div><a href="http://codespeak.net/lxml/lxmlhtml.html">http://codespeak.net/lxml/lxmlhtml.html</a></div><div><br></div><div>--S</div></div>