Beginner Q. interrogate html object OR file search?

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Thu Dec 3 02:33:02 EST 2009


On Wed, 02 Dec 2009 19:24:07 -0800, Mark G wrote:

> Hi all,
> 
> I am new to Python and don't yet know the libraries well. What would be
> the best way to approach this problem: I have an HTML file parsing script
> - the file sits on my hard drive. I want to extract the date modified
> from the meta-data. Should I read through lines of the file doing a
> string.find to look for the character patterns of the meta tag,

That will probably be the fastest, simplest, and easiest to develop. But 
the downside is that it will be subject to false positives, if some tag 
happens to include text which by chance looks like your meta-data. So, 
strictly speaking, this approach is incorrect.
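
To make the false-positive risk concrete, here is a minimal sketch of the
string-matching approach. The meta name "last-modified", the sample page
and the function name are all assumptions for illustration; your files may
spell the tag differently.

```python
import re

# Hypothetical sample page. The meta name "last-modified" is an assumption,
# not something taken from the original question.
SAMPLE = '''<html><head>
<!-- <meta name="last-modified" content="1999-01-01"> -->
<meta name="last-modified" content="2009-12-02">
</head><body></body></html>'''

# Naive pattern: assumes double quotes and this exact attribute order.
META_RE = re.compile(
    r'<meta\s+name="last-modified"\s+content="([^"]*)"',
    re.IGNORECASE,
)

def find_date_by_regex(html):
    """Return the content of the first textual match, or None."""
    match = META_RE.search(html)
    return match.group(1) if match else None

print(find_date_by_regex(SAMPLE))  # prints 1999-01-01
```

The regex cannot tell that the first tag is commented out, so it reports
the stale date rather than the live one.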

> or
> should I use a DOM type library to retrieve the html element I want?
> Which is best practice? 

"Best practice" would imply DOM.

As for which you use, you need to weigh the risk of a false positive, and 
the convenience and speed of string matching, against the correctness of 
a DOM approach.
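
For comparison, here is a rough sketch of the parser-based route using the
standard library's html.parser module. Strictly, that is an event-driven
parser rather than a DOM, but it understands tags and comments, which is
the point here. The meta name "last-modified", the sample page and the
class and function names are assumptions for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page. The meta name "last-modified" is an assumption.
SAMPLE = '''<html><head>
<!-- <meta name="last-modified" content="1999-01-01"> -->
<meta name="last-modified" content="2009-12-02">
</head><body></body></html>'''

class MetaDateParser(HTMLParser):
    """Remember the content of the first <meta> tag with the given name."""

    def __init__(self, meta_name="last-modified"):
        super().__init__()
        self.meta_name = meta_name
        self.date = None

    def handle_starttag(self, tag, attrs):
        # Only real tags reach this method; comments go to handle_comment.
        if tag != "meta" or self.date is not None:
            return
        attr_map = dict(attrs)
        # Attribute values can be None, hence the "or" fallback.
        if (attr_map.get("name") or "").lower() == self.meta_name:
            self.date = attr_map.get("content")

def find_date_by_parser(html, meta_name="last-modified"):
    """Return the meta tag's content attribute, or None if it is absent."""
    parser = MetaDateParser(meta_name)
    parser.feed(html)
    parser.close()
    return parser.date

print(find_date_by_parser(SAMPLE))  # prints 2009-12-02
```

The commented-out tag in the sample is skipped because the parser treats
it as a comment, at the cost of a dozen or so extra lines.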


> which occupies least code?

Unless you're writing for an embedded system, or the difference is vast 
(e.g. 300 lines versus 30), that's premature optimization.

Personally, I'd use string matching or a regex, and feel guilty about it.



-- 
Steven


