Beginner Q. interrogate html object OR file search?
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Thu Dec 3 02:33:02 EST 2009
On Wed, 02 Dec 2009 19:24:07 -0800, Mark G wrote:
> Hi all,
>
> I am new to Python and don't yet know the libraries well. What would be
> the best way to approach this problem: I have an HTML file parsing script
> - the file sits on my hard drive. I want to extract the date modified
> from the meta-data. Should I read through lines of the file doing a
> string.find to look for the character patterns of the meta tag,
That will probably be the fastest, simplest, and easiest to develop. But
the downside is that it will be subject to false positives, if some tag
happens to include text which by chance looks like your meta-data. So,
strictly speaking, this approach is incorrect.
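
To make the false-positive risk concrete, here is a minimal sketch of the
string-matching approach. The tag name "date-modified" and the sample
documents are assumptions for illustration only; the original question
does not say what the meta tag actually looks like.

```python
# Hypothetical marker: the real meta tag name is not given in the question.
MARKER = '<meta name="date-modified" content="'

def find_date_by_string(html):
    """Return the text between MARKER and the next double quote, or None."""
    start = html.find(MARKER)
    if start == -1:
        return None
    start += len(MARKER)
    end = html.find('"', start)
    if end == -1:
        return None
    return html[start:end]

good = '<html><head><meta name="date-modified" content="2009-12-02"></head></html>'

# The same characters inside an HTML comment are matched first:
# plain string search cannot tell a comment from a real tag.
tricky = ('<html><head><!-- <meta name="date-modified" content="1999-01-01"> -->'
          '<meta name="date-modified" content="2009-12-02"></head></html>')

print(find_date_by_string(good))    # 2009-12-02
print(find_date_by_string(tricky))  # 1999-01-01 (the false positive)
```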
> or
> should I use a DOM type library to retrieve the html element I want?
> Which is best practice?
"Best practice" would imply DOM.
As for which you use, you need to weigh up the risks of a false positive
versus the convenience and speed of string matching versus the
correctness of a DOM approach.
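
For comparison, a rough sketch of the parser-based route. It uses the
standard library's event-driven HTMLParser (Python 3 module name) rather
than a full DOM tree, which is a swapped-in simplification, but it shows
the same benefit: only real tags are reported. The "date-modified" name
and the class and function names are again hypothetical.

```python
from html.parser import HTMLParser

class MetaDateParser(HTMLParser):
    """Record the content of the first <meta name=...> tag that matches."""

    def __init__(self, meta_name="date-modified"):  # assumed tag name
        super().__init__()
        self.meta_name = meta_name
        self.date = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from the parser.
        if tag != "meta" or self.date is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == self.meta_name:
            self.date = attrs.get("content")

def find_date_by_parsing(html):
    parser = MetaDateParser()
    parser.feed(html)
    parser.close()
    return parser.date

# A commented-out tag followed by the real one: the comment is skipped.
sample = ('<html><head><!-- <meta name="date-modified" content="1999-01-01"> -->'
          '<meta name="date-modified" content="2009-12-02"></head></html>')

print(find_date_by_parsing(sample))  # 2009-12-02
```

This is longer than a str.find call, but the commented-out tag no longer
fools it.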
> which occupies least code?
Unless you're writing for an embedded system, or the difference is
vast (e.g. 300 lines versus 30), that's premature optimization.
Personally, I'd use string matching or a regex, and feel guilty about it.
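
A regex version might look something like the sketch below. The pattern
assumes a hypothetical tag of the form <meta name="date-modified"
content="...">, with the attributes in that order; it carries the same
false-positive risk as any other text matching.

```python
import re

# Assumed tag layout: name attribute first, then content.
META_DATE_RE = re.compile(
    r'<meta\s+name=["\']date-modified["\']\s+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)

def find_date_by_regex(html):
    """Return the captured content value of the first match, or None."""
    match = META_DATE_RE.search(html)
    return match.group(1) if match else None

print(find_date_by_regex('<meta name="date-modified" content="2009-12-02">'))
```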
--
Steven