[Tutor] update html pages using python

Stefan Behnel stefan_ml at behnel.de
Tue Sep 1 08:38:34 CEST 2009


Alan Gauld wrote:
> "Stefan Behnel" <stefan_ml at behnel.de> wrote
>>> "pedro" <pedrooconnell at gmail.com> wrote
>>>> Hi, I was wondering if anyone could point me in the right direction as
>>>> far as the best way to use Python to update HTML.
>>>
>>> There are a number of modules in the standard library that can help, but
>>> the best known module for this is BeautifulSoup.
>>
>> I would call that statement highly exaggerated.
>>
>> http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/
> 
> There may be a language thing at work here, but by "best known module"
> I do not mean Beautiful Soup is the best of all known modules; rather, it
> is the module which is most widely known of the non-standard HTML
> packages.

I think "non-standard HTML package" pretty much hits the nail on the head.


> It is also, arguably, one of the easiest to use
> and well behaved with non-compliant HTML.

That, again, is questionable. The task at hand was to "update HTML pages",
in which case it is quite useful to have them fixed up into
standards-compliant HTML before working on them. BeautifulSoup will not do
that for you. Instead, it will leave you with whatever tag soup you had at
the beginning, so you will end up sending out broken HTML again. I
wouldn't call that "well behaved" at all.
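
To illustrate what "fixing up" means before an update, here is a rough
sketch using only the standard library's html.parser (Python 3). It
re-serializes a page, closes tags that were left open and drops end tags
that never had a matching start tag. The names TagBalancer and balance are
made up for this example, and this is nowhere near what a real repairing
HTML parser does (no implied <html>/<body>, no nesting rules); it only
shows the idea.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Hypothetical helper: re-emit HTML with balanced tags."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []        # pieces of the repaired document
        self.open_tags = []  # stack of elements still waiting for a close

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit as is, nothing to track.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag with no matching start tag: drop it
        # Close everything opened after the matching start tag, then it.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append("</%s>" % current)
            if current == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def result(self):
        self.close()
        # Anything still open at the end of the input gets closed here.
        while self.open_tags:
            self.out.append("</%s>" % self.open_tags.pop())
        return "".join(self.out)


def balance(html_text):
    """Return html_text with unclosed tags closed and stray end tags removed."""
    parser = TagBalancer()
    parser.feed(html_text)
    return parser.result()


print(balance("<p><b>bold<i>both</p>"))
```

Running balance() on a page before editing it means that whatever you
write back out at least has matching tags, rather than the soup you
started with.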

Stefan


