[python-advocacy] Python makes the "most wanted list"

Tal Einat taleinat at gmail.com
Tue Feb 12 00:38:27 CET 2008


Doug Hellmann wrote:
>
> I agree that adding documentation to encourage the use of a User-Agent
> header won't make much difference, and that forcing the developer to
> provide a "useful" value is going too far.  The source of the hits on
> the W3C site is bone-headed scripts that use regexes or other
> incorrect methods for finding links in HTML.  Neither of these
> solutions addresses this problem anyway, because the problem isn't that
> we can't find the author of the bone-headed scripts, the problem is
> that the scripts are out there in the first place.
>
> There's not a lot we can do about existing programs, but adding a
> function to the standard library to properly parse out the links would
> solve the problem for new code, since most developers would use it
> rather than writing their own.

Ahh, if only it were so simple! But web content, especially HTML, is
such a mess that making a "standard function" which will Just Work is
nearly impossible. It's hard enough that there isn't any such code to
be readily found, AFAIK.
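
For illustration, here is a minimal sketch of the kind of link-extraction
function being discussed, built on a real parser rather than regexes. It
assumes Python 3's standard html.parser and urllib.parse modules; the names
LinkCollector and extract_links are hypothetical, not an existing standard
library API. It only handles <a href> and makes no claim to cope with the
truly broken markup that makes the general problem hard.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper, not stdlib)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, if one was given.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Return the href targets of all <a> tags found in the given HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Because the parser tracks comments and attribute quoting itself, an <a> tag
inside an HTML comment is skipped and unquoted or upper-case attributes are
still picked up, which are cases a quick regex tends to get wrong.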

BeautifulSoup does a pretty good job at this. It's not yet mature,
especially since it is mainly developed and maintained by just one
person. But if an effort were made to clean up the code, thoroughly
test it and improve its documentation, it certainly would be an
awesome tool in the Python developer's workbench. (IMO it already is
great, but isn't quite ready for mass usage because of lack of
testing, lacking documentation, and hard-to-read code.)

But as Paul's post just mentioned, this might be off-topic...

- Tal

