[python-advocacy] Python makes the "most wanted list"

Roy Smith roy at panix.com
Mon Feb 11 22:11:21 CET 2008


> I'm having a hard time understanding what is going on here.
>
> Do we have a library which for no reason just calls up the WC3 site
> and pesters them?
>
> or are all those requests because somebody really wants to talk to
> WC3, on purpose, but now WC3 is miffed because it cannot find out who
> is accessing its site?

I think what's going on is:

Somebody (perhaps plural) wrote some web application(s) using Python's
urllib which parses this stuff at the beginning of most HTML pages:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

and then does a (pointless) GET on "http://www.w3.org/TR/html4/loose.dtd".
And they left the version string (the User-Agent header) at the default
value.  So, the W3C folks see bazillions of GETs of that file from their
servers, with Python's urllib logged as the client.
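
For illustration, here is a rough sketch of what "the default value" means
and how an application could identify itself instead.  It uses the Python 3
spelling of the module (urllib.request), and the application name
"ExampleCrawler/1.0" and its contact URL are made up for the example.
Nothing is actually sent; the request is only built and inspected.

```python
import urllib.request

DTD_URL = "http://www.w3.org/TR/html4/loose.dtd"


def default_user_agent():
    """Return the User-Agent that urllib sends when the caller sets none."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs applied to every request.
    return dict(opener.addheaders)["User-agent"]


def identified_request(url, agent):
    """Build (but do not send) a request that names the application."""
    return urllib.request.Request(url, headers={"User-Agent": agent})


# Hypothetical application name, chosen only for this example.
req = identified_request(
    DTD_URL, "ExampleCrawler/1.0 (+http://example.com/contact)"
)

print(default_user_agent())          # starts with "Python-urllib/"
print(req.get_header("User-agent"))  # the name the server would log
```

With a header like that, the server logs would at least point at the
application rather than at the library.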

It's not really clear there's anything we can do about this.  It's really
not our fault if people are using Python to write brain-dead applications.
It's just annoying that we show up in the logs associated with them.




