[OT] Free software versus software idea patents

Chris Angelico rosuav at gmail.com
Thu Apr 14 05:15:05 EDT 2011

On Thu, Apr 14, 2011 at 4:04 PM, harrismh777 <harrismh777 at charter.net> wrote:
>    How many web crawlers have you built? Are there any web programmers out
> there who need a web bot to hit multiple sites zillions of times a month
> from different places on earth to 'up' the number of hits for economic
> reasons? I've seen my share of this.

A well-behaved spider will (a) have a UA that identifies itself (as a
bot, and preferably as itself - eg "GoogleBot", etc - some even go so
far as to include a URL for more info), and (b) start by fetching
/robots.txt before they go any further. Servers can recognize
properly-built crawlers. And improperly-built crawlers, deliberately
trying to hammer a server to lie about browser stats? Seriously, do
you think people actually care THAT much?
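
As a rough sketch of what "well-behaved" means in (a) and (b), here is how a spider might check robots.txt before fetching anything, using Python's standard urllib.robotparser. The bot name, info URL, and sample robots.txt are made up for illustration; a real crawler would download /robots.txt from the target server first, sending the same User-Agent header.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot name and info URL, for illustration only.
USER_AGENT = "ExampleBot/1.0 (+http://example.com/bot-info)"

# Stand-in for the body of a site's /robots.txt (assumed content).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

def allowed_paths(robots_txt, paths, agent=USER_AGENT):
    """Return the subset of paths that robots.txt permits for this agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if parser.can_fetch(agent, p)]

if __name__ == "__main__":
    print(allowed_paths(SAMPLE_ROBOTS, ["/index.html", "/private/stats.html"]))
```

The spider would then request only the paths this returns, identifying itself with USER_AGENT on every request.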

>    How many times have you altered the identity of your web browser so that
> the web site would 'work'? You know, stupid messages from the server that
> say, "We only support IE 6+, upgrade your browser...",  so you tell it
> you're using IE 6 and, well no problem.

Yep. Which means that the figures will always be skewed toward IE a
bit. But it's a lot less than you might think; most people don't leave
UA switchers active all the time, and the number of web sites that
require them is dropping. It's true that UA switching will tip toward
IE (I've never seen a site where you have to pretend to be Google
Chrome), but the epidemiology is, I believe, not all that high.

>    Web site data is bogus. It assumes even distributions... it assumes even
> usage of the site from all surfers, it assumes no web crawlers and no bots,
> it assumes no browser identity tampering, and it assumes that there aren't
> those who for economic reasons are inflating the numbers deliberately
> (no, really??) from world-owned bot farms.

Even distributions of what?

1) Assuming nothing, it merely gives data. About one site. That's why
overall "browser marketshare" stats have to be done by averaging
multiple sites.

2) Web crawlers - see above. If you've ever looked at AWStats or
Webalizer or *insert stats engine here*, you'll have seen that it will
identify them. AWStats goes a bit further and will identify "viewed
traffic" and "not viewed traffic" even if it's unable to identify the
specific bot.

3) Yes, it assumes no UA switchers, obviously. It's just based on
headers. But I reckon you could easily identify someone who's using a
switcher, based on other headers - for instance, I doubt very much
that IE6 will send "Accept-Encoding: gzip,deflate,sdch" (which my
Chrome does).

4) Assumes people aren't deliberately fiddling the figures. Yeah, that
would be correct. We're in the realm of conspiracy theories here...
does anyone seriously think that browser stats are THAT important that
they'd go to multiple web servers with deceitful hits? Not forgetting
that they'd have to mix up the IPs, make plausible "browsing sessions"
(with referers and image retrieval and so on), vary the date/times,
etc, etc, etc, etc... and generate enough hits to make a reasonable
dent in the figures.
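
To make points 2 and 3 a bit more concrete, here is a minimal sketch of header-based classification. The token list and the "IE6 plus sdch" rule are my own simplifications, not how AWStats or Webalizer actually work, and the sample User-Agent strings are only illustrative.

```python
# Assumed heuristic: substrings that commonly appear in crawler User-Agents.
BOT_TOKENS = ("bot", "crawler", "spider")

def classify(headers):
    """Classify one request as 'bot', 'suspected UA switcher', or 'browser'."""
    ua = headers.get("User-Agent", "").lower()
    enc = headers.get("Accept-Encoding", "").lower()
    if any(token in ua for token in BOT_TOKENS):
        return "bot"
    # A UA claiming IE6 while advertising sdch (which Chrome sends)
    # looks inconsistent, so flag it as a possible switcher.
    if "msie 6" in ua and "sdch" in enc:
        return "suspected UA switcher"
    return "browser"

if __name__ == "__main__":
    hits = [
        {"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                       "+http://www.google.com/bot.html)"},
        {"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
         "Accept-Encoding": "gzip,deflate,sdch"},
        {"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
         "Accept-Encoding": "gzip, deflate"},
    ]
    for hit in hits:
        print(classify(hit))
```

A real stats engine would need many more consistency checks than this one rule, but it shows that a spoofed User-Agent alone doesn't hide what the rest of the headers give away.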

>    There is no reliable way to measure free software usage. But, there sure
> is a lot of posturing going on in the market place ...  wonder why?

Sure, and there's no reliable way to measure non-free software usage
either. What's the difference? You could count sales of Microsoft
Office, and you could count downloads of OpenOffice.org. Neither is any
more accurate than the other; although I think the 24-hour figures for
Firefox 4 / IE 9 downloads are fairly indicative, since people can't
get them off their respective OS install CDs. And this isn't
restricted to electronica either. Which is more popular, Coca-Cola or
Pepsi? Do more people vote Liberal or Labour, Republican or Democrat,
Whig or Tory?

Statisticking is a huge science. Most of it involves figuring out
what's important - anyone can get data, but getting useful information
out of the data takes some work.

Chris Angelico

More information about the Python-list mailing list