gandalf at the.dead.ISP.of.Community.net
Thu Jan 31 15:07:21 CET 2013
RichD <r_delaney2001 at yahoo.com> contributed wisdom to news:badd4188-196b-
45e3-ba8a-511d471282fa at nh8g2000pbc.googlegroups.com:
> On Jan 30, Gandalf Parker <gand... at the.dead.ISP.of.Community.net>
>> > Web gurus, what's going on?
>> That is the fault of the site itself.
>> If they are going to block access to users then they should also block
>> access to the automated spiders that hit the site to collect data.
> well yeah, but what's going on, under the hood?
> How does it get confused? How could this
> happen? I'm looking for some insight, regarding a
> hypothetical programming glitch -
You don't understand. It is not in the code. It is in the site.
It is as if someone comes and picks fruit off of your tree, and you are
questioning the tree for how it bears fruit.
The site creates web pages.
Google collects web pages.
The site needs to set up something like robots.txt to tell Google NOT to
collect the pages in the archives. That is not absolute protection, since
crawlers only obey it voluntarily, but at least it's an effort that works
for most sites.
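
As a rough sketch of what that looks like, here is a made-up robots.txt and
a check of it with Python's standard urllib.robotparser. The /archives/ path
and the example.com URLs are assumptions for illustration only, not the
actual layout of the site being discussed, and may_fetch is just a helper
name invented here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks every crawler to stay out of /archives/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/archives/2013/post.html",
                "http://example.com/index.html"):
        print(url, may_fetch(ROBOTS_TXT, "Googlebot", url))
```

A well-behaved spider runs a check like this before each request. Nothing
forces it to, which is why the file is a request and not a lock.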
More information about the Python-list