security quirk

Gandalf Parker gandalf at the.dead.ISP.of.Community.net
Thu Jan 31 15:07:21 CET 2013


RichD <r_delaney2001 at yahoo.com> contributed wisdom to  news:badd4188-196b-
45e3-ba8a-511d471282fa at nh8g2000pbc.googlegroups.com:

> On Jan 30, Gandalf  Parker <gand... at the.dead.ISP.of.Community.net>
> wrote:
>> > Web gurus, what's going on?
>>
>> That is the fault of the site itself.
>> If they are going to block access to users then they should also block
>> access to the automated spiders that hit the site to collect data.
> 
> well yeah, but what's going on, under the hood?
> How does it get confused?  How could this
> happen?  I'm looking for some insight, regarding a
> hypothetical programming glitch -

(from alt.hacker)

You don't understand. It is not in the code. It is in the site.
It is as if someone comes and picks fruit off your tree, and you are
questioning the tree about how it bears fruit.

The site creates web pages. 
Google collects web pages.
The site needs to set things like robots.txt to tell Google NOT to collect
the pages in the archives. That is not absolute protection, but at least
it's an effort that works for most sites.
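
For anyone who wants to see what that looks like in practice, here is a
rough sketch using Python's standard urllib.robotparser. The robots.txt
content, the /archives/ path, the example.com URLs, and the may_fetch
helper are all made up for illustration; I do not know what the site in
question actually serves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that wants its archives left alone.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved spider asks before it collects a page.
print(may_fetch(ROBOTS_TXT, "Googlebot", "http://example.com/archives/2013/msg1.html"))
print(may_fetch(ROBOTS_TXT, "Googlebot", "http://example.com/index.html"))
```

Note that the check is made by the spider, not enforced by the site, which
is why it only works against crawlers that choose to honor it.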



More information about the Python-list mailing list