Random web page?

Anton Vredegoor anton at vredegoor.doge.nl
Thu Apr 17 19:50:24 EDT 2003


Terry Hancock <hancock at anansispaceworks.com> wrote:

<snip>

>Perhaps one can query the DNS registry in some way that would be more 
>equal?  Can RARP be used to look up *all* domains associated with an IP, or 
>only one canonical value?  If you could look up all of them, you could then 
>select one at random -- but this will not generate evenly weighted results 
>of course.

But why do the DNS lookup at all if it's possible to load a page using
an IP address? In fact, the reason I asked this question here is that
I *have* a script (partly based on some code I found in clp) that
generates random IP addresses, then looks up their names using a DNS
server. The problem I have with this script is that it necessarily
hammers my provider's DNS server, since a lot of addresses need to be
looked up, most of which result in an error, and the remaining
successfully resolved addresses very seldom serve an HTTP page.
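
One way to cut down on lookups that are bound to fail is to discard
addresses that can never be public hosts (private, loopback, reserved,
multicast) before asking the DNS server at all. A minimal sketch,
assuming Python 3 and its standard ipaddress module; the names
is_candidate and random_public_ip are made up for illustration:

```python
import ipaddress
import random

def is_candidate(ip):
    """Return True if ip is a globally routable, non-multicast IPv4 address."""
    addr = ipaddress.IPv4Address(str(ip))
    return addr.is_global and not addr.is_multicast

def random_public_ip(rng=random):
    """Draw random 32-bit addresses until one passes is_candidate()."""
    while True:
        candidate = str(ipaddress.IPv4Address(rng.getrandbits(32)))
        if is_candidate(candidate):
            return candidate
```

This only removes address ranges that are unusable by definition; most
of the remaining addresses will still have no reverse DNS entry or no
web server.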

If I want to pursue the matter any further, there's probably no other
choice than to follow Peter's advice, but it feels like some kind of
strange contortion of a port scan (which is generally frowned upon)
where the port is held constant and the IP address is varied. I would
welcome some advice about how polite such actions are generally
considered to be, since I would not like to provoke unfriendly
reactions.
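
For illustration only, here is what a restrained version of "port
fixed, address varied" might look like: one HEAD request per address,
a short timeout, and a pause between attempts. This is a sketch
assuming Python 3; http_head_status and probe_slowly are hypothetical
names, the delay of one second is an arbitrary choice, and none of it
says anything about what network operators consider acceptable.

```python
import http.client
import time

def http_head_status(ip, timeout=3.0):
    """Send HEAD / to port 80 at ip; return the status code, or None on failure."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, timed out, or the reply was not valid HTTP.
        return None
    finally:
        conn.close()

def probe_slowly(ips, probe=http_head_status, delay=1.0):
    """Probe addresses one at a time, sleeping delay seconds between attempts."""
    results = []
    for n, ip in enumerate(ips):
        if n:
            time.sleep(delay)
        status = probe(ip)
        if status is not None:
            results.append((ip, status))
    return results
```

The probe function is passed in as a parameter so that the pacing
logic can be tried out with a fake probe, without touching the
network.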

<snip>

>Of course, the distribution is probably not too relevant -- there are so 
>many web pages in the world now, that you are unlikely to retrieve the same 
>one twice with any truly random system.  (Not true on Google, where 
>connectivity highly biases the search).

Well, to me it is. I could easily generate lots of legitimate reasons
for wanting an even distribution (for example, statistical analysis of
net content and such), but that would be lying, since the reality is
that I just want such a tool to play with for my personal pleasure :-)

Anton.

PS. Here's my unsatisfying DNS code (use at your own risk; I am not
sure providers would like this script being executed regularly). The
threading code seems to be necessary because the script would be very
slow without it. I haven't got a lot of experience with threading, so
I can't get rid of the global variables yet. I would welcome
improvements to this code as well as a script to generate a random
web page.

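import random
import socket
import threading

def rAddress():
    # Four random octets joined into a dotted-quad IPv4 address.
    return '.'.join([str(random.randint(0, 255)) for i in range(4)])

def LookupThread():
    # Each worker pops addresses until the shared list is empty.
    while True:
        try:
            ip = ipList.pop()
        except IndexError:
            break
        try:
            ipAndHostList.append((ip, socket.gethostbyaddr(ip)[0]))
        except socket.error:
            # No reverse DNS entry, or the lookup failed; skip it.
            pass

ipList = [rAddress() for i in range(1000)]
ipAndHostList = []

threads = [threading.Thread(target=LookupThread) for i in range(500)]
for t in threads:
    t.start()

# Wait for the workers themselves, not just for ipList to empty;
# otherwise lookups still in flight would be missing from the output.
for t in threads:
    t.join()

fn = 't4.txt'
f = open(fn, 'a')

for x in ipAndHostList:
    print(x)
    f.write('%s %s\n' % x)
f.close()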
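
On the question of getting rid of the global variables: one possible
approach, sketched here under the assumption of Python 3 and its
standard concurrent.futures module, is to let a thread pool hand out
the work and collect the results. The names reverse_lookup and
lookup_all are invented for this example, and the default of 20
workers is an arbitrary, deliberately smaller number than the 500
threads above.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def reverse_lookup(ip):
    """Return the host name registered for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

def lookup_all(ips, resolve=reverse_lookup, workers=20):
    """Resolve all addresses in a thread pool; return (ip, host) pairs that resolved."""
    ips = list(ips)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input addresses.
        hosts = list(pool.map(resolve, ips))
    return [(ip, host) for ip, host in zip(ips, hosts) if host is not None]
```

Leaving the with block waits for every worker to finish, so no
separate sleep or join loop is needed, and the shared lists are
replaced by an argument and a return value.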







