Random web page?
Anton Vredegoor
anton at vredegoor.doge.nl
Thu Apr 17 19:50:24 EDT 2003
Terry Hancock <hancock at anansispaceworks.com> wrote:
<snip>
>Perhaps one can query the DNS registry in some way that would be more
>equal? Can RARP be used to look up *all* domains associated with an IP, or
>only one canonical value? If you could look up all of them, you could then
>select one at random -- but this will not generate evenly weighted results
>of course.
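For what it's worth, the lookup described here is a reverse DNS (PTR) query; RARP is a different protocol that maps hardware addresses to IP addresses. In Python, `socket.gethostbyaddr` returns a `(hostname, aliaslist, ipaddrlist)` triple, so a minimal sketch that collects every name the resolver reports could look like this. The helper name `names_for_ip` is hypothetical, and the `lookup` parameter exists only so the function can be tried without network access. Note that the result only contains names with PTR records (or local hosts-file entries); sites that share the address through name-based virtual hosting usually do not appear at all.

```python
import socket


def names_for_ip(ip, lookup=socket.gethostbyaddr):
    """Return every host name the resolver reports for `ip` (hypothetical helper).

    `lookup` defaults to socket.gethostbyaddr, which returns
    (hostname, aliaslist, ipaddrlist).
    """
    try:
        hostname, aliases, _addresses = lookup(ip)
    except (socket.herror, socket.gaierror):
        # No PTR record, or the address could not be resolved at all.
        return []
    # Primary name first, then any aliases, without duplicates.
    names = [hostname]
    for alias in aliases:
        if alias not in names:
            names.append(alias)
    return names
```

Called as `names_for_ip("127.0.0.1")`, it typically returns whatever the local hosts file lists for the loopback address.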
But why do the DNS lookup at all if it's possible to load a page using an IP address?
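Fetching a page straight from an address needs only the standard library. The sketch below uses Python 3's `http.client`; the helper name `fetch_by_ip` and its defaults are hypothetical. One caveat: `http.client` puts the bare address in the `Host` header, so a server that hosts several sites by name may answer with a default page or an error rather than any particular site.

```python
import http.client


def fetch_by_ip(ip, port=80, timeout=5.0, max_bytes=65536):
    """Request '/' from ip:port and return (status, first max_bytes of the body).

    Hypothetical helper: no Host header is set explicitly, so http.client
    sends the address itself, and virtual hosts may return a default page.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        return response.status, response.read(max_bytes)
    finally:
        conn.close()
```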
In fact, the reason I asked this question here is that I *have* a
script (partly based on some code I found on comp.lang.python) that
generates random IP addresses and then looks up their names using a
DNS server. The problem with this script is that it necessarily blasts
my provider's DNS server, since a lot of addresses need to be looked
up; most of them result in an error, and the few that do resolve very
seldom serve an HTTP page.
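One way to take some load off the provider's DNS server (assuming a good share of the failures come from addresses that cannot host a public site at all) is to skip private, loopback, multicast and other reserved ranges before looking anything up. The standard `ipaddress` module (Python 3.4+) exposes an `is_global` property for this. A minimal sketch with hypothetical names, using rejection sampling so the result stays uniform over the accepted addresses:

```python
import ipaddress
import random


def random_global_ipv4(rng=random):
    """Return a uniformly random, publicly routable IPv4 address as a string.

    Rejection sampling: draw from all 2**32 addresses and retry until the
    address is global (not private, loopback, reserved, ...) and not multicast.
    """
    while True:
        addr = ipaddress.IPv4Address(rng.getrandbits(32))
        # Multicast is checked separately because is_global alone
        # does not exclude it in every Python version.
        if addr.is_global and not addr.is_multicast:
            return str(addr)


def random_global_ipv4_list(n, seed=None):
    """Hypothetical helper: n filtered addresses from a seedable generator."""
    rng = random.Random(seed)
    return [random_global_ipv4(rng) for _ in range(n)]
```

Drawing 32 random bits is equivalent to drawing each of the four octets uniformly, as the script's `rAddress` does; the filter only removes addresses that no public DNS entry or web server should be using.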
If I want to pursue the matter any further, there's probably no choice
other than to follow Peter's advice, but it feels like a strange
contortion of a port scan (which is generally frowned upon) in which
the port is held constant and the IP address is varied. I would
welcome some advice on how polite or impolite such probing is
generally considered to be, since I would not like to provoke
unfriendly reactions.
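If one does probe a fixed port at random addresses, as described above, a single TCP connection attempt with a short timeout, made one address at a time with a pause in between, is probably the least intrusive form. The sketch below illustrates that under those assumptions; the helper names and the one-second default delay are arbitrary choices, not an accepted standard of politeness.

```python
import socket
import time


def port_open(ip, port=80, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out (socket.timeout is an OSError).
        return False


def probe_slowly(ips, port=80, delay=1.0, timeout=3.0):
    """Probe addresses one at a time, sleeping `delay` seconds between attempts."""
    found = []
    for i, ip in enumerate(ips):
        if i:
            time.sleep(delay)
        if port_open(ip, port, timeout):
            found.append(ip)
    return found
```

Only addresses that accept the connection would then be worth an actual HTTP request.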
<snip>
>Of course, the distribution is probably not too relevant -- there are so
>many web pages in the world now, that you are unlikely to retrieve the same
>one twice with any truly random system. (Not true on Google, where
>connectivity highly biases the search).
Well, to me it is. I could easily come up with lots of legitimate
reasons for wanting an even distribution (for example, statistical
analysis of net content), but that would be lying, since in reality I
just want such a tool to play with for my personal pleasure :-)
Anton.
PS. Here's my unsatisfying DNS code (use at your own risk; I am not
sure providers would like this script being run regularly). The
threading seems to be necessary because the script would be very slow
without it. I don't have much experience with threading, so I haven't
managed to get rid of the global variables yet. I would welcome
improvements to this code, as well as a script to generate a random
web page.
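import threading, socket
import random

def rAddress():
    # Random dotted-quad address; each octet uniform in 0..255.
    return '.'.join([str(random.randint(0,255)) for i in range(4)])

def LookupThread():
    # Worker: pop addresses off the shared list until it is empty.
    # list.pop() and list.append() are atomic in CPython, so no lock is used.
    while 1:
        try:
            ip = ipList.pop()
        except IndexError:
            break
        try:
            ipAndHostList.append((ip, socket.gethostbyaddr(ip)[0]))
        except (socket.herror, socket.gaierror):
            pass    # no reverse DNS entry for this address

ipList = [rAddress() for i in range(1000)]
ipAndHostList = []
threads = [threading.Thread(target=LookupThread) for i in range(500)]
for t in threads:
    t.start()
# Wait for every lookup to finish; polling until ipList is empty would
# stop while the last lookups are still in progress and lose their results.
for t in threads:
    t.join()

fn = 't4.txt'
f = open(fn,'a')
for x in ipAndHostList:
    print(x)
    f.write('%s %s\n' %x)
f.close()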
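As for getting rid of the global variables: one option is to let `concurrent.futures.ThreadPoolExecutor` (Python 3.2+) hand each address to a worker and collect the return values. This is a sketch rather than a drop-in replacement; `r_address`, `reverse_lookup`, `resolve_many` and `main` are hypothetical names, and the default of 50 workers is an arbitrary choice, deliberately lower than the original 500 threads so the DNS server receives fewer simultaneous queries.

```python
import random
import socket
from concurrent.futures import ThreadPoolExecutor


def r_address(rng=random):
    """Random dotted-quad address; each octet uniform in 0..255."""
    return '.'.join(str(rng.randint(0, 255)) for _ in range(4))


def reverse_lookup(ip, lookup=socket.gethostbyaddr):
    """Return (ip, hostname), or None if the lookup fails."""
    try:
        return ip, lookup(ip)[0]
    except OSError:
        # socket.herror, socket.gaierror and timeouts are all OSError subclasses.
        return None


def resolve_many(ips, workers=50, lookup=socket.gethostbyaddr):
    """Look up many addresses concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda ip: reverse_lookup(ip, lookup), ips)
        return [r for r in results if r is not None]


def main(count=1000, filename='t4.txt'):
    """Same job as the script above: resolve random addresses, append to a file."""
    pairs = resolve_many([r_address() for _ in range(count)])
    with open(filename, 'a') as f:
        for ip, host in pairs:
            print(ip, host)
            f.write('%s %s\n' % (ip, host))
```

Calling `main()` reproduces the original behaviour; all state lives in local variables and return values, and the `with` block waits for every lookup before the results are written.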
More information about the Python-list mailing list