[Tutor] Simple Question...
R. Alan Monroe
amonroe at columbus.rr.com
Thu Oct 21 12:41:28 CEST 2004
> To do this without loading the file into memory, and without relying
> on wc (which ought to be very fast even with large files, if you need
> that), you could do:
> import random
>
> def linecount(f):
>     """Count the number of lines in the file, then rewind it."""
>     c = 0
>     while f.readline():  # readline() returns '' only at end of file,
>         c += 1           # so a last line without '\n' is counted too
>     f.seek(0)
>     return c
>
> def getrandline(f):
>     """Get a random line from f (assumes the file pointer is at the beginning)."""
>     lines = linecount(f)
>     r = random.randrange(lines)  # 0 <= r < lines, so every line is equally likely
>     for _ in range(r):           # skip the r lines before the chosen one
>         f.readline()
>     return f.readline()
> Anybody have a faster implementation?
I thought it might be neat to create an index file if one doesn't
exist, and use it if it does. It would just be a pickled dictionary
mapping each line number to the byte offset where that line starts.
Pick a random entry from the dictionary and seek to that offset in the
main file. I haven't actually tried it, though.
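A minimal sketch of that index idea, assuming a hypothetical `<file>.idx` naming convention for the index file and UTF-8 text. The function names `build_index`, `load_index` and `random_line` are illustrative only. Rebuilding the index when it is older than the data file is an added assumption, not part of the idea as stated. The main file is opened in binary mode so that `tell()` and `seek()` work with real byte offsets.

```python
import os
import pickle
import random


def build_index(path):
    """Map each line number to the byte offset where that line starts."""
    offsets = {}
    with open(path, "rb") as f:  # binary mode: tell()/seek() use real byte offsets
        lineno = 0
        while True:
            pos = f.tell()
            if not f.readline():  # b"" means end of file
                break
            offsets[lineno] = pos
            lineno += 1
    return offsets


def load_index(path, index_path=None):
    """Load the pickled index, (re)building it if missing or older than the file."""
    index_path = index_path or path + ".idx"  # hypothetical naming convention
    if (not os.path.exists(index_path)
            or os.path.getmtime(index_path) < os.path.getmtime(path)):
        # Assumption: a stale index is rebuilt rather than trusted.
        offsets = build_index(path)
        with open(index_path, "wb") as idx:
            pickle.dump(offsets, idx)
        return offsets
    with open(index_path, "rb") as idx:
        return pickle.load(idx)


def random_line(path, index_path=None):
    """Pick a random entry from the index and seek straight to that line."""
    offsets = load_index(path, index_path)
    if not offsets:
        raise ValueError("file is empty")
    offset = offsets[random.randrange(len(offsets))]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.readline().decode("utf-8")  # assumes UTF-8 text
```

Building the index still reads the whole file once, but after that each lookup costs one `seek()` plus one `readline()`. Since the keys are just 0 to n-1, a plain list of offsets would work as well as a dictionary.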
Alan