[Tutor] Simple Question...

R. Alan Monroe amonroe at columbus.rr.com
Thu Oct 21 12:41:28 CEST 2004


> To do this without loading the file into memory, and without relying
> on wc (which ought to be very fast even with large files, if you need
> that), you could do:

> import random

> def linecount(f):
>     """count the number of lines in the file then rewind it"""
>     l = ' '
>     c = 0
>     while l:
>         l = f.readline()
>         if l: c += 1    # skip the empty string readline() returns at EOF;
>                         # this also counts a last line with no trailing '\n'
>     f.seek(0)
>     return c

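> def getrandline(f):
>     """get a random line from f (assumes file pointer is at beginning)"""
>     lines = linecount(f)
>     if not lines: return ''    # empty file: nothing to pick
>     r = random.randint(1, lines)    # line number from 1 to lines, inclusive
>     for i in range(1, r): f.readline()    # skip the r-1 lines before it
>     return f.readline()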

> anybody have a faster implementation?

I thought it might be neat to create an index file if one doesn't
exist, and to use it if it does. It would just be a pickled dictionary
of byte offsets, one for each line in the file. Pick a random entry
from the dictionary and seek to that offset in the main file. I
haven't actually tried it, though.
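
Here is a minimal sketch of how that index idea could look. It assumes
a few things that aren't settled above: the index lives next to the
data file under a made-up ".idx" suffix, it is rebuilt whenever the
data file is newer than the index, and it stores a plain list of
offsets rather than a dictionary, since the keys would just be the
line numbers anyway. All the function names are hypothetical.

```python
import os
import pickle
import random


def build_index(path):
    """Return a list holding the byte offset of the start of each line."""
    offsets = []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()          # offset where the next line starts
            if not f.readline():    # empty bytes means end of file
                break
            offsets.append(pos)
    return offsets


def load_index(path, index_path=None):
    """Load the pickled index, building it if it is missing or stale."""
    if index_path is None:
        index_path = path + ".idx"  # assumed naming convention
    # Reuse the index only if it is at least as new as the data file.
    if (os.path.exists(index_path)
            and os.path.getmtime(index_path) >= os.path.getmtime(path)):
        with open(index_path, "rb") as f:
            return pickle.load(f)
    offsets = build_index(path)
    with open(index_path, "wb") as f:
        pickle.dump(offsets, f)
    return offsets


def get_random_line(path):
    """Seek straight to a randomly chosen line and return it as bytes."""
    offsets = load_index(path)
    if not offsets:                 # empty file: nothing to pick
        return b""
    with open(path, "rb") as f:
        f.seek(random.choice(offsets))
        return f.readline()
```

Once the first call has built the index, each later call such as
get_random_line("quotes.txt") (the file name is only an example) costs
one seek and one readline instead of a full pass over the file. Lines
come back as bytes, so decode them if you need text.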

Alan


