[Tutor] Simple Question...
Bill Mill
bill.mill at gmail.com
Sat Oct 16 23:30:54 CEST 2004
To do this without loading the file into memory, and without relying
on wc (which ought to be very fast even with large files, if you need
that), you could do:
import random

def linecount(f):
    """count the number of lines in the file then rewind it"""
    l = ' '
    c = 0
    while l:
        l = f.readline()
        if l: c += 1  #the 'if' is needed: readline() returns '' at EOF,
                      #and that empty string must not be counted as a line
    f.seek(0)
    return c

def getrandline(f):
    """get a random line from f (assumes file pointer is at beginning)"""
    lines = linecount(f)
    if lines == 0: return ''  #empty file, nothing to choose from
    r = random.randint(1, lines)  #randint includes both endpoints
    for i in range(1, r): f.readline()  #skip the r - 1 lines before line r
    return f.readline()
Does anybody have a faster implementation?
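
One possible answer is to pick the line in a single pass instead of counting first and then reading again. The sketch below uses reservoir sampling, which is a different technique from the two-pass code above and is not something proposed in this thread. The name random_line and its rng parameter are made up for illustration. It may or may not be faster in practice, but it reads the file only once and keeps only one line in memory.

```python
import io
import random

def random_line(f, rng=random):
    """Return one uniformly chosen line from f in a single pass (None if f is empty)."""
    chosen = None
    for n, line in enumerate(f, start=1):
        # Keep the n-th line with probability 1/n; after the whole file has
        # been read, every line has had the same overall chance of surviving.
        if rng.randrange(n) == 0:
            chosen = line
    return chosen

# Small demonstration on an in-memory file (hypothetical data).
sample = io.StringIO("alpha\nbeta\ngamma\n")
picked = random_line(sample)
```

Because it only iterates over f, this also works on input that cannot be rewound, such as a pipe.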
Peace
Bill Mill
bill.mill at gmail.com
On Sat, 16 Oct 2004 16:52:16 -0400, R. Alan Monroe
<amonroe at columbus.rr.com> wrote:
> > This will work perfectly if your file is small enough to fit in your
> > computer's memory. If you want a function that does this on large
> > files, you'll have to use something along these lines:
>
> > import random
>
> > def randomLineFromBigFile(fileName, numLines):
> >     whatLine = random.randint(1, numLines) # choose a random line number
> >     source = open(fileName, 'r')
> >     i = 0
> >     for line in source:
> >         i += 1
> >         if i == whatLine: return line
> >     return None
>
> > This function uses very little (and a constant amount of) memory. The
> > downside is that you have to know the total number of lines in the file
> > (that's the numLines argument) before calling it. It's not a very hard
> > thing to do.
>
> Wouldn't it be much quicker to do something like this?
>
> import os.path
> import random
>
> size = os.path.getsize('test.txt')
> print(size)
>
> randline = random.randint(1, size) #a random byte offset, not a line number
> print(randline)
>
> testfile = open('test.txt', 'r')
> testfile.seek(randline)
> print(testfile.readline()) #read what is likely half a line
> print(testfile.readline()) #read the next whole line
> testfile.close()
>
> You'd just need to add some exception handling in the event you tried
> to read off the end of the file.
>
> Alan
>
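
Alan's seek idea quoted above could be wrapped up roughly like this. It is only a sketch under a few assumptions: the function name random_line_by_seek is made up here, the file is opened in binary mode so that seek offsets are plain byte counts (which means the result is bytes, not str), and wrapping around to the first line is just one way to handle running off the end of the file.

```python
import os
import random

def random_line_by_seek(path):
    """Return a line found by seeking to a random byte offset (None if the file is empty)."""
    size = os.path.getsize(path)
    if size == 0:
        return None
    with open(path, 'rb') as f:
        f.seek(random.randrange(size))  # offset in 0 .. size - 1
        f.readline()                    # discard what is likely a partial line
        line = f.readline()             # the next whole line
        if not line:                    # ran off the end of the file:
            f.seek(0)                   # wrap around and take the first line
            line = f.readline()
    return line
```

Note that this does not pick lines uniformly. A line is returned whenever the random offset lands in the line before it, so lines that follow long lines come up more often. That is the price for not having to read the whole file.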