[Tutor] Re: [Tutor]Reading and Writing (No Arithmetic?) [linecache?]

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Fri, 18 Jan 2002 19:26:35 -0800 (PST)


On Fri, 18 Jan 2002, Paul Sidorsky wrote:

> > But, how could I get a random line if there are thousands of lines in the
> > data file without bogging down the computer?
> 
> I think there are a couple of libraries around for that kind of thing. 

If you want to use something precooked, the 'linecache' module should give
you random access to the lines of a text file:

    http://www.python.org/doc/current/lib/module-linecache.html

Picking a random line still requires knowing how many lines the file has,
but counting them takes only a single linear scan.
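Here's a minimal sketch of that two-step approach using only the standard
library: count the lines, then pick one with linecache.getline().  The
helper names count_lines() and random_line() are my own, not part of any
library.  Keep in mind that linecache numbers lines starting from 1, and
that it keeps the file's lines cached in memory after the first call.

```python
import linecache
import random

def count_lines(filename):
    """Count lines with one linear pass, holding one line in memory at a time."""
    count = 0
    with open(filename) as f:
        for _ in f:
            count += 1
    return count

def random_line(filename):
    """Return a randomly chosen line (with its newline), or None if the file is empty."""
    n = count_lines(filename)
    if n == 0:
        return None
    # linecache numbers lines from 1, so pick from 1..n inclusive.
    lineno = random.randint(1, n)
    return linecache.getline(filename, lineno)
```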


> Another idea is to build an index of some kind.  A simple one would
> have the number of quotes first, followed by the offsets of each of
> the quotes.  Then you can read in the number of quotes, pick your
> random number, find its offset in the index, seek to it, and read to
> the newline.  This keeps the maintainability and doesn't use as much
> extra space, but of course any time you change the quotes file you
> have to remember to rebuild the index.

I think this is how 'linecache' works, but I'd have to check the code to
make sure about it... wait, nope: it actually reads the whole file into
memory and caches every line.  Ugh, so 'linecache' isn't so useful for
big files after all!
Hmmm...
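For reference, here's a rough sketch of the on-disk index Paul describes.
The file layout is my own assumption: the index is a separate binary file
of fixed-width 8-byte little-endian integers, holding the line count first
and then the byte offset where each line starts.  Fixed-width records let
us seek straight to entry i without reading the rest of the index.  I'm
also assuming the data file is UTF-8, and build_index() and random_entry()
are made-up names.

```python
import random
import struct

# One unsigned 64-bit little-endian integer per record (assumed layout).
RECORD = struct.Struct("<Q")

def build_index(data_path, index_path):
    """Write the line count, then the byte offset where each line starts."""
    offsets = []
    with open(data_path, "rb") as data:
        while True:
            pos = data.tell()
            if not data.readline():
                break
            offsets.append(pos)
    with open(index_path, "wb") as index:
        index.write(RECORD.pack(len(offsets)))
        for pos in offsets:
            index.write(RECORD.pack(pos))

def random_entry(data_path, index_path):
    """Pick a random line via the index, reading only two records from it."""
    with open(index_path, "rb") as index:
        (count,) = RECORD.unpack(index.read(RECORD.size))
        if count == 0:
            return None
        i = random.randrange(count)
        # Entry i sits right after the one-record header.
        index.seek(RECORD.size * (i + 1))
        (offset,) = RECORD.unpack(index.read(RECORD.size))
    with open(data_path, "rb") as data:
        data.seek(offset)
        # Assumes the data file is UTF-8 encoded.
        return data.readline().decode("utf-8")
```

As Paul points out, build_index() has to be rerun whenever the quotes file
changes, or the offsets will point at the wrong places.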

I thought this sounded like an interesting thing to write, so I've cooked
up a 'RandomFile' class that gives random access to the lines of a file:

###
class RandomFile:
    """Allows random access in a file-like object.  Requires that we be
    able to seek() through it."""
    def __init__(self, fp):
        """Initializer.

        'fp' should be a file pointer that supports seek()."""
        self.fp = fp
        self._cache = []
        self.updatecache()

    def getline(self, n):
        """Returns line #n."""
        pos = self._cache[n]
        self.fp.seek(pos)
        return self.fp.readline()


    def countlines(self):
        """Returns the number of lines in our file."""
        return len(self._cache)

    
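    def updatecache(self):
        """Scans through the file to find where every line starts."""
        # Record the position before each readline() rather than reading
        # one character at a time: it's much faster, tell() returns values
        # that seek() accepts even in text mode, and we no longer record a
        # bogus empty "line" at EOF when the file ends with a newline.
        self._cache = []
        self.fp.seek(0)
        while True:
            pos = self.fp.tell()
            if not self.fp.readline():
                break
            self._cache.append(pos)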
###



Here's a totally random example of it in action:

###
>>> import RandomFile
>>> f = RandomFile.RandomFile(open("RandomFile.py"))
>>> f.getline(0)
'class RandomFile:\n'
>>> f.getline(0)
'class RandomFile:\n'
>>> f.getline(20)
'        return len(self._cache)\n'
>>> f.getline(19)
'        """Returns the number of lines in our file."""\n'
>>> f.getline(18)
'    def countlines(self):\n'
###

This class isn't battle-tested; I just cooked it up, so it might still
need some simmering.  (With a little more effort, we could make this
RandomFile look like a Python list by defining __getitem__() and
__len__().  Hmmm...)  But I hope this is useful for you.
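Here's a rough sketch of what that list-like version might look like.
LineList is a hypothetical name, and this is a standalone variant rather
than a change to RandomFile itself: it keeps only the starting offset of
each line in memory (found with readline()/tell()) and reads the line text
from disk on demand.

```python
class LineList:
    """Read-only, list-like view of a file's lines (hypothetical class).

    'fp' must support seek(), tell() and readline().
    """

    def __init__(self, fp):
        self.fp = fp
        self._offsets = []
        fp.seek(0)
        while True:
            pos = fp.tell()
            if not fp.readline():
                break
            self._offsets.append(pos)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, n):
        if isinstance(n, slice):
            # Expand a slice into the individual line numbers it covers.
            return [self[i] for i in range(*n.indices(len(self)))]
        # Plain list indexing gives negative indices and IndexError for free.
        self.fp.seek(self._offsets[n])
        return self.fp.readline()
```

Since it supports len() and indexing, random.choice(lines) picks a random
line, and a plain "for line in lines" loop works through Python's
old-style sequence protocol (it stops at the IndexError).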


Good luck!