High memory usage - program mistake or Python feature?

Ben S bens at replytothegroupplease.com
Fri May 23 08:27:46 EDT 2003


I wrote a little CGI script that reads in a file like so:

import string

def LoadLogFile(filename):
    """Loads a log file as a list of stripped lines, or False on IOError"""
    try:
        logFile = file(filename, 'rU')
        lines = map(string.strip, logFile.readlines())  # strip newlines
        logFile.close()
    except IOError:
        return False
    return lines

Then the script processes those lines with this function a few times:

import re

def GetLinesContainingCommand(lines, commandName):
    """Find all the lines containing that command in the logs"""
    # Raw string keeps the \w intact; re.escape guards against any
    # regex metacharacters in the command name.
    pattern = re.compile(r" Log \w+: " + re.escape(commandName) + " ")
    return [eachLine for eachLine in lines if pattern.search(eachLine)]
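
For context, the calling code amounts to something like this (the path
and command names here are made up for illustration):

lines = LoadLogFile('/var/log/myapp.log')                  # hypothetical path
if lines is not False:
    loginLines = GetLinesContainingCommand(lines, 'LOGIN')  # hypothetical command
    kickLines = GetLinesContainingCommand(lines, 'KICK')    # hypothetical command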

The 'problem' was that, when operating on a 50MB file, the memory usage
(according to ps on Linux) rocketed to just over 150MB. Since there's no
other significant storage in the script, I can only assume that the
lines (strings of between 40 and 90 ASCII characters each) are being
stored in such a way that they occupy roughly three times their on-disk
size. I've not specified any Unicode usage anywhere, nor does the text
file in question use any characters above 127, as far as I know.
GetLinesContainingCommand returns only a tiny subset (no more than 20
or 30 lines out of the hundreds of thousands in the file), so I doubt
the returned lists are the problem.
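
One thing that occurs to me: readlines() builds a list of the raw
lines, and map() then builds a second list of stripped copies, so at
peak there are two copies of every line alive at once, each a separate
string object with its own bookkeeping overhead. Perhaps that accounts
for the 3x figure. If so, one workaround would be to filter while
reading instead of loading everything up front. A rough, untested
sketch (FindCommandLines is a made-up name, and it re-reads the file
on each call):

import re

def FindCommandLines(filename, commandName):
    """Filter while reading, so only the matching lines are ever kept"""
    pattern = re.compile(r" Log \w+: " + re.escape(commandName) + " ")
    matches = []
    logFile = file(filename, 'rU')
    for eachLine in logFile:  # iterates lazily, one line at a time
        if pattern.search(eachLine):
            matches.append(eachLine.strip())
    logFile.close()
    return matches

The trade-off is one pass over the file per command, but for a CGI
script that only runs a handful of queries that seems acceptable.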

So I guess my question is: have I coded this inefficiently in terms of
memory usage, or is this kind of overhead to be expected? I'm pretty
new to Python, so the former sounds likely. Luckily I'll rarely be
operating on 50MB files, but I'd like to know for any future scripts I
write.

--
Ben Sizer
http://pages.eidosnet.co.uk/kylotan
