High memory usage - program mistake or Python feature?
Jack Diederich
jack at performancedrivers.com
Fri May 23 08:50:16 EDT 2003
On Fri, May 23, 2003 at 01:27:46PM +0100, Ben S wrote:
> I wrote a little CGI script that reads in a file like so:
>
> def LoadLogFile(filename):
>     """Loads a log file as a collection of lines"""
>     try:
>         logFile = file(filename, 'rU')
>         lines = map(string.strip, logFile.readlines())
>     except IOError:
>         return False
>     return lines
>
> Then it processes it with this function a few times:
>
> def GetLinesContainingCommand(lines, commandName):
>     """Find all the lines containing that command in the logs"""
>     pattern = re.compile(" Log \w+: " + commandName + " ")
>     return [eachLine for eachLine in lines if pattern.search(eachLine)]
>
> The 'problem' was that, when operating on a 50MB file, the memory usage
> (according to ps on Linux) rocketed to just over 150MB. Since there's no
Well, you are definitely keeping at least one copy of the whole file around,
plus a little per-line overhead.
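To see how the per-line overhead adds up, here is a rough illustration (the numbers come from `sys.getsizeof` on a modern CPython, so they are only indicative of what the 2003 interpreter did, but the principle is the same):

```python
import sys

line = "x" * 80                      # a typical 80-character log line
overhead = sys.getsizeof(line) - 80  # object header and bookkeeping beyond the raw bytes
# A 50MB file of ~80-byte lines is hundreds of thousands of lines, each
# paying this fixed overhead on top of its text -- plus the list holding
# them and, in the script above, a second stripped copy of every line.
print(overhead > 0)
```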
You could check the lines as you read them, and only store copies of the
ones you want to keep:
import re

def do_everything(filename):
    try:
        fob = open(filename, 'rU')  # pass the variable, not the string 'filename'
    except IOError:
        return False
    cmds = ('ssh', 'adduser', 'top', 'whatever') # commands we care about
    # { 'command' : compiled_re }
    cmd_res = dict(zip(cmds, map(re.compile, cmds)))
    # { 'command' : [list of lines that match] }
    cmds_matched = dict(zip(cmds, [[] for x in cmds])) # as obfu as python gets
    for line in fob:  # iterate the file directly; readlines() would slurp it all into memory
        for cmd in cmds:
            if cmd_res[cmd].search(line):
                cmds_matched[cmd].append(line)
                break # from your example, no two commands can match the same line
    return cmds_matched
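If memory is the only concern, you can go one step further and never hold even the matching lines in a dict: iterate the file object line by line and yield matches lazily, so only the current line is ever in memory. A sketch along those lines (the function name and command list are mine, not from the thread):

```python
import re

def iter_matching_lines(filename, commands):
    """Yield (command, stripped line) pairs one at a time, so only the
    current line is ever held in memory."""
    cmd_res = dict((cmd, re.compile(cmd)) for cmd in commands)
    fob = open(filename)
    try:
        for line in fob:            # file objects iterate lazily, line by line
            for cmd in commands:
                if cmd_res[cmd].search(line):
                    yield cmd, line.strip()
                    break           # as above: no two commands match one line
    finally:
        fob.close()
```

The caller can then consume the results one at a time, or build whatever structure it actually needs from them.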
-jackdied