memory usage

Nagy Gabor linux42 at freemail.c3.hu
Tue May 6 09:20:49 EDT 2003


I wrote a simple data file parser and it is quite memory hungry. I don't
know whether this is what I should expect or whether there is a bug in my
code.

I can profile the CPU time spent here and there, but I have no information
about the memory used meanwhile, nor about the time spent handling this
memory.
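
A minimal sketch of how the memory side could be measured, for comparison with the CPU profile. It assumes a Python 3 interpreter with the standard tracemalloc module, which is newer than the code in this post. build_records and measure_peak are hypothetical names used only for illustration, and build_records is a stand-in for the real parser, not the parser itself.

```python
import tracemalloc


def build_records(n_records, n_fields):
    # Hypothetical stand-in for the parser: a list of records,
    # each record being a list of short strings.
    return [["%05d" % (r * n_fields + f) for f in range(n_fields)]
            for r in range(n_records)]


def measure_peak(func, *args):
    """Run func(*args) and return (result, peak_bytes) as traced by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns (current_bytes, peak_bytes)
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    records, peak = measure_peak(build_records, 1000, 39)
    print("peak traced memory: %.1f KiB" % (peak / 1024.0))
```

Wrapping the real Parse() call in measure_peak would give a peak figure for Python-level allocations only, so it may differ from the virtual memory size reported by the operating system.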

My input file is cleartext, with fixed width fields.

My first test input was about 18 MiB long, about 115k lines.
I ran out of physical memory parsing this file and cancelled the run, as
the swapping took ages.
Then I trimmed the file to 15k lines, about 2.3 MiB in size.
Parsing it took about 250 MiB of virtual memory.
Is this normal? I hope I am not keeping extra copies around that multiply
the memory needed.

OK, what do I want to keep in memory? Here is some of the code.

I have two classes, T, and TD.

def Parse():
  recordset = TD()
  recordset.Tag = T(Name = 'recordset', Flag=1)
  recordset.Data = []
  while 1:
    record = ParseRecord()
    recordset.Data.append(record)

def ParseRecord():
  record = TD()
  record.Tag = T( Value='42', Name = 'record', Class='C', Flag=1)
  record.Data = ParseFields()
  return record

def ParseFields():
  fields = []
  for ...:
    Data = StringIO.read( length)
    tmp = TD()
    tmp.Tag = T(Name = 'name')
    tmp.Data = Data
    fields.append(tmp)
  return fields

class T:
    def __init__(self, Value = '', Flag = 0, Name = '', Class = ''):
        self.Flag = Flag
        self.Value = Value
        self.Class = Class
        self.Name = Name

class TD:
    def __init__(self):
        self.Tag = T()
        self.Data = None

That's all. Storing the whole text file in the Data attributes of TDs is
OK. But what is the remaining 247 MiB used for?
T.Value is two characters,
T.Name is about 5-25 characters
T.Class is always 'C'

Parse was called once, took 190s cumtime
ParseRecord was called 15000 times, 186s cumtime
ParseFields was called 14999 times, 176s cumtime
T.__init__ was called 599901 times, 16.5s cumtime
TD.__init__ was called 599901 times, 76.8s cumtime

I don't understand why TD.__init__ took 76.8s while doing almost nothing,
and I don't understand what my memory was used for.
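
To put the numbers above in per-object terms, here is a rough back-of-the-envelope calculation. It only uses the figures quoted in this post and assumes that all of the remaining 247 MiB can be attributed to the TD instances (each with its T), which is a simplification.

```python
MIB = 1024 * 1024

extra_bytes = 247 * MIB      # memory beyond the text of the file itself
td_instances = 599901        # TD.__init__ call count from the profile
td_init_seconds = 76.8       # TD.__init__ cumulative time from the profile

# Average cost per TD instance, under the assumption stated above.
bytes_per_td = extra_bytes / float(td_instances)
usec_per_td_init = td_init_seconds / td_instances * 1e6

print("about %.0f bytes per TD (including its T)" % bytes_per_td)
print("about %.0f microseconds per TD.__init__ call" % usec_per_td_init)
```

This works out to roughly 432 bytes and 128 microseconds per TD, for fields that hold only a handful of characters each.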

Can someone please explain what goes on when I pass objects around?
What is the (memory) overhead of a list, an instance of a class, a
string, and so on?
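
One way to look at the per-object overhead directly is sketched below. It assumes a Python 3 interpreter, where sys.getsizeof is available (it did not exist when this was written), and shallow_sizes is a hypothetical helper name. The sizes are shallow: a container's size does not include the objects it refers to, and the exact values depend on the interpreter version and platform.

```python
import sys


class T:
    def __init__(self, Value='', Flag=0, Name='', Class=''):
        self.Flag = Flag
        self.Value = Value
        self.Class = Class
        self.Name = Name


class TD:
    def __init__(self):
        self.Tag = T()
        self.Data = None


def shallow_sizes():
    """Return shallow sizes in bytes for the kinds of objects the parser creates."""
    td = TD()
    td.Data = 'x' * 4  # a short field value, like one fixed-width column
    return {
        'empty list': sys.getsizeof([]),
        'short string': sys.getsizeof(td.Data),
        'TD instance': sys.getsizeof(td),
        'TD attribute dict': sys.getsizeof(td.__dict__),
        'T instance': sys.getsizeof(td.Tag),
        'T attribute dict': sys.getsizeof(td.Tag.__dict__),
    }


if __name__ == "__main__":
    for name, size in sorted(shallow_sizes().items()):
        print("%-18s %4d bytes" % (name, size))
```

Summing the instance and attribute dict sizes for one TD and one T gives an estimate of the fixed cost paid per field, independent of how short the field text is.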

Regards, Gee




