import data.py using massive amounts of memory

Nick Craig-Wood nick at craig-wood.com
Wed Jun 27 12:30:06 CEST 2007


I've been dumping a database in a python code format (for use with
Python on S60 mobile phone actually) and I've noticed that it uses
absolutely tons of memory as compared to how much the data structure
actually needs once it is loaded in memory.

The programs below create a file (z.py) containing a data structure
which looks like this

-- z.py ----------------------------------------------------
z = {
  0 : (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
  1 : (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
  2 : (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21),
[snip]
  998 : (998, 999, 1000, 1001, 1002, ..., 1012, 1013, 1014, 1015, 1016, 1017),
  999 : (999, 1000, 1001, 1002, 1003, ..., 1013, 1014, 1015, 1016, 1017, 1018),
}
------------------------------------------------------------

Under python2.2 to python2.4, "import z" uses more than 8 MB, whereas
loading a pickled dump of the same data takes only about 450 kB.
python2.5 improves on this, but the import still takes about 2.2 MB.

    $ python2.5 memory_usage.py 
    Memory used to import is 2284 kB
    Total size of repr(z) is  105215
    Memory used to unpickle is 424 kB
    Total size of repr(z) is  105215

    $ python2.4 memory_usage.py 
    Memory used to import is 8360 kB
    Total size of repr(z) is  105215
    Memory used to unpickle is 456 kB
    Total size of repr(z) is  105215

    $ python2.3 memory_usage.py 
    Memory used to import is 8436 kB
    Total size of repr(z) is  105215
    Memory used to unpickle is 456 kB
    Total size of repr(z) is  105215

    $ python2.2 memory_usage.py 
    Memory used to import is 8568 kB
    Total size of repr(z) is  105215
    Memory used to unpickle is 392 kB
    Total size of repr(z) is  105215

    $ python2.1 memory_usage.py 
    Memory used to import is 10756 kB
    Total size of repr(z) is  105215
    Memory used to unpickle is 384 kB
    Total size of repr(z) is  105215

Why does it take so much memory?  Is it some consequence of the way
the datastructure is parsed?
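
One way to narrow the question down is to measure the compile step
separately from running the resulting code object.  The sketch below is
only an illustration and not part of the test programs: rss_kb and
measure_compile_and_exec are hypothetical helper names, it reads
/proc/self/status and so assumes linux, and it makes no claim about
which step dominates.

```python
import re

def rss_kb():
    """Return the current resident set size in kB, or 0 if unavailable."""
    try:
        fd = open("/proc/self/status")
    except (IOError, OSError):
        return 0  # no /proc here, e.g. not linux
    try:
        status = fd.read()
    finally:
        fd.close()
    match = re.search(r"(?m)^VmRSS:\s+(\d+)", status)
    if match:
        return int(match.group(1))
    return 0

def measure_compile_and_exec(source, filename="z.py"):
    """Return (kB used by compile(), kB used by running the code object)."""
    start = rss_kb()
    code = compile(source, filename, "exec")  # parse and byte-compile only
    compiled = rss_kb()
    namespace = {}
    exec(code, namespace)                     # build the actual dictionary
    done = rss_kb()
    return compiled - start, done - compiled
```

Passing it the text of z.py, e.g.
measure_compile_and_exec(open("z.py").read()), gives two figures to
compare against the total reported for the import.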

Note that once the .pyc file has been written, subsequent runs import
it using even less memory than loading the cPickle dump.
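
That observation can be put to use by byte-compiling z.py ahead of
time, for instance in a separate short-lived process, so that the
process which needs the data only ever loads the .pyc.  A minimal
sketch using the standard py_compile module follows; precompile is a
hypothetical wrapper name, and a .pyc file is tied to the Python
version that produced it.

```python
import py_compile

def precompile(source_path, compiled_path):
    """Byte-compile source_path into compiled_path.

    doraise=True turns a syntax error in the source into an exception
    instead of a message printed to stderr.
    """
    py_compile.compile(source_path, compiled_path, doraise=True)
    return compiled_path
```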

S60 python is version 2.2.1. It doesn't have pickle unfortunately, but
it does have marshal and the datastructures I need are marshal-able so
that provides a good solution to my actual problem.
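
A minimal sketch of the marshal route, for illustration only:
save_marshal, load_marshal and build_table are hypothetical names, and
build_table just rebuilds the same dictionary that z.py contains.  The
marshal format is not guaranteed to be stable across Python versions,
so the file may have to be written by an interpreter compatible with
the one on the phone.

```python
import marshal

def save_marshal(obj, path):
    """Dump obj to path in marshal format."""
    fd = open(path, "wb")
    try:
        marshal.dump(obj, fd)
    finally:
        fd.close()

def load_marshal(path):
    """Load and return the object stored at path by save_marshal()."""
    fd = open(path, "rb")
    try:
        return marshal.load(fd)
    finally:
        fd.close()

def build_table(n=1000, width=20):
    """Build the same {i: (i, i+1, ..., i+width-1)} mapping as z.py."""
    table = {}
    for i in range(n):
        table[i] = tuple(range(i, i + width))
    return table
```

Usage would be save_marshal(build_table(), "z.marshal") on the machine
that dumps the database and z = load_marshal("z.marshal") on the
target, where "z.marshal" is just an example file name.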

Save the two programs below with the names given to demonstrate the
problem.  Note that these use some linux-isms to measure the memory
used by the current process which will need to be adapted if you don't
run it on linux!
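
For systems without /proc, one possible adaptation is the standard
resource module, which exists on most Unix-like systems but not on
Windows.  This is a sketch under assumptions, with a hypothetical name
peak_memory_kb: it reports the peak RSS rather than the current RSS, so
a before/after difference can only show growth, and the unit of
ru_maxrss differs between platforms.

```python
import sys
import resource

def peak_memory_kb():
    """Return the peak resident set size of this process in kB.

    ru_maxrss is reported in kilobytes on linux and in bytes on
    Mac OS X, so the latter is scaled down.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak = peak // 1024
    return peak
```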

-- memory_usage.py -----------------------------------------

import os
import sys
import re
from cPickle import dump

def memory():
    """Returns memory used (RSS) in kB"""
    status = open("/proc/self/status").read()
    match = re.search(r"(?m)^VmRSS:\s+(\d+)", status)
    memory = 0
    if match:
        memory = int(match.group(1))
    return memory

def write_file():
    """Write the file to be imported"""
    fd = open("z.py", "w")
    fd.write("z = {\n")
    for i in xrange(1000):
        fd.write("  %d : %r,\n" % (i, tuple(range(i,i+20))))
    fd.write("}\n")
    fd.close()

def main():
    write_file()
    before = memory()
    from z import z
    after = memory()
    print "Memory used to import is %s kB" % (after-before)
    print "Total size of repr(z) is ",len(repr(z))

    # Save a pickled copy for later
    dump(z, open("z.bin", "wb"))

    # Run the next part
    os.system("%s memory_usage1.py" % sys.executable)

if __name__ == "__main__":
    main()

-- memory_usage1.py ----------------------------------------

from memory_usage import memory
from cPickle import load

before = memory()
z = load(open("z.bin", "rb"))
after = memory()
print "Memory used to unpickle is %s kB" % (after-before)
print "Total size of repr(z) is ",len(repr(z))

------------------------------------------------------------

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


