memory usage

Tim Peters tim at zope.com
Sat Dec 6 22:54:30 EST 2003


[Dave]
>> I'm writing code which deals with lists of millions to tens of
>> millions of elements.  It seems to me that python gobbles up memory
>> faster than it should in these cases.
>>
>> For example, I start the python interpreter and then execute the
>> following code:
>>
>> >>> f = []
>> >>> for i in range(1e7):
>> ...    f.append(0.1)
>>
>> In another window on my Linux system I then run top.  The python
>> process is now using 155 MB of memory.  Why is python using such a
>> large amount of memory, nearly 16 bytes for a float in this
>> case?

[Robert Toop]
> 8 bytes for each float,

As was pointed out before, there are only two float objects in that snippet
(the float object for 1e7, and the float object for 0.1; the example reuses
the single float object for 0.1 ten million times).  The memory is actually
consumed by the 10 million distinct integer objects in the giant list
constructed by range().
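The sharing is easy to check from the interpreter (in a modern Python, where
range() is lazy and won't reproduce the original blow-up; this is only a
sketch of the object sharing, scaled down to 1000 appends):

```python
import sys

f = []
for i in range(1000):
    f.append(0.1)

# CPython evaluates the literal 0.1 from the code object's constants,
# so every append stores a pointer to the *same* float object.
assert all(x is f[0] for x in f)

# getsizeof reports only the list's pointer vector, not the one float
# it points to -- at least a machine word per slot (8 bytes on 64-bit).
assert sys.getsizeof(f) >= 1000 * 4
```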

> and I'm guessing 8 bytes for 2 pointers to form the list.

Nope, Python lists are contiguous vectors.  Accessing an element by index is
constant-time.

> One needs 2 pointers in order to facilitate nearby forward/backward
> reference, and to be able to remove an element without having to
> search from the start. An extensible list cannot be a pointer-free
> contiguous array unless all elements are copied each time you append
> to it.

Python uses mildly exponential over-allocation, sufficient so that growing a
list takes *amortized* O(1) time per append.  Speed of indexed list access
is much more important in most apps than speed of append, but Python has
good O() behavior for both.  O() behavior for deleting (or inserting) "in
the middle" of a list is atrocious, though (O(len(list))).  Life is a
never-ending sequence of tradeoffs <wink>.
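The over-allocation is visible from Python itself (a CPython implementation
detail, so the exact jump points may vary by version):

```python
import sys

lst = []
resizes = []
for i in range(64):
    before = sys.getsizeof(lst)
    lst.append(i)
    after = sys.getsizeof(lst)
    if after != before:
        # The allocated vector grew; record when and to what size.
        resizes.append((len(lst), after))

# Resizes happen at irregular, widening intervals: CPython over-allocates
# proportionally to the current length, so copies become rarer as the
# list grows -- that is what makes append amortized O(1).
assert 0 < len(resizes) < 64
```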

> Access to a linked list like this requires not only extra memory,
> but extra time, to say nothing of large RAM to avoid virtual memory
> paging. You'd be better off with static-dimensioned (hence
> contiguous) arrays. Of course, that might be impractical for your app.

Python's lists are contiguous vectors, of pointers to Python objects.  The
standard array module embeds data directly in contiguous vectors, but the
kinds of data an array.array can hold are limited to simple scalar types
(floats, bounded integers, characters).  The NumPy extension has extensive
facilities for slinging numeric arrays at high speed and memory efficiency.
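For the floats-only case above, the array module's packed storage is roughly
one C double per element instead of a pointer per element plus a float
object (sizes here are a rough sketch; headers and platforms differ):

```python
import array
import sys

# 'd' is the type code for C double: 8 bytes per element, stored inline.
floats = array.array('d', [0.1] * 1000)
assert floats.itemsize == 8

# The whole structure is close to 8 * 1000 bytes plus a small header --
# no per-element object overhead at all.
assert sys.getsizeof(floats) < 1000 * 8 + 512
```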

> If your lists require dynamic extension and sparse random access,
> you'd be better off with an indexed tree (database-like) storage
> scheme. I'm too new to python to know if such packages exist, but
> I'll bet they do.

ZODB (the object-oriented database underlying Zope) contains a Python BTree
package coded in C that's quite capable.  Not only O(log(N)) time for
lookups, and for inserts and deletes in arbitrary positions, but it performs
range searches very efficiently too (BTrees are always in "sorted order"),
and provides transparent persistence backed by a ZODB database (if you want
that -- using ZODB's BTrees from Python doesn't *require* that you use
anything else in ZODB -- they can be used as in-memory Python objects just
as easily).  You can even have BTrees much bigger than your RAM that way --
although typical RAM sizes are growing so fast, I'm not sure how much longer
I can say that with a straight face.
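ZODB's BTrees aren't in the standard library, but the flavor of a
sorted-order range search can be sketched with the stdlib bisect module.
This is only an illustration of the range-query idea, not a BTree:
insertion into a flat sorted list is O(N), and there's no persistence.

```python
import bisect

keys = [2, 5, 5, 9, 14, 21]   # kept in sorted order, BTree-style

# All keys k with 5 <= k < 15, found with two binary searches
# (each O(log N)) and one contiguous slice.
lo = bisect.bisect_left(keys, 5)
hi = bisect.bisect_left(keys, 15)
assert keys[lo:hi] == [5, 5, 9, 14]
```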

but-so-long-as-they-pay-me-i-won't-crack-a-shadow-of-a-smile-ly y'rs  - tim