data size

Martin von Loewis loewis at informatik.hu-berlin.de
Sat Nov 10 08:08:57 EST 2001


"harry" <hartanto at telusplanet.net> writes:

> > {"key1":[1,2,3,4], "key2": "hello", "key3": [1,2,3]}
>
> > Notice that dict is now a collection of 11 objects:
> > - four strings
> > - four integers: 1,2,3,4
> > - two lists: [1,2,3,4] and [1,2,3]
> > - one dictionary
> >
> 
> I would like to know the size of all objects together (the dictionary and
> the contents). I am using Python 2.0 on Windows98 SE.
> And I don't think I use any C library (or you're referring to something
> that I'm not aware of using).

OK, let's take it step by step. In 2.0, a string is defined as

typedef struct {
    int ob_refcnt; 
    struct _typeobject *ob_type;
    int ob_size;
    long ob_shash;
    PyObject *ob_sinterned;
    char ob_sval[1];
} PyStringObject;

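Given that int, long, and pointer types are all 4 bytes on your
machine, and given that the strings are null-terminated in memory, the
four-character keys are 25 bytes each (20 bytes of header fields, 4
characters, and the terminating null); "hello" is one byte longer, at
26 bytes. However, you cannot simply add these sizes together, because
you have to account for the malloc overhead.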

Assuming you use the Microsoft Visual C Runtime (msvcrt.dll), malloc
will add 8 bytes and round up to a paragraph (16 bytes). This will
give 48 bytes per string (25 + 8 = 33, rounded up to the next multiple
of 16; the extra byte of the longer string does not change the
result), or 192 bytes for all strings together.  Of course, if the
strings appear in your source code, these 192 bytes are consumed only
once: the dictionary will share them with the source code.
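
The string arithmetic above can be sketched in a few lines of
present-day Python 3. This is only an illustration of the stated
assumptions (4-byte fields, 8 bytes of malloc header, rounding to 16
bytes); the names malloc_size and string_size are made up here and
are not part of Python or the C runtime.

```python
def malloc_size(nbytes, header=8, granule=16):
    """Bytes used by one malloc'ed block under the assumed model:
    add `header` bytes, then round up to a multiple of `granule`."""
    total = nbytes + header
    return -(-total // granule) * granule  # ceiling division


# ob_refcnt, ob_type, ob_size, ob_shash, ob_sinterned: five 4-byte fields
STRING_HEADER = 5 * 4


def string_size(s):
    """Header fields + characters + terminating null byte."""
    return STRING_HEADER + len(s) + 1


strings = ["key1", "key2", "key3", "hello"]
raw = [string_size(s) for s in strings]
allocated = [malloc_size(n) for n in raw]
print(raw, allocated, sum(allocated))
```

Changing header or granule lets you try the same estimate for a
different allocator.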

Next, let's look at the integers. They are defined as

typedef struct {
    int ob_refcnt; 
    struct _typeobject *ob_type;
    long ob_ival;
} PyIntObject;

giving 12 bytes per int object. In 2.0, ints are allocated in blocks
of 1000 bytes, to avoid the malloc overhead. So the four ints together
consume 49.15 bytes (assuming that the overhead of 24 bytes per
integer block is distributed equally over the 83 integers in a block).
Of course, if these are the *only* integers, they consume 1024 bytes
together (the 1000-byte block plus its 24 bytes of overhead), since the
entire integer block must be accounted for (with 79 wasted integers).
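
To make the two ways of counting the ints concrete, here is a small
Python 3 sketch. It just restates the figures from the paragraph above
(1000-byte blocks, 24 bytes of overhead per block, 83 ints per block);
the function names are invented for the example.

```python
INT_SIZE = 3 * 4        # ob_refcnt, ob_type, ob_ival
BLOCK_BYTES = 1000      # nominal size of one block of ints
BLOCK_OVERHEAD = 24     # per-block overhead assumed in the text
INTS_PER_BLOCK = BLOCK_BYTES // INT_SIZE


def amortized_int_bytes(count):
    """Cost of `count` ints when the block overhead is spread
    evenly over every int that fits in a block."""
    return count * (INT_SIZE + BLOCK_OVERHEAD / INTS_PER_BLOCK)


def whole_block_bytes():
    """Cost when a whole block has to be charged to these ints."""
    return BLOCK_BYTES + BLOCK_OVERHEAD


print(INTS_PER_BLOCK, amortized_int_bytes(4), whole_block_bytes())
print("unused slots:", INTS_PER_BLOCK - 4)
```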

Next, the two lists. A list is defined as

typedef struct {
    struct _gc_head *gc_next;
    struct _gc_head *gc_prev;
    int gc_refs;
    int ob_refcnt; 
    struct _typeobject *ob_type;
    int ob_size;
    PyObject **ob_item;
} PyListObject;

since lists are garbage-collected. A list consumes two memory blocks.
The list object itself is 28 bytes; accounting for the malloc overhead
on your system, it is 48 bytes. The second block holds the elements
and needs as many pointers as the list has elements (at least
initially), so the two lists consume 12 and 16 bytes for the elements.
Considering the malloc overhead, each of these blocks consumes 32
bytes. Together, the two lists consume 2 * 48 + 2 * 32 = 160 bytes.
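
The same estimate for the lists, again as a hedged Python 3 sketch
under the assumptions used so far (4-byte pointers, malloc adding 8
bytes and rounding to 16). The helper names are hypothetical.

```python
def malloc_size(nbytes, header=8, granule=16):
    """Assumed allocator model: add a header, round up to the granule."""
    total = nbytes + header
    return -(-total // granule) * granule


POINTER = 4
# gc_next, gc_prev, gc_refs, ob_refcnt, ob_type, ob_size, ob_item
LIST_HEADER = 7 * 4


def list_size(n_items):
    """Two blocks per list: the list object and its pointer array."""
    head = malloc_size(LIST_HEADER)
    items = malloc_size(n_items * POINTER)
    return head + items


sizes = [list_size(4), list_size(3)]
print(sizes, sum(sizes))
```

Note that the estimate counts only the pointers, not the objects the
pointers refer to; those are counted separately.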

Finally, the dictionary. It is defined as

struct dictobject {
    struct _gc_head *gc_next;
    struct _gc_head *gc_prev;
    int gc_refs;
    int ob_refcnt; 
    struct _typeobject *ob_type;
    int ma_fill;
    int ma_used;
    int ma_size;
    int ma_poly;
    dictentry *ma_table;
    dictentry *(*ma_lookup)(dictobject *mp, PyObject *key, long hash);
};

So the dictionary itself will consume 44 bytes; considering the malloc
overhead, that will be 64 bytes. The ma_table consists of structures

typedef struct {
    long me_hash;
    PyObject *me_key;
    PyObject *me_value;
} dictentry;
 
i.e. 12 bytes per entry. With three keys, the dictionary will have
eight entries in 2.0, giving 96 bytes of entries. With malloc
overhead, this comes to 112 bytes.
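
As a sketch of the dictionary arithmetic in Python 3: the table size
of eight entries is taken from the text above, the allocator model is
the same assumed one (8 bytes added, rounded to 16), and the names are
invented for illustration.

```python
def malloc_size(nbytes, header=8, granule=16):
    """Assumed allocator model: add a header, round up to the granule."""
    total = nbytes + header
    return -(-total // granule) * granule


DICT_HEADER = 11 * 4   # the eleven 4-byte fields of struct dictobject
ENTRY_SIZE = 3 * 4     # me_hash, me_key, me_value
TABLE_ENTRIES = 8      # table size for three keys, as stated in the text

dict_object = malloc_size(DICT_HEADER)
dict_table = malloc_size(TABLE_ENTRIES * ENTRY_SIZE)
print(dict_object, dict_table, dict_object + dict_table)
```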

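Counting this all together, we get 373 bytes without allocation
overhead (101 for the strings, since "hello" is one byte longer than
the keys, plus 48 for the ints, 84 for the lists, and 140 for the
dictionary), 577.15 bytes with malloc overhead (192 + 49.15 + 160 +
176), and 1552 bytes with both malloc and integer-preallocation
overhead (192 + 1024 + 160 + 176); somebody correct me if I'm wrong.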
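
For anyone who wants to check the sums, here is the whole estimate as
one short Python 3 script. It is a sketch of the model described in
this message, not a measurement: every constant is one of the
assumptions above (4-byte fields, malloc adding 8 bytes and rounding
to 16, int blocks of 1000 + 24 bytes holding 83 ints), and the
variable names are made up.

```python
def malloc_size(nbytes, header=8, granule=16):
    """Assumed allocator model: add a header, round up to the granule."""
    total = nbytes + header
    return -(-total // granule) * granule


# Payload sizes in bytes, before any allocator overhead.
strings = [20 + len(s) + 1 for s in ("key1", "key2", "key3", "hello")]
ints_raw = 4 * 12
list_heads = [28, 28]
list_items = [4 * 4, 3 * 4]
dict_head = 44
dict_table = 8 * 12

raw = (sum(strings) + ints_raw + sum(list_heads) + sum(list_items)
       + dict_head + dict_table)

# Everything except the ints goes through malloc individually.
blocks = strings + list_heads + list_items + [dict_head, dict_table]
malloced = sum(malloc_size(n) for n in blocks)

ints_amortized = 4 * (12 + 24 / 83)   # overhead spread over a full block
ints_whole_block = 1000 + 24          # the ints own the entire block

with_malloc = malloced + ints_amortized
with_int_block = malloced + ints_whole_block
print(raw, round(with_malloc, 2), with_int_block)
```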

Hope this helps,
Martin

P.S. Of course, assuming that these are the only allocations done, you
also have to account for the overhead of the Microsoft small block
allocator: it always obtains at least a whole page for small blocks
via VirtualAlloc. So these objects together consume 4096 bytes,
minimum. That is probably the number you get for any other language,
as well, since it is the minimum amount of memory that a Windows
process can allocate from the operating system.
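
The page rounding is the same kind of round-up as before, just with a
larger unit. A minimal Python 3 sketch, assuming the 4096-byte page
mentioned above (page_bytes is a made-up name):

```python
PAGE = 4096  # page size assumed in the text


def page_bytes(nbytes, page=PAGE):
    """Round a byte count up to a whole number of pages."""
    return -(-nbytes // page) * page


for n in (1, 1500, 4096, 4097):
    print(n, page_bytes(n))
```

Any total between 1 and 4096 bytes comes out as a single page.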


