# data size

Ken Seehof kseehof at neuralintegrator.com
Sat Nov 10 12:50:13 CET 2001

```
> > He is probably asking you which C library your version of Python was
> > compiled with. But you don't need to know that, either.
> >
> > All Python dictionaries are a standard 2.5 cm by 3.6cm. Integers have no
> > width and are all 1.2 cm in length. Strings are all 2 mm times the number
> > of characters, except Unicode strings, which are 4 mm times the number of
> > characters.
>
> could you explain further about the metric standard you're using. this is
> the first time a size of data structure is measured using meters instead of
> byte/bit.
> i need the information for my post-mortem of my assignment to explain why
> using python data structure would be efficient. yes, i'm only a studemt who
> is still need to learn lots of stuffs.
> thanks.

Okay, so everyone's explained why you don't care what the answer to
your question is :-).  Actually, the size of a data structure does sometimes
matter, specifically when you are dealing with particularly huge quantities
of data.  For example, Python genetic molecular simulators usually store
the entire human genome in a dictionary in memory.  Don't they?  :-)

Memory usage is generally harder to work out analytically in Python than
in C, so in this kind of situation I just do the empirical thing and
measure it.

>>> import random
>>> def makedict(x):
...  d = {}
...  for i in xrange(x):
...   d[random.randint(100,1000000000)] = random.randint(100,1000000000)
...  return d

>>> a = makedict(1024*1024)
>>> b = makedict(1024*1024)
>>> del a
>>> del b
>>> ... etc....
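
# A rough modern equivalent of the experiment above, as a sketch only.
# Assumptions (not from the original post): a Python 3 interpreter, where
# xrange is spelled range, and the standard tracemalloc module standing in
# for an external memory monitor. The helper name bytes_per_entry is
# hypothetical. Expect a different figure from the one reported in this
# post, since object layouts have changed and 64-bit builds use wider
# pointers.
import random
import tracemalloc


def makedict(x):
    """Build a dict of up to x random int keys mapped to random int values."""
    d = {}
    for i in range(x):
        d[random.randint(100, 1000000000)] = random.randint(100, 1000000000)
    return d


def bytes_per_entry(n):
    """Return the traced allocation growth per stored entry for about n inserts."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    d = makedict(n)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # Duplicate random keys overwrite each other, so divide by len(d), not n.
    return (after - before) / len(d)


if __name__ == "__main__":
    # Smaller than the 1024*1024 used above, because tracing slows allocation.
    print("approx. bytes per entry:", bytes_per_entry(100 * 1024))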

By watching my memory monitor, I determined that the dictionary costs
about 32 bytes per entry (give or take a byte).  It doesn't really matter
much what the bytes are used for, but if you are in the mood to get
analytical...

int ob_refcnt; \
struct _typeobject *ob_type;

A Python object is 8 bytes (a 4-byte reference count plus a 4-byte type
pointer, on a 32-bit machine) plus data.  That's 12 bytes per integer
(note that integers are effectively 0 bytes for -3 < n < 100, since those
small values are shared).  I'd expect the hash table to cost about 3
pointers per entry for a well-balanced hash table.  That's 12 bytes.  So
an entry in our dictionary (2 integers and a hash index entry) should be
12 + 12 + 12 = 36 bytes.  That's four bytes more than the 32 I measured,
so where'd the extra four bytes go?  (well maybe I'm just being sloppy...)
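
# The back-of-the-envelope estimate above, written out as arithmetic. This
# is only a sketch: the 4-byte sizes assume a 32-bit build, which is what
# the figures in this post imply, and all the names are hypothetical.
REFCOUNT = 4        # int ob_refcnt
TYPE_POINTER = 4    # struct _typeobject *ob_type
INT_VALUE = 4       # the integer's own data
POINTER = 4         # one pointer-sized slot in the hash table

object_header = REFCOUNT + TYPE_POINTER      # 8 bytes per object
int_object = object_header + INT_VALUE       # 12 bytes per integer
hash_entry = 3 * POINTER                     # 12 bytes per table entry
estimate = 2 * int_object + hash_entry       # key + value + table entry
measured = 32                                # from the memory monitor

print("estimated bytes per entry:", estimate)
print("estimate minus measurement:", estimate - measured)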