Dictionaries as records

John Roth johnroth at ameritech.net
Wed Dec 19 08:21:05 EST 2001


"Bill Wilkinson" <bwilk_97 at yahoo.com> wrote in message
news:7_XT7.28652$t07.4042756 at twister.midsouth.rr.com...
>
> "John Roth"
> > Your records seem to average around 250 bytes, so that's about 16
> > characters per field. Object overhead for each object is roughly the
> > same as this (it might be larger, I haven't looked at the headers
> > recently).
>
> Yes sir, about 250 bytes.  Here is some code that shows what I am
> confused about. (Hoping my spaces don't get stripped out while being
> posted.) There is a big difference between the memory usage of the
> two sections of code below. I can work around this, but I am a bit
> curious now.  It would be nice to know why the second part below
> takes up so much RAM (I think about 1.2K per record?).
>
> #A sample record that looks like the ones I use. Data obfuscated.
> d = {"f":"kdfkjdkfjdkj",
>     "g":"ldjfljdfkjdf",
>     "u":"dkfkjdfkjdfj",
>     "t":"kdjfkjdfkjdkjfkjdf",
>     "u1":"kdjfkjdjdkjfkjdfjkdjf",
>     "ii2":"kjdfkjdfjkdjfkjdfjdfj",
>     "g3":"ldjfljdfkjdf",
>     "u4":"dkfkjdfkjdfj",
>     "g5":"ldjfljdfkjdf",
>     "u6":"dkfkjdfkjdfj",
>     "g7":"ldjfljdfkjdf",
>     "u8":"dkfkjdfkjdfj",
>     "g9":"ldjfljdfkjdf",
>     "u10":"dkfkjdfkjdfj",
>     "g11":"ldjfljdfkjdf",
>     "u12":"dkfkjdfkjdfj",}
>
> #Method 1
> #Just make a bunch of copies of the same record again and again.
> tbl = []
> for x in range(15000):
>     tbl.append(d.copy())

In this case, you have one copy of each of your input
strings, not 15000 copies! dict.copy() is a shallow copy,
so all 15000 dictionaries share the same sixteen string
objects. The memory usage should be pretty close to pure
dictionary overhead.
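
You can check the sharing directly. This is just a quick
sketch, reusing the sample record 'd' from above:

    # dict.copy() is a shallow copy: the new dict gets its own
    # key/value slots, but the values are the very same string
    # objects as in the original.
    t = d.copy()
    print t["f"] is d["f"]   # prints 1 (true): the string is shared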

> raw_input("Check your Ram usage then press enter")
>
> #Method 2
> #Ok, now change each record just a little bit.
> #Hack off the last two chars from each field and
> #add one new character. Then append the new record
> #to the table.
> tbl = []
> for x in range(15000):
>     t = d.copy()
>     for k in t.keys():
>         t[k] = t[k][:-2] + str(x)[-1]
>     tbl.append(t.copy())
> print "Now check your memory usage again."

Now, each string is a unique object, so around
800 bytes of the 1.5K overhead is due to the
strings. (As an aside, the final t.copy() in the loop
above is redundant; t is already a fresh dictionary,
so tbl.append(t) would do.)
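
The slice-and-concatenate step is what breaks the sharing:
every assignment builds a brand new string object. Here is
a sketch of the same identity check as above:

    # After slicing and concatenating, the value is a fresh string
    # object; nothing is shared with the template record any more.
    t = d.copy()
    t["f"] = t["f"][:-2] + "5"
    print t["f"] is d["f"]   # prints 0 (false): a new, unshared object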

If I remember your original post correctly, this also
supports the observation that the dictionary approach takes
twice as much memory as the list approach.
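
(For anyone following along: I'm guessing the list version
from the original post looked roughly like this - one
field-name-to-index map shared by all records, and one plain
list of values per record.)

    # Sketch of the list-based layout (my guess, not the actual
    # code from the original post): the key overhead is paid
    # once, in a single shared index map.
    names = d.keys()
    fields = {}
    for i in range(len(names)):
        fields[names[i]] = i
    rec = [d[k] for k in names]   # one record, as a list
    print rec[fields["g3"]]       # field access by name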

Given my relatively limited understanding of
dictionaries, they seem to be about three times
the size I would expect.

Something you might try, to see if it makes a
difference, is to build each dictionary from the unique
strings in the first place, rather than copying a template
and then updating it. It may make a huge difference, or it
may make no difference at all.
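
Something like this, untested - call it Method 3:

    # Method 3 (sketch): build each record's dict directly from
    # the new strings, instead of copying the template and then
    # rebinding every value in place.
    tbl = []
    for x in range(15000):
        c = str(x)[-1]
        t = {}
        for k in d.keys():
            t[k] = d[k][:-2] + c
        tbl.append(t)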

John Roth
