Dictionaries as records

Steve Holden sholden at holdenweb.com
Wed Dec 19 07:58:47 EST 2001


"Bill Wilkinson" <bwilk_97 at yahoo.com> wrote in message
news:7_XT7.28652$t07.4042756 at twister.midsouth.rr.com...
>
> "John Roth"
> > Your records seem to average around 250 bytes, so that's about 16
> > characters per field. Object overhead for each object is roughly the
> > same as this (it might be larger, I haven't looked at the headers
> > recently.)
>
> Yes sir, about 250 bytes.  Here is some code that shows what I am confused
> about. (Hoping my spaces don't get stripped out while being posted)
> There is a big difference between the memory usage of the two sections of
> code below. I can work around this, but I am a bit curious now.  It would be
> nice to know why the second part below takes up so much ram (I think about
> 1.2k per rec?).
>
> #A sample record that looks like the ones I use. Data obfuscated.
> d = {"f":"kdfkjdkfjdkj",
>     "g":"ldjfljdfkjdf",
>     "u":"dkfkjdfkjdfj",
>     "t":"kdjfkjdfkjdkjfkjdf",
>     "u1":"kdjfkjdjdkjfkjdfjkdjf",
>     "ii2":"kjdfkjdfjkdjfkjdfjdfj",
>     "g3":"ldjfljdfkjdf",
>     "u4":"dkfkjdfkjdfj",
>     "g5":"ldjfljdfkjdf",
>     "u6":"dkfkjdfkjdfj",
>     "g7":"ldjfljdfkjdf",
>     "u8":"dkfkjdfkjdfj",
>     "g9":"ldjfljdfkjdf",
>     "u10":"dkfkjdfkjdfj",
>     "g11":"ldjfljdfkjdf",
>     "u12":"dkfkjdfkjdfj",}
>
> #Method 1
> #Just make a bunch of copies of the same record again and again.
> tbl = []
> for x in range(15000):
>     tbl.append(d.copy())
>
> raw_input("Check your Ram usage then press enter")
>
> #Method 2
> #Ok, now change each record just a little bit.
> #Hack off the last two chars from each field and
> #add one new character. Then append the new record
> #to the table.
> tbl = []
> for x in range(15000):
>     t = d.copy()
>     for k in t.keys():
>         t[k] = t[k][:-2] + str(x)[-1]
>     tbl.append(t.copy())
> print "Now check your memory usage again."
>
Bill:

Try using copy.deepcopy(d) in place of d.copy() in your first section to see
whether it changes the memory requirements significantly. d.copy() makes a
shallow copy: the new dictionary holds references to the same key and value
objects as the original, so none of the strings is duplicated. Your second
section builds a new string for every field of every record, which is where
I would expect the extra memory to go. Deep copying might duplicate more data
(though I have not read the interpreter source to verify this...).
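
A minimal sketch of the difference, written for a current Python 3 rather
than the interpreter of the original post. The three-field record and the
helper count_distinct_values are made up for illustration; the counting
relies on object identity, so it describes CPython behaviour and is not a
memory measurement.

```python
import copy

# Hypothetical three-field record standing in for the sample data.
d = {"f": "kdfkjdkfjdkj", "g": "ldjfljdfkjdf", "t": "kdjfkjdfkjdkjfkjdf"}

# Method 1: shallow copies. Every copy is a new dict, but its values are
# references to the same string objects that d holds.
tbl1 = [d.copy() for x in range(1000)]
shared = all(rec[k] is d[k] for rec in tbl1 for k in d)

# Method 2: slicing plus concatenation creates a new string object
# for every field of every record.
tbl2 = []
for x in range(1000):
    t = d.copy()
    for k in list(t.keys()):
        t[k] = t[k][:-2] + str(x)[-1]
    tbl2.append(t)

def count_distinct_values(tbl):
    """Count the distinct string objects (by identity) stored as values."""
    return len(set(id(v) for rec in tbl for v in rec.values()))

n1 = count_distinct_values(tbl1)  # one object per field, shared by all copies
n2 = count_distinct_values(tbl2)  # one object per field per record

# Assumption about CPython: deepcopy treats strings as atomic and hands
# back the same objects, so it may not duplicate the values either.
deep_shares = all(copy.deepcopy(d)[k] is d[k] for k in d)
```

If n2 comes out roughly a thousand times larger than n1, the extra memory
in the second section is the strings themselves plus their per-object
overhead, not the dictionaries.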

Basically, though, I'm afraid you might have to accept that you have passed
some limit of usefulness for the programming technique you originally chose.
Unfortunately there is never any guarantee that a particular method will
scale well to larger data sets, particularly when you rely on the intrinsic
properties of some language artefact like the Python dictionary.

You might ultimately have to consider one of the techniques other posters
have mentioned (or possibly a relational database).
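
As a rough sketch of the relational route, assuming a current Python 3 with
the standard sqlite3 module (not something available when this was posted).
The table name rec, the three column names and the generator make_rows are
invented for illustration; the real records have about sixteen fields.

```python
import sqlite3

# Hypothetical subset of the record's fields.
fields = ["f", "g", "t"]
d = {"f": "kdfkjdkfjdkj", "g": "ldjfljdfkjdf", "t": "kdjfkjdfkjdkjfkjdf"}

# ":memory:" keeps the example self-contained; pass a file name instead
# to keep the table on disk rather than in the process's RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rec (id INTEGER PRIMARY KEY, f TEXT, g TEXT, t TEXT)")

def make_rows(n):
    """Yield one modified record at a time instead of building a list."""
    for x in range(n):
        suffix = str(x)[-1]
        yield (x,) + tuple(d[k][:-2] + suffix for k in fields)

conn.executemany("INSERT INTO rec (id, f, g, t) VALUES (?, ?, ?, ?)",
                 make_rows(15000))
conn.commit()

# Fetch only the records that are needed at any one time.
row_count = conn.execute("SELECT COUNT(*) FROM rec").fetchone()[0]
sample = conn.execute("SELECT f, g, t FROM rec WHERE id = ?", (42,)).fetchone()
```

The point of the design is that only the rows you ask for become Python
objects, so the per-object overhead applies to the working set and not to
all 15000 records at once.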

regards
 Steve
--
http://www.holdenweb.com/