[Tutor] Shelve & immutable objects

Danny Yoo dyoo at hashcollision.org
Thu Jan 2 18:21:38 CET 2014


> Separately, I'm also curious about how to process big files. For example, I
> was trying to play 100 million games of chutes & ladders, and I crashed my
> machine, I believe: the game results, including 4 ints & 2 short lists of
> ints per game, are gathered into a list, so it can become a pretty big list.
> I need to do stats and other analyses on it in the end (okay, I really don't
> NEED to play 100 million games of chutes & ladders, but as long as I
> have...): I suppose I could break it into manageable (maybe 1 million games
> each), but that will make some of the stats either clunky or wrong (I
> haven't really attacked that part yet).

This is probably one of those cases where you don't want to keep all
of this in active memory, but rather store it in some kind of
long-term store.

We can do a back-of-the-envelope calculation to see why.  An int's
native size is 4 bytes on 32-bit machines.  This is Python, so there's
a bit more overhead, which makes the figure below a lower bound.
Let's roughly say that each record is about 10 ints, so about 40 bytes.

    100 * 10^6 records * 40 bytes per record = 4 * 10^9 bytes.

###############################
>>> import humanize
>>> humanize.intword(4*10**9)
'4 billion'
###############################

So that's about 4 billion bytes, as a very rough guess.
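
To get a feel for how much the Python overhead adds on top of that
40-byte guess, here is a minimal sketch using sys.getsizeof.  The
sample record and the record_size helper are made up for illustration
(4 ints and 2 short lists, as described in the question).  CPython
shares small int objects between records, so this overcounts a bit;
treat the result as a rough estimate, not a measurement.

```python
import sys


def record_size(record):
    """Roughly estimate the bytes used by a tuple of ints and lists of ints."""
    total = sys.getsizeof(record)  # the tuple container itself
    for field in record:
        total += sys.getsizeof(field)  # an int object, or a list container
        if isinstance(field, list):
            # The int objects stored inside each list.
            total += sum(sys.getsizeof(item) for item in field)
    return total


# Hypothetical game result: 4 ints and 2 short lists of ints.
sample = (37, 4, 2, 1, [12, 55, 80], [16, 47, 93])

per_record = record_size(sample)
estimate = per_record * 100 * 10**6

print(per_record, "bytes per record")
print(round(estimate / 10**9, 1), "billion bytes for 100 million games")
```

On a typical CPython build the per-record figure comes out several
times larger than 40 bytes, which only strengthens the conclusion.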

Unfortunately, when we get to those magnitudes, that's way too large
to fit into a standard computer's memory.  32-bit machines can only
address up to about 4 billion bytes:

################################
>>> humanize.intword(2**32)
'4.3 billion'
################################

So trying to juggle all those records in short-term RAM is a
non-starter: it won't work.  We'll want to do something different
instead, such as saving each record into a persistent, on-disk
external database.
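
As one possible way to do that, here is a minimal sketch using the
standard library's sqlite3 module.  It is only an illustration under
some assumptions: play_game is a made-up stand-in for the real game,
the table layout (one int plus two lists stored as JSON text) is
hypothetical, and the batch size is arbitrary.  The idea is that only
one batch of records lives in memory at a time, while statistics are
computed by the database over the whole table.

```python
import json
import os
import random
import sqlite3
import tempfile


def play_game(rng):
    """Hypothetical stand-in for a real chutes & ladders game."""
    moves = rng.randint(7, 60)
    chutes = [rng.randint(1, 100) for _ in range(rng.randint(0, 3))]
    ladders = [rng.randint(1, 100) for _ in range(rng.randint(0, 3))]
    # Lists are stored as JSON text so that each record fits in one row.
    return moves, json.dumps(chutes), json.dumps(ladders)


def run_games(db_path, n_games, batch_size=10000, seed=0):
    """Play n_games, writing results to disk one batch at a time."""
    rng = random.Random(seed)
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games "
        "(moves INTEGER, chutes TEXT, ladders TEXT)"
    )
    remaining = n_games
    while remaining > 0:
        # Only this batch is held in memory, never the full result list.
        batch = [play_game(rng) for _ in range(min(batch_size, remaining))]
        conn.executemany("INSERT INTO games VALUES (?, ?, ?)", batch)
        conn.commit()
        remaining -= len(batch)
    return conn


# Small demonstration run in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    conn = run_games(os.path.join(tmp, "games.db"), 50000)
    # The aggregate runs over every stored game, not over one chunk.
    count, avg_moves = conn.execute(
        "SELECT COUNT(*), AVG(moves) FROM games"
    ).fetchone()
    conn.close()

print(count, "games, average moves:", avg_moves)
```

Because the aggregates are computed over the whole table, splitting
the run into batches does not make the statistics clunky or wrong.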

If you're interested, bring this up as a separate topic on
python-tutor, and I'm sure folks here will be happy to talk about it.


Good luck!

