How would you design scalable solution?

Bryan bryanvick at gmail.com
Tue Oct 27 18:10:08 CET 2009


I'm designing a system and wanted to get some feedback on a potential
performance problem down the road while it is still cheap to fix.

The system is similar to an accounting system where a system tracks
"Things"
which move between different "Buckets".  The system answers these
questions:
- How many Things are in this Bucket today?
- Move Things from Bucket1 to Bucket2...
- Now how many Things are in each Bucket?

So each time a Thing is moved between Buckets, I imagine a DB row like
this:
 | id | thingId | toBucket | fromBucket | qty |

Then to find how many Things are in a certain Bucket:
1. Start with initial qty in the bucket at the beginning of time
2. *Add* all qty moved *to* the bucket since beginning of time
3. *Subtract* all qty moved *from* the bucket since beginning of time

Simple system to "account" for Things.

My problem is this.  This design will inherantly get slower as time
goes on.
As the number of rows recording a transfer between buckets increases,
the query
to see how many Things are in a Bucket will get slower.  I experience
this when
I use gnucash (which I love).  I don't do "closing entries" at the end
of the
year, so each account has every transaction I have every made.  I see
it getting
slower.  It is nothing I am going to do anything about, because it is
still fast
enough for me.  But I have to wonder how big companies with thousands
of
transactions a day do this?

One solution would be to do a "closing entry" at certain periods in
time, so old
info would be archived.  Each bucket would start the new time period
with a
balance of Things equal to what it was at the point in time we
"closed".

How else to keep a record of every transaction, but not have the speed
of the
question "How many Things in Bucket x" depend on looking @ every
transaction
record ever made?





More information about the Python-list mailing list