How would you design scalable solution?
joncle at googlemail.com
Wed Oct 28 07:20:20 CET 2009
On 27 Oct, 17:10, Bryan <bryanv... at gmail.com> wrote:
> I'm designing a system and wanted to get some feedback on a potential
> performance problem down the road while it is still cheap to fix.
> The system is similar to an accounting system: it tracks Things which
> move between different "Buckets", and answers questions like:
> - How many Things are in this Bucket today?
> - Move Things from Bucket1 to Bucket2...
> - Now how many Things are in each Bucket?
> So each time a Thing is moved between Buckets, I imagine a DB row like
> | id | thingId | toBucket | fromBucket | qty |
> Then to find how many Things are in a certain Bucket:
> 1. Start with initial qty in the bucket at the beginning of time
> 2. *Add* all qty moved *to* the bucket since beginning of time
> 3. *Subtract* all qty moved *from* the bucket since beginning of time
> Simple system to "account" for Things.
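For what it's worth, the three steps above map directly onto two SUMs over
the transfer table. A rough sketch using the stdlib's sqlite3 and the row
layout from your post (the opening balance is assumed to be known from
somewhere else):

```python
# Sketch of the balance lookup described above: opening qty, plus all
# qty moved to the bucket, minus all qty moved from it. Uses sqlite3
# from the standard library; table/column names mirror the post.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transfer (
        id INTEGER PRIMARY KEY,
        thingId INTEGER,
        toBucket INTEGER,
        fromBucket INTEGER,
        qty INTEGER
    );
""")
conn.executemany(
    "INSERT INTO transfer (thingId, toBucket, fromBucket, qty) "
    "VALUES (?, ?, ?, ?)",
    [(1, 2, 1, 10), (1, 1, 2, 3), (1, 2, 3, 5)],
)

def balance(bucket, opening=0):
    """Opening qty, plus everything moved in, minus everything moved out."""
    moved_in = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM transfer WHERE toBucket = ?",
        (bucket,)).fetchone()[0]
    moved_out = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM transfer WHERE fromBucket = ?",
        (bucket,)).fetchone()[0]
    return opening + moved_in - moved_out

print(balance(2))  # 10 + 5 moved in, 3 moved out -> 12
```

Without an index on toBucket/fromBucket both SUMs are full table scans,
which is exactly the slowdown you describe.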
> My problem is this: the design will inherently get slower as time goes
> on. As the number of rows recording a transfer between buckets
> increases, the query to see how many Things are in a Bucket will get
> slower. I experience this when I use gnucash (which I love). I don't
> do "closing entries" at the end of the year, so each account carries
> every transaction I have ever made, and I can see it getting slower.
> It's nothing I'm going to do anything about, because it is still fast
> enough for me. But I have to wonder: how do big companies with
> thousands of transactions a day handle this?
> One solution would be to do a "closing entry" at certain periods in
> time, so old info would be archived. Each bucket would start the new
> time period with a balance of Things equal to what it was at the point
> in time we closed.
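That closing-entry idea can be sketched in a few lines: periodically
snapshot each bucket's balance, then answer queries from the latest
snapshot plus only the transfers made since. All names below are made up
for illustration, and the "tables" are plain in-memory lists/dicts:

```python
# Hypothetical sketch of periodic "closing entries" (snapshots).
snapshots = {}   # bucket -> (snapshot_time, balance_at_that_time)
transfers = []   # (time, to_bucket, from_bucket, qty)

def close_period(now):
    """Roll every known bucket's balance forward into a snapshot at `now`."""
    buckets = {b for _, t, f, _ in transfers for b in (t, f)} | set(snapshots)
    for b in buckets:
        snapshots[b] = (now, balance_at(b, now))

def balance_at(bucket, now):
    """Latest snapshot plus only the transfers made after it."""
    since, base = snapshots.get(bucket, (float("-inf"), 0))
    for when, to_b, from_b, qty in transfers:
        if since < when <= now:
            if to_b == bucket:
                base += qty
            if from_b == bucket:
                base -= qty
    return base
```

After each close_period() call, a balance query only scans transfers
newer than the snapshot, so query time depends on the period length, not
on the total history.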
> How else can I keep a record of every transaction, but not have the
> speed of the question "How many Things in Bucket x?" depend on looking
> at every record ever made?
As well as what Jonathan has said, the following might be worth
considering:

() "Table Partitioning" - postgres has specific support for this
(although other DBs support similar techniques).
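To illustrate the idea behind range partitioning (independent of any
particular DB): rows are split into per-year partitions, and a query with
a date filter touches only the partitions in range. In a real database
the planner does this pruning for you; this toy version just uses dicts:

```python
# Application-level illustration of range partitioning by year.
from collections import defaultdict

partitions = defaultdict(list)  # year -> [(to_bucket, from_bucket, qty)]

def insert(year, to_bucket, from_bucket, qty):
    partitions[year].append((to_bucket, from_bucket, qty))

def moved_to(bucket, first_year, last_year):
    # only partitions inside the requested range are touched at all
    return sum(qty
               for y in range(first_year, last_year + 1)
               for to_b, _from_b, qty in partitions.get(y, [])
               if to_b == bucket)
```

Combined with yearly closing entries, old partitions never need to be
scanned for a current-balance query.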
() Shared Nothing Architecture (aka. 'Sharding') with appropriate
'buckets' which is massively scalable, but I'm guessing overkill :)
http://en.wikipedia.org/wiki/Shared_nothing_architecture - IIRC
SQLAlchemy has a basic implementation of this in its examples.
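The sharding idea, very roughly: pick the shard from a stable hash of the
bucket id, so each bucket's rows live on exactly one shard and a balance
query touches a single shard. The "shards" below are stand-in dicts, not
real database connections:

```python
# Hypothetical sketch of shared-nothing sharding keyed on bucket id.
import zlib

SHARDS = [dict() for _ in range(4)]  # stand-ins for per-shard databases

def shard_for(bucket):
    # crc32 is stable across runs, unlike hash() with randomized seeds
    return SHARDS[zlib.crc32(str(bucket).encode()) % len(SHARDS)]

def record_transfer(to_bucket, from_bucket, qty):
    # a transfer touches two buckets, so it is written to both shards
    shard_for(to_bucket).setdefault(to_bucket, []).append(qty)
    shard_for(from_bucket).setdefault(from_bucket, []).append(-qty)

def shard_balance(bucket):
    return sum(shard_for(bucket).get(bucket, []))
```

The trade-off is that a transfer now writes to two shards, which is why
this is probably overkill unless one machine genuinely can't hold the
data.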