taking python enterprise level?...

Tim Wintle tim.wintle at teamrubber.com
Wed Mar 3 12:50:02 EST 2010


On Wed, 2010-03-03 at 17:26 +0100, mk wrote:
> 
> So there *may* be some evidence that joins are indeed bad in
> practice. 
> If someone has smth specific/interesting on the subject, please post.

I have found joins to cause problems in a few cases - I'm talking about
relatively large tables though - roughly order 10^8 rows.

I'm on Mysql normally, but that shouldn't make any difference - I've
seen almost the same situation on Oracle

consider this simple example:

/*   Table A    */
CREATE TABLE TableA (
    project_id BIGINT NOT NULL,
    cost INT,
    date DATETIME,
    PRIMARY KEY (project_id, date)
);

/* Table projects */
CREATE TABLE projects (
    client_id BIGINT NOT NULL,
    project_id BIGINT NOT NULL,
    INDEX(client_id)
);


... now the index on TableA has been optimised for queries against date
ranges on specific project ids which should more or less be sequential
(under a load of other assumptions) - but that reduces the efficiency of
the query under a join with the table "projects".

If you denormalise the table, and update the first index to be on
(client_id, project_id, date) it can end up running far more quickly -
assuming you can access the first mapping anyway - so you're still
storing the first table, with stored procedures to ensure you still have
correct data in all tables.

I'm definitely glossing over the details - but I've definitely got
situations where I've had to choose denormalisation over purity of data.


Rolled-up data tables are other situations - where you know half your
queries are grouping by field "A" it's sometimes a requirement to store
that.


Tim






More information about the Python-list mailing list