taking python enterprise level?...
Tim Wintle
tim.wintle at teamrubber.com
Wed Mar 3 12:50:02 EST 2010
On Wed, 2010-03-03 at 17:26 +0100, mk wrote:
>
> So there *may* be some evidence that joins are indeed bad in
> practice.
> If someone has something specific/interesting on the subject, please post.
I have found joins to cause problems in a few cases - I'm talking about
relatively large tables though, roughly of order 10^8 rows.

I'm normally on MySQL, but that shouldn't make any difference - I've
seen almost the same situation on Oracle.

Consider this simple example:
/* Table A */
CREATE TABLE TableA (
    project_id BIGINT NOT NULL,
    cost       INT,
    date       DATETIME,
    PRIMARY KEY (project_id, date)
);

/* Table projects */
CREATE TABLE projects (
    client_id  BIGINT NOT NULL,
    project_id BIGINT NOT NULL,
    INDEX (client_id)
);
... now the index on TableA has been optimised for queries against date
ranges on specific project ids, which should be more or less sequential
(under a load of other assumptions) - but that reduces the efficiency of
the query under a join with the table "projects".
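To make the shape of that join concrete, here is a minimal sketch in
Python using the stdlib sqlite3 module (the post's schema is MySQL, so
the column types and the separate CREATE INDEX are adapted; the sample
data and the per-client query are my own illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (
    project_id INTEGER NOT NULL,
    cost       INTEGER,
    date       TEXT,
    PRIMARY KEY (project_id, date)
);
CREATE TABLE projects (
    client_id  INTEGER NOT NULL,
    project_id INTEGER NOT NULL
);
CREATE INDEX idx_client ON projects (client_id);
""")

# Hypothetical sample data: client 1 owns projects 100 and 101,
# client 2 owns project 200.
conn.executemany("INSERT INTO projects VALUES (?, ?)",
                 [(1, 100), (1, 101), (2, 200)])
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?)",
                 [(100, 10, "2010-01-01"), (100, 20, "2010-01-02"),
                  (101, 5,  "2010-01-01"), (200, 99, "2010-01-01")])

# The join in question: cost per client over a date range. The planner
# must fan out from projects and do one index dive into TableA's
# (project_id, date) key per project, rather than one range scan.
rows = conn.execute("""
    SELECT p.client_id, SUM(a.cost)
      FROM projects p
      JOIN TableA   a ON a.project_id = p.project_id
     WHERE a.date BETWEEN '2010-01-01' AND '2010-01-02'
     GROUP BY p.client_id
     ORDER BY p.client_id
""").fetchall()
print(rows)  # [(1, 35), (2, 99)]
```

With 10^8 rows in TableA that per-project fan-out is where the join
starts to hurt.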
If you denormalise the table, and update the first index to be on
(client_id, project_id, date), it can end up running far more quickly -
assuming you can still access the first mapping. So you're still
storing the first table, with stored procedures to ensure the data
stays correct across all tables.
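Sketched with sqlite3 (SQLite has triggers rather than stored
procedures, so a trigger stands in for the sync logic here; the
denormalised table name and the sample rows are my own):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (
    client_id  INTEGER NOT NULL,
    project_id INTEGER NOT NULL PRIMARY KEY
);
CREATE TABLE TableA (
    project_id INTEGER NOT NULL,
    cost       INTEGER,
    date       TEXT,
    PRIMARY KEY (project_id, date)
);
-- Denormalised copy, keyed the way the client-facing queries read it.
CREATE TABLE TableA_denorm (
    client_id  INTEGER NOT NULL,
    project_id INTEGER NOT NULL,
    cost       INTEGER,
    date       TEXT,
    PRIMARY KEY (client_id, project_id, date)
);
-- Trigger playing the role of the stored procedure: every insert into
-- TableA is mirrored into the denormalised table with its client_id.
CREATE TRIGGER sync_denorm AFTER INSERT ON TableA
BEGIN
    INSERT INTO TableA_denorm
    SELECT p.client_id, NEW.project_id, NEW.cost, NEW.date
      FROM projects p
     WHERE p.project_id = NEW.project_id;
END;
""")

conn.execute("INSERT INTO projects VALUES (1, 100)")
conn.execute("INSERT INTO TableA VALUES (100, 10, '2010-01-01')")

# No join needed any more: a single range scan on the composite key.
rows = conn.execute("""
    SELECT SUM(cost) FROM TableA_denorm
     WHERE client_id = 1 AND date >= '2010-01-01'
""").fetchall()
print(rows)  # [(10,)]
```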
I'm definitely glossing over the details, but I've certainly hit
situations where I've had to choose denormalisation over purity of
data.

Rolled-up data tables are another such situation - when you know half
your queries are grouping by field "A", it's sometimes a requirement to
store that aggregate.
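A minimal sketch of that rolled-up-table pattern, again with sqlite3
(table and column names here are invented for illustration; in practice
the rebuild would be a scheduled job or maintained incrementally):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (a TEXT, amount INTEGER);
-- Pre-aggregated table for the "half your queries group by A" case.
CREATE TABLE rollup_by_a (a TEXT PRIMARY KEY, total INTEGER);
""")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("x", 1), ("x", 2), ("y", 5)])

# Periodic rebuild of the rollup; readers then hit the small table
# instead of grouping 10^8 base rows on every query.
conn.executescript("""
DELETE FROM rollup_by_a;
INSERT INTO rollup_by_a
SELECT a, SUM(amount) FROM events GROUP BY a;
""")

rows = conn.execute(
    "SELECT a, total FROM rollup_by_a ORDER BY a").fetchall()
print(rows)  # [('x', 3), ('y', 5)]
```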
Tim
More information about the Python-list mailing list