taking python enterprise level?...
mk
mrkafk at gmail.com
Wed Mar 3 14:39:35 EST 2010
Hello Tim,
Pardon the questions, but I haven't had the need to use denormalization
yet, so:
Tim Wintle wrote:
> /* Table A */
> CREATE TABLE TableA (
> project_id BIGINT NOT NULL,
> cost INT,
> date DATETIME,
> PRIMARY KEY (project_id, date)
> );
>
> /* Table projects */
> CREATE TABLE projects (
> client_id BIGINT NOT NULL,
> project_id BIGINT NOT NULL,
> INDEX(client_id)
> );
>
>
> .... now the index on TableA has been optimised for queries against date
> ranges on specific project ids which should more or less be sequential
> (under a load of other assumptions) - but that reduces the efficiency of
> the query under a join with the table "projects".
>
> If you denormalise the table, and update the first index to be on
> (client_id, project_id, date) it can end up running far more quickly -
IOW, you basically merged the tables as follows?
CREATE TABLE projects (
client_id BIGINT NOT NULL,
project_id BIGINT NOT NULL,
cost INT,
date DATETIME,
INDEX(client_id, project_id, date)
);
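If so, I guess that index would then cover a query roughly shaped like
this (just my guess at the query shape, the literal values are made up):

SELECT date, cost
FROM projects
WHERE client_id = 123
  AND project_id = 456
  AND date BETWEEN '2010-01-01' AND '2010-02-01';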
From what you write further in the mail I conclude that you have not
eliminated the first table, just made the projects table look like I wrote
above, right? (And used stored procedures to make sure that the client_id
and project_id columns stay consistent between the two tables.)
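Is that roughly the idea? A minimal sketch of what I imagine, assuming
MySQL-style stored procedures (the procedure and parameter names are
made up):

DELIMITER //
CREATE PROCEDURE add_cost(IN p_client_id BIGINT,
                          IN p_project_id BIGINT,
                          IN p_cost INT,
                          IN p_date DATETIME)
BEGIN
    /* keep the normalised table ... */
    INSERT INTO TableA (project_id, cost, date)
        VALUES (p_project_id, p_cost, p_date);
    /* ... and the denormalised projects table in sync */
    INSERT INTO projects (client_id, project_id, cost, date)
        VALUES (p_client_id, p_project_id, p_cost, p_date);
END //
DELIMITER ;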
Have you had any other joins on denormalized keys? I.e., for example, how
does a join of a hypothetical TableB with projects on projects.client_id
behave with such big tables? (Because I assume you obviously can't
denormalize absolutely everything, so this implies the need for some joins
on denormalized columns like client_id.)
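For instance (TableB and its columns are purely hypothetical, just to
illustrate the kind of join I mean):

SELECT b.some_column, p.project_id, p.cost
FROM TableB b
JOIN projects p ON p.client_id = b.client_id
WHERE p.date BETWEEN '2010-01-01' AND '2010-02-01';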
> assuming you can access the first mapping anyway -
? I'm not clear on what you mean here.
> so you're still
> storing the first table, with stored procedures to ensure you still have
> correct data in all tables.
Regards,
mk