taking python enterprise level?...
mk
mrkafk at gmail.com
Wed Mar 3 14:39:35 EST 2010
Hello Tim,
Pardon the questions, but I haven't had the need to use denormalization
yet, so:
Tim Wintle wrote:
> /* Table A */
> CREATE TABLE TableA (
> project_id BIGINT NOT NULL,
> cost INT,
> date DATETIME,
> PRIMARY KEY (project_id, date)
> );
>
> /* Table projects */
> CREATE TABLE projects (
> client_id BIGINT NOT NULL,
> project_id BIGINT NOT NULL,
> INDEX(client_id)
> );
>
>
> .... now the index on TableA has been optimised for queries against date
> ranges on specific project ids which should more or less be sequential
> (under a load of other assumptions) - but that reduces the efficiency of
> the query under a join with the table "projects".
>
> If you denormalise the table, and update the first index to be on
> (client_id, project_id, date) it can end up running far more quickly -
IOW, you basically merged the tables as follows?
CREATE TABLE projects (
client_id BIGINT NOT NULL,
project_id BIGINT NOT NULL,
cost INT,
date DATETIME,
INDEX(client_id, project_id, date)
);
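If so, I guess that index would then cover a query roughly shaped like
this (just my guess at the query shape, the literal values are made up):

SELECT date, cost
FROM projects
WHERE client_id = 123
  AND project_id = 456
  AND date BETWEEN '2010-01-01' AND '2010-02-01';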
From what you write further in the mail I conclude that you have not
eliminated the first table, just made the projects table look like I wrote
above, right? (And used stored procedures to make sure that the client_id
and project_id columns stay consistent between the two tables.)
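Is that roughly the idea? A minimal sketch of what I imagine, assuming
MySQL-style stored procedures (the procedure and parameter names are
made up):

DELIMITER //
CREATE PROCEDURE add_cost(IN p_client_id BIGINT,
                          IN p_project_id BIGINT,
                          IN p_cost INT,
                          IN p_date DATETIME)
BEGIN
    /* keep the normalised table ... */
    INSERT INTO TableA (project_id, cost, date)
        VALUES (p_project_id, p_cost, p_date);
    /* ... and the denormalised projects table in sync */
    INSERT INTO projects (client_id, project_id, cost, date)
        VALUES (p_client_id, p_project_id, p_cost, p_date);
END //
DELIMITER ;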
Have you had any other joins on denormalized keys? I.e., for example, how
does a join of a hypothetical TableB with projects on projects.client_id
behave with such big tables? (Because I assume you obviously can't
denormalize absolutely everything, so this implies the need for some joins
on denormalized columns like client_id.)
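For instance (TableB and its columns are purely hypothetical, just to
illustrate the kind of join I mean):

SELECT b.some_column, p.project_id, p.cost
FROM TableB b
JOIN projects p ON p.client_id = b.client_id
WHERE p.date BETWEEN '2010-01-01' AND '2010-02-01';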
> assuming you can access the first mapping anyway -
? I'm not clear on what you mean here.
> so you're still
> storing the first table, with stored procedures to ensure you still have
> correct data in all tables.
Regards,
mk