taking python enterprise level?...

mk mrkafk at gmail.com
Wed Mar 3 15:58:15 EST 2010


Philip Semanchuk wrote:
>> So there *may* be some evidence that joins are indeed bad in practice. 
>> If someone has smth specific/interesting on the subject, please post.
> 
> It's an unprovable assertion, or a meaningless one depending on how one 
> defines the terms. You could also say "there *may* be some evidence that 
> Python lists are bad in practice". Python lists and SQL JOINs are like 
> any part of a language or toolkit. They're good tools for solving 
> certain classes of problems. They can also be misapplied to problems 
> that they're not so good at. Sometimes they're a performance bottleneck, 
> even when solving the problems for which they're best. Sometimes the 
> best way to solve a performance bottleneck is to redesign your 
> app/system so you don't need to solve that kind of problem anymore 
> (hence the join-less databases). Other times, the cure is worse than the 
> disease and you're better off throwing hardware at the problem.

Look, I completely agree with what you're saying, but: that doesn't 
change the possibility that joins may be expensive in comparison to 
other SQL operations. This is the phrase I should have used perhaps; 
'expensive in comparison with other SQL operations' instead of 'bad'.

Example from my app, where I behaved "by the book" (I hope) and 
normalized my data:

$ time echo "\c hrs;
SELECT hosts.ip, reservation.start_date, architecture.architecture, 
os_kind.os_kind, os_rel.os_rel, os_version.os_version, project.project, 
email.email FROM hosts
  INNER JOIN project ON project.id = hosts.project_id
  INNER JOIN architecture ON hosts.architecture_id = architecture.id
  INNER JOIN os_kind ON os_kind.id = hosts.os_kind_id
  INNER JOIN os_rel ON hosts.os_rel_id = os_rel.id
  INNER JOIN os_version ON hosts.os_version_id = os_version.id
  INNER JOIN reservation_hosts ON hosts.id = reservation_hosts.host_id
  INNER JOIN reservation on reservation.id = 
reservation_hosts.reservation_id
  INNER JOIN email ON reservation.email_id = email.id

;" | psql > /dev/null

real    0m0.099s
user    0m0.015s
sys     0m0.005s



$ time echo "\c hrs;
 > SELECT hosts.ip FROM hosts;
 > SELECT reservation.start_date FROM reservation;
 > SELECT architecture.architecture FROM architecture;
 > SELECT os_rel.os_rel FROM os_rel;
 > SELECT os_version.os_version FROM os_version;
 > SELECT project.project FROM project;
 > SELECT email.email FROM email;
 > " | psql > /dev/null

real    0m0.046s
user    0m0.008s
sys     0m0.004s

Note: I've created indexes on those tables, both on data columns like 
hosts.ip and on .id columns.

So yes, joins apparently are at least twice as expensive as simple 
selects without joins, on a small dataset. Not a drastic increase in 
cost, but smth definitely shows.

It would be interesting to see what happens when row numbers increase to 
large numbers, but I have no such data.

Regards,
mk





More information about the Python-list mailing list