My Python/MySQL code is too slow

Wed Jun 19 21:18:33 EDT 2002

Duncan Smith wrote in comp.lang.python:
> I have already posted this problem on mailing.database.mysql as I see it as
> a problem with my lack of knowledge of SQL.  But no responses so far, and
> maybe Python could do more of the work.

Doing work in Python that the db can do is not generally advisabele
for performance reasons. MySQL, however, is a rather limited RDBMS. If
you want better SQL support, I'd recommend to take a look at
PostgreSQL. Outer joins and subselects are missing from MySQL, but
available in PostgreSQL and it looks like they could be useful in your
case.

> I repost the original question below (SQL easier to follow), and the
> Python code below that.  Any help appreciated.  TIA.

> 
> ---------------------------------------------------------------------
> I have two tables (say dbA and dbB) and need to find the number of rows in
> dbB which have exactly 2 matching rows in dbA (on the selected columns).
> I'm using Python/MySQL, but the relevant SQL is below.  This is slow, and
> I'm sure it's possible without creating the temporary table and without the
> join.  But everything I try fails (because of my limited knowledge of SQL).
> Can anyone show me an efficient way of getting the answer?  All I need is
> the number of rows in dbB which have exactly 2 matching rows in dbA.  Thanks
> in advance.
> 
> CREATE TEMPORARY TABLE tmp SELECT var1, var2, var3 FROM dbA GROUP BY var1,
> var2, var3 HAVING COUNT(*)=2;
> 
> SELECT dbB.var1, dbB.var2, dbB.var3, tmp.var1, tmp.var2, tmp.var3 FROM dbB,
> tmp WHERE dbB.var1=tmp.var1 AND dbB.var2=tmp.var2 AND dbB.var3=tmp.var3;

I have no idea what dbA and dbB contain, but it looks like they
contain the same sort of data. This is not good db design, but that's
normal for SQL newbies. I'm not very good at explaining the theory,
but you could take a look at one of these links:

http://www.google.com/search?hl=en&lr=&q=relational+database+normalization

("normalization" is the basic concept of good relational db design)

It's very likely that reorganizing your db schema into a one-to-many
relationship between two tables, and thus unifiying dbA and dbB will
simpify your problem.

> [...]
>         conn = MySQLdb.connect()
>         curs = conn.cursor()
>         curs.execute('USE %s' % (dbname,))

conn = MySQLdb.connect(db=dbname)

HTH,

Gerhard
-- 
mail:   gerhard <at> bigfoot <dot> de       registered Linux user #64239
web:    http://www.cs.fhm.edu/~ifw00065/    OpenPGP public key id AD24C930
public key fingerprint: 3FCC 8700 3012 0A9E B0C9  3667 814B 9CAA AD24 C930
reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))