How to find bad row with db api executemany()?
Roy Smith
roy at panix.com
Fri Mar 29 22:44:53 EDT 2013
In article <roy-A61512.20410329032013 at news.panix.com>,
Roy Smith <roy at panix.com> wrote:
> In article <mailman.3971.1364595940.2939.python-list at python.org>,
> Dennis Lee Bieber <wlfraed at ix.netcom.com> wrote:
>
> > If using MySQLdb, there isn't all that much difference... MySQLdb is
> > still compatible with MySQL v4 (and maybe even v3), and since those
> > versions don't have "prepared statements", .executemany() essentially
> > turns into something that creates a newline delimited "list" of
> > "identical" (but for argument substitution) statements and submits that
> > to MySQL.
>
> Shockingly, that does appear to be the case. I had thought during my
> initial testing that I was seeing far greater throughput, but as I got
> more into the project and started doing some side-by-side comparisons,
the differences went away.
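To make that concrete, here's a rough, string-level sketch of the two
strategies. This is not MySQLdb's actual source; real MySQLdb escapes
and quotes the parameters, which this skips, and the table is made up:

import re

# Rough sketch of the two executemany() code paths; parameter
# escaping/quoting is omitted.
query = "INSERT INTO t (a, b) values (%s, %s)"
rows = [(1, 2), (3, 4), (5, 6)]

m = re.search(r"\svalues\s*(\(.*\))", query)
if m:
    # Fast path: splice every row into one multi-row INSERT and
    # send a single statement to the server.
    sql = query[:m.start(1)] + ",\n".join(m.group(1) % row for row in rows)
else:
    # Slow path: one full statement (and one round trip) per row.
    sql = ";\n".join(query % row for row in rows)

print(sql)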
OMG, this is amazing.
http://stackoverflow.com/questions/3945642/
It turns out that MySQLdb's executemany() runs a regex over your SQL and
picks one of two algorithms depending on whether it matches. The pattern
(from MySQLdb's cursors.py):
restr = (r"\svalues\s*"
r"(\(((?<!\\)'[^\)]*?\)[^\)]*(?<!\\)?'"
r"|[^\(\)]|"
r"(?:\([^\)]*\))"
r")+\))")
Leaving aside the obvious line-noise aspects, the operative problem here
is that it only looks for "values" (in lower case).
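A quick demonstration, using a simplified stand-in for the pattern (the
full regex above behaves the same way on these inputs, since it likewise
matches only lower-case "values"):

import re

# Simplified stand-in for MySQLdb's pattern; like the real one, it is
# compiled without re.IGNORECASE.
insert_values = re.compile(r"\svalues\s*\(")

print(bool(insert_values.search("INSERT INTO t values (%s, %s)")))  # True  -> fast path
print(bool(insert_values.search("INSERT INTO t VALUES (%s, %s)")))  # False -> slow path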
I've lost my initial test script which convinced me that executemany()
would be a win; I'm assuming I used lower case for that. Our production
code uses "VALUES".
The slow way (i.e. with "VALUES"), I'm inserting 1000 rows about every
2.4 seconds. When I switch to "values", I get more like 1000 rows in
100 ms!
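For the record, the comparison looks roughly like this (a sketch: the
connection parameters, table "t", and its columns are placeholders):

import time
import MySQLdb

# Benchmark sketch; connection parameters and the table are placeholders.
conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="test")
cur = conn.cursor()

rows = [(i, "some text") for i in range(1000)]

for keyword in ("VALUES", "values"):
    # Same statement either way; only the case of the keyword differs.
    sql = "INSERT INTO t (a, b) %s (%%s, %%s)" % keyword
    t0 = time.time()
    cur.executemany(sql, rows)
    conn.commit()
    print("%s: %.2f s" % (keyword, time.time() - t0))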
A truly breathtaking bug.