Question about optimization

Thu Jul 24 17:40:54 EDT 2008

On Thu, 24 Jul 2008 17:19:41 -0400, Wei Hao <weihao89 at gmail.com> wrote:
>Hi:
>
>I'm pretty new to Python and I have some optimization issues. I'll show you
>the piece of code that is causing them, with pseudo-code and comments before
>it. I'm accessing a gigantic table (about 15 million rows) in SQL.
>
>d is some dictionary, r is a precompiled regex
>Big loop, so I search through the table in chunks given by delta:
>    SQL query ("select * from table where rowID >= n and rowID < (n + delta)"),
>    result of query stored in a. Each individual row is a[n1]; the columns
>    of a row are a[n1][n2].
>
> [snip]
>
>I am 100% sure it's this code snippet that's the cause of my problems.
>Here's what I can tell you. Each chunk of rows that I grab is essentially
>equal in size (rowID skips over stuff, but rather arbitrarily). The time it
>takes to fetch the SQL query doesn't change. But as the program progresses,
>this snippet gets slower. Here's the output:
>
>2500 0.441551299341
>5000 1.26162739664
>7500 2.35092688403
>10000 3.48417469666
>12500 4.59031305491
>15000 5.78972588775
>17500 6.28305527139
>20000 6.73344570903
>22500 8.31732146487
>25000 9.65322872159
>27500 8.98186042757
>30000 11.8042818095
>32500 12.1965593712
>35000 13.2735763291
>37500 14.0282617344
>
>What is it in the code snippet that slows down as n increases? Is there
>something about the way low-level Python functions work that I don't
>understand and that is slowing me down?

Perhaps you need an index on rowID.
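A minimal sketch of what that looks like, assuming SQLite through Python's standard sqlite3 module (the original post doesn't say which database is in use). The table big_table, its column row_id (standing in for rowID), and the index name are all hypothetical. It runs the same range query the chunk loop issues through EXPLAIN QUERY PLAN, once before and once after CREATE INDEX. Without the index, the plan is a full table SCAN for every chunk. With it, the plan becomes an index SEARCH over just the requested range.

```python
import sqlite3


def query_plan(conn, sql, params):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return " | ".join(str(row[-1]) for row in rows)


def main():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: row_id stands in for the poster's rowID column.
    # It is a plain INTEGER column, so SQLite does not index it automatically.
    conn.execute("CREATE TABLE big_table (row_id INTEGER, payload TEXT)")
    # Gappy ids (multiples of 3), loosely mimicking "rowID skips over stuff".
    conn.executemany(
        "INSERT INTO big_table (row_id, payload) VALUES (?, ?)",
        ((i * 3, "row %d" % i) for i in range(100000)),
    )

    # The same shape of range query the chunked loop runs.
    chunk_sql = "SELECT * FROM big_table WHERE row_id >= ? AND row_id < ?"
    n, delta = 30000, 2500

    before = query_plan(conn, chunk_sql, (n, n + delta))  # full table scan
    conn.execute("CREATE INDEX idx_big_table_row_id ON big_table (row_id)")
    after = query_plan(conn, chunk_sql, (n, n + delta))  # index range search

    chunk = conn.execute(chunk_sql, (n, n + delta)).fetchall()
    conn.close()
    return before, after, len(chunk)


if __name__ == "__main__":
    before, after, count = main()
    print("without index:", before)
    print("with index:   ", after)
    print("rows in chunk:", count)
```

Other databases have their own equivalents (MySQL and PostgreSQL both provide EXPLAIN). The exact plan wording also varies between SQLite versions, so look for SCAN versus SEARCH ... USING INDEX rather than matching an exact string.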

Jean-Paul