Profiling gives very different predictions of best algorithm

Sat May 2 11:10:22 EDT 2009

On May 1, 7:38 pm, Terry Reedy <tjre... at udel.edu> wrote:

> I presume in your overall time test, you ran the two versions of the
> algorithm 'naked'.  But, for some reason, you are profiling them embedded
> inside a test suite and runner.  It does not seem that this should
> affect relative timing, but I have seen some pretty strange behaviors.
> At best, it will add noise.
>
> Let me expand my question: what did you do differently between the two
> profile runs?

When I compute the electron repulsion integrals, the two different
methods are:

                        if packed:
                            ijkl = intindex(i,j,k,l)
                            Ints[ijkl] = coulomb(bfs[i],bfs[j],bfs[k],bfs[l])
                        else:
                            val = coulomb(bfs[i],bfs[j],bfs[k],bfs[l])
                            Ints[i,j,k,l] = val
                            Ints[j,i,k,l] = val
                            Ints[i,j,l,k] = val
                            Ints[j,i,l,k] = val
                            Ints[k,l,i,j] = val
                            Ints[k,l,j,i] = val
                            Ints[l,k,i,j] = val
                            Ints[l,k,j,i] = val

and when I access the integrals the differences are:

                    if packed:
                        index = intindex(i,j,k,l)
                        temp[kl] = Ints[index]
                    else:
                        temp[kl] = Ints[i,j,k,l]
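
For readers unfamiliar with packed storage, here is a minimal sketch of
what an index routine like intindex could look like. The real routine is
written in C and is not shown in this message, so the body below is an
assumption based on the usual 8-fold permutational symmetry of the
integrals (the same symmetry the eight assignments above write out by
hand). packed_size is a hypothetical helper added for illustration only.

```python
def intindex(i, j, k, l):
    """Hypothetical pure-Python stand-in for the C routine of the same name.

    Maps (i, j, k, l) to one integer so that all eight symmetry-equivalent
    index orderings share the same slot.
    """
    # Order each pair so the larger index comes first.
    if i < j:
        i, j = j, i
    if k < l:
        k, l = l, k
    # Triangular (lower-triangle) index of each ordered pair.
    ij = i * (i + 1) // 2 + j
    kl = k * (k + 1) // 2 + l
    # Order the two pair indices, then apply the triangular index again.
    if ij < kl:
        ij, kl = kl, ij
    return ij * (ij + 1) // 2 + kl


def packed_size(n):
    """Number of unique integrals for n basis functions (hypothetical helper)."""
    npairs = n * (n + 1) // 2
    return npairs * (npairs + 1) // 2
```

Under this scheme the packed array needs packed_size(N) entries, which
approaches N**4/8 for large N.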

If you look at the profiling, I'm making something like 11M calls to
intindex, which is a routine I've written in C. I thought that storing
all N**4 integrals (rather than the N**4/8 that the packed version
stores) would remove the need for the intindex calls and speed things
up, but the unpacked version is 50% slower in overall timing. I can't
really figure out why it's slower, though, since profiling makes it
look like the slower version is faster. Something like 30% of the time
for the full PyQuante test suite is taken up with calls to intindex,
and I'd like to remove this if possible.

I also wrote a version of the code where I stored all of the intindex
values in a Python dictionary rather than calling a C function. That
version appeared to be roughly the same speed when I profiled the
code, although I didn't test it without profiling (stupidly) because
at that point I didn't recognize the magnitude of the discrepancy
between the profiled and unprofiled timings.

I was posting to the list mostly because I would like to know whether
I can do something different when I profile my code so that it gives
results that correspond more closely to the unprofiled timings.
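
One way to see how much the profiler itself distorts a comparison is to
time the same function with and without cProfile attached. The sketch
below is only an illustration: the two toy workloads are placeholders,
not PyQuante code, and time_plain, time_profiled and report are
hypothetical helper names. It rests on the assumption that cProfile adds
a cost to every function call it records, so code that makes many small
calls may be penalized more under profiling than code that does the same
work inline. Whether that accounts for the discrepancy described here is
not established.

```python
import cProfile
import time


def time_plain(func, *args, repeat=5):
    """Best wall-clock time of func(*args) with no profiler attached."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def time_profiled(func, *args, repeat=5):
    """Best wall-clock time of func(*args) while cProfile is recording."""
    best = float("inf")
    for _ in range(repeat):
        prof = cProfile.Profile()
        start = time.perf_counter()
        prof.runcall(func, *args)
        best = min(best, time.perf_counter() - start)
    return best


# Toy workloads (placeholders): same arithmetic, with and without a call.
def square(x):
    return x * x


def sum_with_calls(n):
    total = 0
    for x in range(n):
        total += square(x)  # one recorded function call per iteration
    return total


def sum_inline(n):
    total = 0
    for x in range(n):
        total += x * x  # no function call for the profiler to record
    return total


def report(n=200_000):
    """Print plain and profiled timings and their ratio for each workload."""
    for func in (sum_with_calls, sum_inline):
        plain = time_plain(func, n)
        profiled = time_profiled(func, n)
        print(f"{func.__name__}: plain {plain:.4f}s, "
              f"profiled {profiled:.4f}s, ratio {profiled / plain:.1f}")


if __name__ == "__main__":
    report()
```

If the two ratios differ a lot, the profile is better read as a guide to
where calls are made than as a ranking of which version is faster; the
unprofiled timing remains the number to trust for that.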