Bugfix for weave's catalog.

Hi all, there has been a long-standing bug in weave.inline, where it recompiles stuff it shouldn't. I was finally able to track it down to a sys.path problem, where sys.path does not get the ~/.pythonXX_compiled directory added, and hence calls to pickle.load() fail when loading items from the shelved on-disk catalog. The patch here fixes that problem. In the process, I cleaned up catalog.py a bit, adding caching to the default_dir() call (which is used often) and adding a few catalog.close() calls for safety, since the semantics of shelved catalogs are undefined (implementation-dependent) if not explicitly closed. I can commit this, but I'd greatly appreciate it if someone else has a look at it first (Pearu, RKern?). I know that nobody likes to muck with the weave code, as it's a bit complex, but I'd feel safer if at least one more pair of eyeballs has a look at this before I commit. Regards, f

"FP" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
FP> Hi all, there has been a long-standing bug in weave.inline, FP> where it recompiles stuff it shouldn't. I was finally able to [...] FP> I can commit this, but I'd greatly appreciate it if someone FP> else has a look at it first (Pearu, RKern?). I know that FP> nobody likes to muck with the weave code, as it's a bit FP> complex, but I'd feel safer if at least one more pair of FP> eyeballs has a look at this before I commit. Thanks for tracking this down. I just read the patch and it seems OK, however, do you have a test case to go with the fix? If so it would be useful to have the test checked in as well. cheers, prabhu

Prabhu Ramachandran wrote:
"FP" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
FP> Hi all, there has been a long-standing bug in weave.inline, FP> where it recompiles stuff it shouldn't. I was finally able to [...]
FP> I can commit this, but I'd greatly appreciate it if someone FP> else has a look at it first (Pearu, RKern?). I know that FP> nobody likes to muck with the weave code, as it's a bit FP> complex, but I'd feel safer if at least one more pair of FP> eyeballs has a look at this before I commit.
Thanks for tracking this down. I just read the patch and it seems OK, however, do you have a test case to go with the fix? If so it would be useful to have the test checked in as well.
Well, the problem is that I was never able to see the problem with little weave.inline snippets. I ended up using some code which relies on a large internal library of mine, but which at least would trigger the misbehavior every time. Part of why it took me so long to work on this, is that I had never been able to identify a reliable way to reproduce the problem, until this code started showing it on every run. But unfortunately it's a big library for internal use, so it won't work for a scipy test. I'm sure it can be triggered with smaller examples, but all the ones I tried failed to show it. Frankly, I don't quite understand why, though. From the looks of the bug, it should occur every time... If someone else has a small standalone example which shows the problem reliably, that would be great to have. Cheers, f

"FP" == Fernando Perez <Fernando.Perez@colorado.edu> writes: >> Thanks for tracking this down. I just read the patch and it >> seems OK, however, do you have a test case to go with the fix? >> If so it would be useful to have the test checked in as well.
FP> Well, the problem is that I was never able to see the problem FP> with little weave.inline snippets. I ended up using some code FP> which relies on a large internal library of mine, but which at FP> least would trigger the misbehavior every time. Part of why FP> it took me so long to work on this, is that I had never been FP> able to identify a reliable way to reproduce the problem, FP> until this code started showing it on every run. OK, I've also run into this from time to time but never had the chance to make a simple test case. cheers, prabhu

"Fernando" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
Fernando> But unfortunately it's a big library for internal use, Fernando> so it won't work for a scipy test. I'm sure it can be Fernando> triggered with smaller examples, but all the ones I Fernando> tried failed to show it. Frankly, I don't quite Fernando> understand why, though. From the looks of the bug, it Fernando> should occur every time... Fernando> If someone else has a small standalone example which Fernando> shows the problem reliably, that would be great to have. The weave demo code I was running at Los Alamos showed this bug so it will be a good test case. I have it on my laptop at home, and so can't send it now. Fernando, do you have a copy of the roadshow directory on your laptop? JDH

John Hunter wrote:
"Fernando" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
Fernando> But unfortunately it's a big library for internal use, Fernando> so it won't work for a scipy test. I'm sure it can be Fernando> triggered with smaller examples, but all the ones I Fernando> tried failed to show it. Frankly, I don't quite Fernando> understand why, though. From the looks of the bug, it Fernando> should occur every time...
Fernando> If someone else has a small standalone example which Fernando> shows the problem reliably, that would be great to have.
The weave demo code I was running at Los Alamos showed this bug so it will be a good test case. I have it on my laptop at home, and so can't send it now. Fernando, do you have a copy of the roadshow directory on your laptop?
Yes, I'd forgotten. But wasn't that weave.blitz code? It will actually be very good to test with that, b/c my tests were all with weave.inline(), so I'm not 100% sure that my fix covers your case. I'll try to fold your test in, and integrate it into the test suite. It may take me a couple days, as other things have hit the proverbial fan. Cheers, f

John Hunter wrote:
The weave demo code I was running at Los Alamos showed this bug so it will be a good test case. I have it on my laptop at home, and so can't send it now. Fernando, do you have a copy of the roadshow directory on your laptop?
Actually, I just checked and that stuff never made it to SVN, so I don't have it. Could you either commit it or mail it to me so I can test and see if my fix also works for your code? Cheers, f

"Fernando" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
Fernando> Actually, I just checked and that stuff never made it to Fernando> SVN, so I don't have it. Could you either commit it or Fernando> mail it to me so I can test and see if my fix also works Fernando> for your code? Here is the script that shows the bug. It is designed to compare the performance of weave blitz versus numeric for repeated adds of a 2D array x + x x + x + x x + x + x + x ....and so on On the second iteration through the loop (x+x+x), it issues a "repairing catalog by removing key" and recompiles the extensions -- it doesn't do this for any other of the repeated add lines in this loop. This happens repeatedly if you rerun the script, so the cache is being ignored. Unfortunately, I only see this on my G4 powerbook and not on my linux box, so it might be hard for others to use as a test script. Tomorrow I'll try and update scipy weave on my powerbook from CVS and try it with and w/o your patch. JDH from __future__ import division import sys, time from Numeric import zeros, Float from MLab import rand import weave from pylab import subplot, plot, show, legend, xlabel, ylabel, title shape = 200,200 x = rand(*shape) def repeat_nadds(Nadds, Nevals, useWeave): """ Time the addition of i=2,Nadds arrays. Evaluate each expression Nevals times to produce accurate timing results. If useWeave is True, use weave to inline the addition, else use Numeric return value is n,t where n is a list of the the number of arrays added and t is the average time it took to add the arrays """ results = [] for i in range(2,Nadds): s = 'result = %s' % '+'.join(['x']*i) print 'evaluating: %s with weave=%s' % (s,useWeave) tstart = time.time() # only weave needs to predefine result array if useWeave: result= zeros(shape, typecode=Float) for j in range(Nevals): if useWeave: weave.blitz(s) else: exec(s) elapsed = (time.time()-tstart)/Nevals print '\tNadds=%d Elapsed=%1.2f' % (i, elapsed) results.append( (i, elapsed) ) return zip(*results) Nadds = 7 Nevals = 20 # evaluate weave nw, tw = repeat_nadds(Nadds, Nevals, useWeave=True) # evaluate Numeric nn, tn = repeat_nadds(Nadds, Nevals, useWeave=False) # plot weave versus Numeric ax = subplot(111) plot(nw, tw, 'go', nn, tn, 'bs') legend( ('Weave', 'Numeric') ) xlabel('num adds') ylabel('time (s)') title('Numeric vs weave; repeated adds') ax.set_xlim( (0, Nadds+1)) show()

John Hunter wrote:
"Fernando" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
Fernando> Actually, I just checked and that stuff never made it to Fernando> SVN, so I don't have it. Could you either commit it or Fernando> mail it to me so I can test and see if my fix also works Fernando> for your code?
Here is the script that shows the bug. It is designed to compare the performance of weave blitz versus numeric for repeated adds of a 2D array
x + x x + x + x x + x + x + x ....and so on
On the second iteration through the loop (x+x+x), it issues a "repairing catalog by removing key" and recompiles the extensions -- it doesn't do this for any other of the repeated add lines in this loop. This happens repeatedly if you rerun the script, so the cache is being ignored. Unfortunately, I only see this on my G4 powerbook and not on my linux box, so it might be hard for others to use as a test script.
OK, I just tested this with my linux box, and I can both see the problem under the old weave, and have it fixed with the new one. So we're in good shape. However, I'm swamped with work and other things, so it may take me some time (probably until next week) until I can commit all this with proper unit testing. I need to figure out how to fold the compilation failure into a lower-level unit test, since by default weave absorbs the failure almost silently. You see it failed because there is a slowdown, but no exceptions are raised. So getting this unit-tested properly will take a bit of work: if anyone can help me with that, I'd be grateful. I've already sunk a ton of time into just getting the bug tracked down and fixed. If someone offers to help, I'll commit the bugfixes (they touch multiple files), and then I'll rely on someone else to write the proper unit test (keep in mind, it has to fail only on _second_ execution, and you need to clean up the default catalog, etc; it's not a totally easy unit test to write). Cheers, f

"Fernando" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
[...] Fernando> However, I'm swamped with work and other things, so it Fernando> may take me some time (probably until next week) until I Fernando> can commit all this with proper unit testing. I need to Fernando> figure out how to fold the compilation failure into a Fernando> lower-level unit test, since by default weave absorbs Fernando> the failure almost silently. You see it failed because Fernando> there is a slowdown, but no exceptions are raised. Oh no, I did not mean that you add a full unit test. I completely understand the difficulty in engineering a pukka unit test for this kind of thing. What I had asked for was atleast an example demonstrating the problem, appropriately documented, and checked in so we know how to test and check if things are ok. cheers, prabhu

I do see the same behavior described below on my Linux box: *Only* on the second iteration I get "repairing catalog by removing key"... Debian Linux (i386), sid, python 2.3.5, weave 0.3.2 (the actual package versions on Debian) Manuel John Hunter schrieb:
"Fernando" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
Fernando> Actually, I just checked and that stuff never made it to Fernando> SVN, so I don't have it. Could you either commit it or Fernando> mail it to me so I can test and see if my fix also works Fernando> for your code?
Here is the script that shows the bug. It is designed to compare the performance of weave blitz versus numeric for repeated adds of a 2D array
x + x x + x + x x + x + x + x ....and so on
On the second iteration through the loop (x+x+x), it issues a "repairing catalog by removing key" and recompiles the extensions -- it doesn't do this for any other of the repeated add lines in this loop. This happens repeatedly if you rerun the script, so the cache is being ignored. Unfortunately, I only see this on my G4 powerbook and not on my linux box, so it might be hard for others to use as a test script.
Tomorrow I'll try and update scipy weave on my powerbook from CVS and try it with and w/o your patch.
JDH
from __future__ import division import sys, time from Numeric import zeros, Float from MLab import rand import weave from pylab import subplot, plot, show, legend, xlabel, ylabel, title
shape = 200,200 x = rand(*shape)
def repeat_nadds(Nadds, Nevals, useWeave): """ Time the addition of i=2,Nadds arrays. Evaluate each expression Nevals times to produce accurate timing results. If useWeave is True, use weave to inline the addition, else use Numeric
return value is n,t where n is a list of the the number of arrays added and t is the average time it took to add the arrays """ results = [] for i in range(2,Nadds): s = 'result = %s' % '+'.join(['x']*i) print 'evaluating: %s with weave=%s' % (s,useWeave) tstart = time.time()
# only weave needs to predefine result array if useWeave: result= zeros(shape, typecode=Float) for j in range(Nevals): if useWeave: weave.blitz(s) else: exec(s) elapsed = (time.time()-tstart)/Nevals print '\tNadds=%d Elapsed=%1.2f' % (i, elapsed) results.append( (i, elapsed) ) return zip(*results)
Nadds = 7 Nevals = 20
# evaluate weave nw, tw = repeat_nadds(Nadds, Nevals, useWeave=True) # evaluate Numeric nn, tn = repeat_nadds(Nadds, Nevals, useWeave=False)
# plot weave versus Numeric ax = subplot(111) plot(nw, tw, 'go', nn, tn, 'bs') legend( ('Weave', 'Numeric') ) xlabel('num adds') ylabel('time (s)') title('Numeric vs weave; repeated adds') ax.set_xlim( (0, Nadds+1)) show()
_______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.net http://www.scipy.net/mailman/listinfo/scipy-dev
-- ------------------------------------- Manuel Metz Sternwarte der Universitaet Bonn Auf dem Huegel 71 (room 3.06) D - 53121 Bonn E-Mail: mmetz@astro.uni-bonn.de Phone: (+49) 228 / 73-3660 Fax: (+49) 228 / 73-3672 -------------------------------------

"John" == John Hunter <jdhunter@ace.bsd.uchicago.edu> writes:
"Fernando" == Fernando Perez <Fernando.Perez@colorado.edu> writes: [...] John> Tomorrow I'll try and update scipy weave on my powerbook from John> CVS and try it with and w/o your patch.
My time machine must be working overtime. This code poses no problems on my machine (Debian sarge) scipy.__version__ = '0.3.3_303.4601'. cheers, prabhu

"Fernando" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
>> John Hunter wrote: >>> The weave demo code I was running at Los Alamos showed this >>> bug so it will be a good test case. I have it on my laptop at >>> home, and so can't send it now. Fernando, do you have a copy >>> of the roadshow directory on your laptop? Fernando> Actually, I just checked and that stuff never made it to Fernando> SVN, so I don't have it. Could you either commit it or Fernando> mail it to me so I can test and see if my fix also works Fernando> for your code? OK, just did a fresh checkout of scipy from CVS on my powerbook G4, and the bug was still present when running the test script. peds-mac054:~> uname -a Darwin peds-mac054.bsd.uchicago.edu 7.7.0 Darwin Kernel Version 7.7.0: Sun Nov 7 16:06:51 PST 2004; root:xnu/xnu-517.9.5.obj~1/RELEASE_PPC Power Macintosh powerpc peds-mac054:~> python -c 'import scipy; print scipy.__version__' numerix Numeric 24.0b2 0.3.3_309.4624 After applying your patch the bug disappeared, so it looks good from here. On my linux desktop peds-pc311:~/python/examples/weave/blitz> uname -a Linux peds-pc311 2.6.10-5-386 #1 Fri May 20 13:52:48 UTC 2005 i686 GNU/Linux peds-pc311:~/python/examples/weave/blitz> python -c 'import scipy; print scipy.__version__' 0.3.2 I am not seeing the bug. I just upgraded my linux box to scipy CVS peds-pc311:~/python/examples/weave/blitz> python -c 'import scipy; print scipy.__version__' numerix Numeric 24.0b2 0.3.3_309.4624 and still do not see the bug. Strange... JDH

John Hunter wrote:
"Fernando" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
>> John Hunter wrote: >>> The weave demo code I was running at Los Alamos showed this >>> bug so it will be a good test case. I have it on my laptop at >>> home, and so can't send it now. Fernando, do you have a copy >>> of the roadshow directory on your laptop?
Fernando> Actually, I just checked and that stuff never made it to Fernando> SVN, so I don't have it. Could you either commit it or Fernando> mail it to me so I can test and see if my fix also works Fernando> for your code?
OK, just did a fresh checkout of scipy from CVS on my powerbook G4, and the bug was still present when running the test script.
That's because I hadn't committed anything yet :) Now that Prahbu clarified that he didn't expect a full-blown unittest for this thing, I just committed all the fixes. They also include better weave messages, and the headers problem correction. I'm attaching the testing script here, modified to rely only on scipy and not matplotlib. It also runs the weave tests twice, to try and show to the user whether there is really a problem or not. I'd appreciate if others could test this and let us know if the problems are really fixed or not (multi-platform feedback would be most welcome, I only have access to a linux system). Prabhu, feel free to stick this test script anywhere in the tree you think is a appropriate; I didn't commit it b/c I wasn't sure where to put it. All code is otherwise committed already. Back to real work now... Best, f ps - other devs: does anyone have any tips on how to fluidly set things up for working with scipy CVS which is changing frequently? I find it really painful to have to rebuild all of scipy when I make changes, but I don't know, short of writing makefiles which duplicate setup.py, how to avoid recompiling all the extension modules from scratch, which takes a long time. As far as I know, distutils is pretty dumb when it comes to make-like dependency analysis and as-needed-only recompilation. Any pointers on this front would be tremendously useful (if someone writes this up, it would be great to append it to the Developers.txt document in the sources. I can do that if nobody else does). I'm used to dealing with heavily-changing code only in ipython, matplotlib and mayavi, and those are mostly pure python, hence far easier to deal with with just some PYTHONPATH tricks. For matplotlib I rarely modify the extensions, so I just symlink the .so files by hand. But scipy is too complex for that approach, and I feel a bit lost right now. Pointers most welcome... #!/usr/bin/env python import sys, time, os, glob from scipy import rand,empty,Float import weave def repeat_nadds(Nadds, Nevals, x,verbose=False): """ Time the addition of i=2,Nadds arrays. Evaluate each expression Nevals times to produce accurate timing results. If useWeave is True, use weave to inline the addition, else use Numeric return value is n,t where n is a list of the the number of arrays added and t is the average time it took to add the arrays """ for i in range(2,Nadds): s = 'result = %s' % '+'.join(['x']*i) if verbose: print 'evaluating: %s with weave' % s # weave needs to predefine result array result= empty(x.shape, typecode=Float) for j in range(Nevals): weave.blitz(s) def main(): Nadds = 5 Nevals = 20 shape = 20,20 print """Weave recompilation bug test script\n""" weave_dir = weave.catalog.default_dir() weave_files = [f for f in os.listdir(weave_dir) if f.endswith('catalog') or f.endswith('.cpp') or f.endswith('.so') or f.endswith('.dll') ] nfiles = len(weave_files) if nfiles>0: print """*** ERROR *** In order for this script to properly test for the bug, you must first clear your weave compilation directory, which in your system is: %(weave_dir)s There are currently %(nfiles)s weave-specific files in that directory, so this script will exit. Please empty that directory first, and rerun this test.""" % locals() return # Otherwise we're OK, go with the test x = rand(*shape) print "First pass - weave should be compiling code..." repeat_nadds(Nadds, Nevals, x,verbose=True) print "\nSecond pass - weave should be NOT compiling anything this time..." repeat_nadds(Nadds, Nevals, x,verbose=False) print "If there was any weave compilation-related output above, it's a bug." if __name__=="__main__": main()

"Fernando" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
Fernando> Prabhu, feel free to stick this test script anywhere in Fernando> the tree you think is a appropriate; I didn't commit it Fernando> b/c I wasn't sure where to put it. All code is Fernando> otherwise committed already. Thanks, I think this could just go into tests/, unless someone has serious objections. cheers, prabhu

Prabhu Ramachandran wrote:
"Fernando" == Fernando Perez <Fernando.Perez@colorado.edu> writes:
Fernando> Prabhu, feel free to stick this test script anywhere in Fernando> the tree you think is a appropriate; I didn't commit it Fernando> b/c I wasn't sure where to put it. All code is Fernando> otherwise committed already.
Thanks, I think this could just go into tests/, unless someone has serious objections.
Fine with me. Given how much people love to deal with weave's internals, I doubt you'll get any objections, let alone serious ones :) It would still be good to hear whether the problems did go away for Win32 users. J. Hunter mentioned improvements on OSX, and I tested linux, but I have no way to check win32. I don't see why it shouldn't work, since none of that code was platform dependent. But to be honest, I never quite figured out the exact mechanism for the bug (in particular, why it got triggered on some cases and not others). So I don't really know what to expect from that code. Cheers, f
participants (4)
-
Fernando Perez
-
John Hunter
-
Manuel Metz
-
Prabhu Ramachandran