Reminder: code freeze for beta at the end of the weekend
hi,
Just a friendly reminder that I will close the trunk for 1.3.0 at the end of 15th March (I will more likely do it at the end of Monday Japan time, which roughly corresponds to 15th March midnight Pacific time),
cheers,
David
On Sat, Mar 14, 2009 at 12:01 PM, David Cournapeau wrote:
hi,
Just a friendly reminder that I will close the trunk for 1.3.0 at the end of 15th March (I will more likely do it at the end of Monday Japan time which roughly corresponds to 15th March midnight Pacific time),
cheers,
David
Any chance for tickets 921 and 923? I would like to remove some test failures in the random numbers in scipy.stats.distributions.
Josef
On Sat, Mar 14, 2009 at 10:57 AM, wrote:
On Sat, Mar 14, 2009 at 12:01 PM, David Cournapeau wrote:
hi,
Just a friendly reminder that I will close the trunk for 1.3.0 at the end of 15th March (I will more likely do it at the end of Monday Japan time which roughly corresponds to 15th March midnight Pacific time),
The fixes look small and I'd like them to go in. Can you put together some short tests for these fixes? Would it help if you had commit privileges in Numpy?
Chuck
On Sun, Mar 15, 2009 at 2:40 AM, Charles R Harris wrote:
The fixes look small and I'd like them to go in. Can you put together some short tests for these fixes? Would it help if you had commit privileges in Numpy?
Yes, I was about to suggest giving Josef commit access to numpy. I unfortunately won't have much time to do anything but release tasks in the next few days, including review. If someone else (you :)) can review the changes before they go in, then there is no reason why they can't go in - assuming they come in very soon.
David
On Sat, Mar 14, 2009 at 11:52 AM, David Cournapeau wrote:
On Sun, Mar 15, 2009 at 2:40 AM, Charles R Harris wrote:
The fixes look small and I'd like them to go in. Can you put together some short tests for these fixes? Would it help if you had commit privileges in Numpy?
Yes, I was about to suggest giving Josef commit access to numpy. I unfortunately won't have much time to do anything but release tasks in the next few days, including review. If someone else (you :)) can review the changes before they go in, then there is no reason why they can't go in - assuming they come in very soon.
The fixes are both one-liners. Testing... I wonder if tests for things like random distributions and computational accuracy of special functions shouldn't be separate scripts. They can be large files, a la special functions, or time consuming, and it doesn't help that parts are needed by both scipy and numpy. I don't feel competent to say that the fixes are correct; I'll trust Josef in that regard.
Chuck
On Sat, Mar 14, 2009 at 1:52 PM, David Cournapeau wrote:
On Sun, Mar 15, 2009 at 2:40 AM, Charles R Harris wrote:
The fixes look small and I'd like them to go in. Can you put together some short tests for these fixes? Would it help if you had commit privileges in Numpy?
Yes, I was about to suggest giving Josef commit access to numpy. I unfortunately won't have much time to do anything but release tasks in the next few days, including review. If someone else (you :)) can review the changes before they go in, then there is no reason why they can't go in - assuming they come in very soon.
David
The correctness of the random numbers is tested in scipy.stats. They are not tested in np.random.tests. Currently, I have the test for logser disabled because it always fails; for hypergeometric, I picked parameters for which the random numbers are correct. Once the bugs are fixed, I can add or re-enable the tests for the current failures.

Here are some tests that should fail with the current trunk and pass after the fix. I don't have an unpatched version of numpy available right now, but these are the cases that initially showed the bugs. Can you verify that they fail on current or recent trunk? They don't fail on my patched version. But it has been some time since I did this and I would need to check the details again if these tests don't fail on the current trunk.

{{{
import numpy as np

assert np.all(np.random.hypergeometric(3,18,11,size=10) < 4)
assert np.all(np.random.hypergeometric(18,3,11,size=10) > 0)

pr = 0.8
N = 100000
rvsn = np.random.logseries(pr,size=N)
# these two frequency counts should be close to theoretical numbers with this large sample
assert np.sum(rvsn==1) / float(N) > 0.45  # theoretical: 0.49706795
assert np.sum(rvsn==1) / float(N) < 0.23  # theoretical: 0.19882718
}}}

About commit access: it would be convenient to have it, but not necessary, since there are only a few things that I can contribute to numpy directly.
Josef
On Sat, Mar 14, 2009 at 12:14 PM, wrote:
On Sat, Mar 14, 2009 at 1:52 PM, David Cournapeau wrote:
On Sun, Mar 15, 2009 at 2:40 AM, Charles R Harris wrote:
The fixes look small and I'd like them to go in. Can you put together some short tests for these fixes? Would it help if you had commit privileges in Numpy?
Yes, I was about to suggest giving Josef commit access to numpy. I unfortunately won't have much time to do anything but release tasks in the next few days, including review. If someone else (you :)) can review the changes before they go in, then there is no reason why they can't go in - assuming they come in very soon.
David
The correctness of the random numbers is tested in scipy.stats. They are not tested in np.random.tests. Currently, I have the test for logser disabled because it always fails; for hypergeometric, I picked parameters for which the random numbers are correct. Once the bugs are fixed, I can add or re-enable the tests for the current failures.
Here are some tests that should fail with the current trunk and pass after the fix. I don't have an unpatched version of numpy available right now, but these are the cases that initially showed the bugs. Can you verify that they fail on current or recent trunk? They don't fail on my patched version. But it has been some time since I did this and I would need to check the details again if these tests don't fail on the current trunk.
{{{
import numpy as np

assert np.all(np.random.hypergeometric(3,18,11,size=10) < 4)
assert np.all(np.random.hypergeometric(18,3,11,size=10) > 0)

pr = 0.8
N = 100000
rvsn = np.random.logseries(pr,size=N)
# these two frequency counts should be close to theoretical numbers with this large sample
assert np.sum(rvsn==1) / float(N) > 0.45  # theoretical: 0.49706795
assert np.sum(rvsn==1) / float(N) < 0.23  # theoretical: 0.19882718
}}}
I can verify that these currently fail on my machine. I'll make regression tests out of them and then commit the fixes.
Chuck
Hi Josef,
On Sat, Mar 14, 2009 at 12:14 PM, wrote:
{{{
import numpy as np

assert np.all(np.random.hypergeometric(3,18,11,size=10) < 4)
assert np.all(np.random.hypergeometric(18,3,11,size=10) > 0)

pr = 0.8
N = 100000
rvsn = np.random.logseries(pr,size=N)
# these two frequency counts should be close to theoretical numbers with this large sample
assert np.sum(rvsn==1) / float(N) > 0.45  # theoretical: 0.49706795
assert np.sum(rvsn==1) / float(N) < 0.23  # theoretical: 0.19882718
}}}
I just see one frequency count here. Do you mean that the frequency count should fall in that range with some probability?
Chuck
On Sat, Mar 14, 2009 at 3:11 PM, Charles R Harris wrote:
Hi Josef,
On Sat, Mar 14, 2009 at 12:14 PM, wrote:
<snip>
{{{
import numpy as np
assert np.all(np.random.hypergeometric(3,18,11,size=10) < 4)
assert np.all(np.random.hypergeometric(18,3,11,size=10) > 0)
pr = 0.8
N = 100000
rvsn = np.random.logseries(pr,size=N)
# these two frequency counts should be close to theoretical numbers with this large sample
Sorry, cut and paste error: the second case is k=2. For k=1 the unpatched version undersamples, for k=2 the unpatched version oversamples; that's the reason for the inequalities. The bugfix should reallocate them correctly. For several runs with N = 100000, I get with the patched version:
rvsn = np.random.logseries(pr,size=N); np.sum(rvsn==1) / float(N) in range: 0.4951, 0.4984 # unpatched version is too small
rvsn = np.random.logseries(pr,size=N); np.sum(rvsn==2) / float(N) in range: 0.1980, 0.2001 # unpatched version is too large
With the constraints a bit tighter, it should be:
assert np.sum(rvsn==1) / float(N) > 0.49  # theoretical: 0.49706795
assert np.sum(rvsn==2) / float(N) < 0.205  # theoretical: 0.19882718
Josef
}}}
I just see one frequency count here. Do you mean that the frequency count should fall in that range with some probability?
Chuck
On Sat, Mar 14, 2009 at 1:37 PM, wrote:
On Sat, Mar 14, 2009 at 3:11 PM, Charles R Harris wrote:
Hi Josef,
On Sat, Mar 14, 2009 at 12:14 PM, wrote:
<snip>
{{{
import numpy as np
assert np.all(np.random.hypergeometric(3,18,11,size=10) < 4)
assert np.all(np.random.hypergeometric(18,3,11,size=10) > 0)
pr = 0.8
N = 100000
rvsn = np.random.logseries(pr,size=N)
# these two frequency counts should be close to theoretical numbers with this large sample
Sorry, cut and paste error: the second case is k=2. For k=1 the unpatched version undersamples, for k=2 the unpatched version oversamples; that's the reason for the inequalities. The bugfix should reallocate them correctly.
For several runs with N = 100000, I get with the patched version:
rvsn = np.random.logseries(pr,size=N); np.sum(rvsn==1) / float(N) in range: 0.4951, 0.4984 # unpatched version is too small
rvsn = np.random.logseries(pr,size=N); np.sum(rvsn==2) / float(N) in range: 0.1980, 0.2001 # unpatched version is too large
With the constraints a bit tighter, it should be:
assert np.sum(rvsn==1) / float(N) > 0.49  # theoretical: 0.49706795
assert np.sum(rvsn==2) / float(N) < 0.205  # theoretical: 0.19882718
OK. One more question: how often do the tests fail? I want to include a note to repeat testing if the test fails.
Chuck
On Sat, Mar 14, 2009 at 1:52 PM, Charles R Harris wrote:
<snip>
OK. One more question: how often do the tests fail? I want to include a note to repeat testing if the test fails.
Mind, I don't want to test the distribution in detail, I just want something that fails with the current code and passes with the new.
Chuck
On Sat, Mar 14, 2009 at 1:37 PM, wrote:
OK. One more question: how often do the tests fail? I want to include a note to repeat testing if the test fails.
I don't like this. I think the PRNGs should use fixed seeds known to pass the test. Depending on confidence intervals in the unit tests is really, really bad style. Tests should be deterministic.
S.M.
On Sat, Mar 14, 2009 at 2:22 PM, Sturla Molden wrote:
On Sat, Mar 14, 2009 at 1:37 PM, wrote:
OK. One more question: how often do the tests fail? I want to include a note to repeat testing if the test fails.
I don't like this. I think the PRNGs should use fixed seeds known to pass the test. Depending on confidence intervals in the unit tests is really, really bad style. Tests should be deterministic.
Good idea...
Chuck
On Sat, Mar 14, 2009 at 4:22 PM, Sturla Molden wrote:
On Sat, Mar 14, 2009 at 1:37 PM, wrote:
OK. One more question: how often do the tests fail? I want to include a note to repeat testing if the test fails.
I don't like this. I think the PRNGs should use fixed seeds known to pass the test. Depending on confidence intervals in the unit tests is really, really bad style. Tests should be deterministic.
S.M.
The hypergeometric tests are on the support of the distribution and should never fail. And the outcome is not random.

The test of logser with N = 100000 also should be pretty exact and fail only with very low probability in the patched version. But again, this is tested in scipy.stats.

I think Sturla's idea to find a random seed that differentiates before and after will be better for numpy, using only a small sample size, e.g. N=1000, since it's pretty fast. But since I don't have an unpatched numpy version available right now, I cannot do this.
Josef
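A minimal sketch of the fixed-seed approach under discussion (the seed, distribution parameter, and sample size below are arbitrary illustrations, not the values used in the committed tests): with a fixed seed the generator output is fully reproducible, so the expected sample can be recorded once from the patched generator and hard-coded into a deterministic regression test.

{{{
import numpy as np
from numpy.testing import assert_array_equal

# With a fixed seed the generator output is deterministic, so a
# regression test can hard-code the expected sample. This just
# demonstrates the reproducibility that makes that possible.
np.random.seed(1234)  # arbitrary seed, for illustration only
first = np.random.logseries(0.8, size=10)
np.random.seed(1234)
second = np.random.logseries(0.8, size=10)
assert_array_equal(first, second)
}}}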
On Sat, Mar 14, 2009 at 2:28 PM, wrote:
On Sat, Mar 14, 2009 at 4:22 PM, Sturla Molden wrote:
On Sat, Mar 14, 2009 at 1:37 PM, wrote:
OK. One more question: how often do the tests fail? I want to include a note to repeat testing if the test fails.
I don't like this. I think the PRNGs should use fixed seeds known to pass the test. Depending on confidence intervals in the unit tests is really, really bad style. Tests should be deterministic.
S.M.
The hypergeometric tests are on the support of the distribution and should never fail. And the outcome is not random.
The test of logser with N = 100000 also should be pretty exact and fail only with very low probability in the patched version. But again, this is tested in scipy.stats.
I think Sturla's idea to find a random seed that differentiates before and after will be better for numpy, using only a small sample size, e.g. N=1000, since it's pretty fast. But since I don't have an unpatched numpy version available right now, I cannot do this.
Done. Thanks for the fixes and tests.
Chuck
On Sat, Mar 14, 2009 at 4:58 PM, Charles R Harris wrote:
On Sat, Mar 14, 2009 at 2:28 PM, wrote:
On Sat, Mar 14, 2009 at 4:22 PM, Sturla Molden wrote:
On Sat, Mar 14, 2009 at 1:37 PM, wrote:
OK. One more question: how often do the tests fail? I want to include a note to repeat testing if the test fails.
I don't like this. I think the PRNGs should use fixed seeds known to pass the test. Depending on confidence intervals in the unit tests is really, really bad style. Tests should be deterministic.
S.M.
The hypergeometric tests are on the support of the distribution and should never fail. And the outcome is not random.
The test of logser with N = 100000 also should be pretty exact and fail only with very low probability in the patched version. But again, this is tested in scipy.stats.
I think Sturla's idea to find a random seed that differentiates before and after will be better for numpy, using only a small sample size, e.g. N=1000, since it's pretty fast. But since I don't have an unpatched numpy version available right now, I cannot do this.
Done. Thanks for the fixes and tests.
Chuck
Thanks for taking care of this. I will run my scipy.stats.distribution test over it before 1.3 is released and enable the tests in scipy trunk after the release.
Josef
Will memmap be fixed to use offsets correctly before 1.3?
hi,
Just a friendly reminder that I will close the trunk for 1.3.0 at the end of 15th March (I will more likely do it at the end of Monday Japan time which roughly corresponds to 15th March midnight Pacific time),
cheers,
David
Will memmap be fixed to use offsets correctly before 1.3?
I posted this to scipy-dev (possibly wrong list) on March 9, so I'll repeat it here: In Python 2.6, mmap has an offset keyword. NumPy's memmap should use this to allow big files to be memory mapped on 32 bit systems. Only a minor change is required:

if float(sys.version[:3]) > 2.5:
    bytes = bytes - offset
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=offset)
    self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm, offset=0, order=order)
else:
    mm = mmap.mmap(fid.fileno(), bytes, access=acc)
    self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm, offset=offset, order=order)

Instead of just:

mm = mmap.mmap(fid.fileno(), bytes, access=acc)
self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm, offset=offset, order=order)

Regards,
Sturla Molden
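For context, a sketch of the use case the offset keyword enables (the file name and sizes here are hypothetical): viewing a small window deep inside a binary file that is too large to map in full on a 32 bit system.

{{{
import numpy as np

# Hypothetical example: view 1000 float64 values starting 5 GB into a
# large binary file, without mapping the whole file into memory.
data = np.memmap('huge_file.bin', dtype=np.float64, mode='r',
                 offset=5 * 1024**3, shape=(1000,))
print(data[:5])
}}}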
Hi Sturla,
On Sat, Mar 14, 2009 at 12:23 PM, Sturla Molden wrote:
Will memmap be fixed to use offsets correctly before 1.3?
I posted this to scipy-dev (possibly wrong list) on March 9, so I'll repeat it here: In Python 2.6, mmap has an offset keyword. NumPy's memmap should use this to allow big files to be memory mapped on 32 bit systems. Only a minor change is required:
if float(sys.version[:3]) > 2.5:
    bytes = bytes - offset
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=offset)
    self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm, offset=0, order=order)
else:
    mm = mmap.mmap(fid.fileno(), bytes, access=acc)
    self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm, offset=offset, order=order)
Instead of just:
mm = mmap.mmap(fid.fileno(), bytes, access=acc)
self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm, offset=offset, order=order)
Can you open a ticket for this?
Chuck
1) I have noticed that fftpack_litemodule.c does not release the GIL around calls to functions in fftpack.c. I cannot see any obvious reason for this. As far as I can tell, the functions in fftpack.c are re-entrant.

2) If fftpack_lite did release the GIL, it would allow functions in numpy.fft to use multithreading for multiple FFTs in parallel (threading.Thread is OK, no special compilation needed; see the sketch after this message).

3) Is there any reason numpy.fft does not have dct? If not, I'd suggest addition of numpy.fft.dct and numpy.fft.idct.

4) Regarding ticket #400: Cython now makes this easy. NumPy's FFTs should be exposed to C extensions without calling back to Python.

Can I open a ticket for this and take care of it? At least 1, 2 and 4 should only take me an hour or so to write, so it might even be ready for 1.3.0.

Sturla Molden
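A minimal sketch of the thread-level parallelism point 2 refers to (array sizes and thread count are arbitrary): with the GIL held during the C transforms these threads run one at a time; if fftpack_lite releases the GIL they can run in parallel, with no change to the Python code.

{{{
import threading
import numpy as np

# Four independent FFTs on separate threads. Whether they actually run
# in parallel depends on fftpack_lite releasing the GIL around the
# underlying C calls.
arrays = [np.random.rand(2**20) for _ in range(4)]
results = [None] * len(arrays)

def run_fft(i):
    results[i] = np.fft.fft(arrays[i])

threads = [threading.Thread(target=run_fft, args=(i,))
           for i in range(len(arrays))]
for t in threads:
    t.start()
for t in threads:
    t.join()
}}}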
On Sat, Mar 14, 2009 at 2:16 PM, Sturla Molden wrote:
1) I have noticed that fftpack_litemodule.c does not release the GIL around calls to functions in fftpack.c. I cannot see any obvious reason for this. As far as I can tell, the functions in fftpack.c are re-entrant.
2) If fftpack_lite did release the GIL, it would allow functions in numpy.fft to use multithreading for multiple FFTs in parallel (threading.Thread is OK, no special compilation needed).
3) Is there any reason numpy.fft does not have dct? If not, I'd suggest addition of numpy.fft.dct and numpy.fft.idct.
4) Regarding ticket #400: Cython now makes this easy. NumPy's FFTs should be exposed to C extensions without calling back to Python.
Can I open a ticket for this and take care of it? At least 1, 2 and 4 should only take me an hour or so to write, so it might even be ready for 1.3.0.
Give it a shot. Note that the fft transforms also use int instead of intp, which limits the maximum transform size to 32 bits. Fixing that is somewhere on my todo list, but I would be happy to leave it to you ;) Although I expect transforms > 2GB aren't all that common.
Chuck
On Sat, Mar 14, 2009 at 2:24 PM, Charles R Harris wrote:
<snip>
On the reentrant bit, IIRC fftpack builds a table of sin/cos. It might be worth checking/making that thread safe.
Chuck
On Sat, Mar 14, 2009 at 2:24 PM, Charles R Harris wrote:
Give it a shot. Note that the fft transforms also use int instead of intp, which limits the maximum transform size to 32 bits. Fixing that is somewhere on my todo list but I would be happy to leave it to you ;) Although I expect transforms > 2GB aren't all that common.
On the reentrant bit, IIRC fftpack builds a table of sin/cos. It might be worth checking/making that thread safe.
Thanks, I'll take a careful look at it.
Sturla
On Sat, Mar 14, 2009 at 3:58 PM, Sturla Molden wrote:
On Sat, Mar 14, 2009 at 2:24 PM, Charles R Harris wrote:
Give it a shot. Note that the fft transforms also use int instead of intp, which limits the maximum transform size to 32 bits. Fixing that is somewhere on my todo list but I would be happy to leave it to you ;) Although I expect transforms > 2GB aren't all that common.
On the reentrant bit, IIRC fftpack builds a table of sin/cos. It might be worth checking/making that thread safe.
Thanks, I'll take a careful look at it.
There is also a ticket (#579) to add an implementation of the Bluestein algorithm for doing prime order fft's. This could also be used for zoom type fft's. There is lots of fft stuff to be done. I wonder if some of it shouldn't go in Scipy? I think David added some dcts to Scipy.
Chuck
On Sat, Mar 14, 2009 at 3:58 PM, Sturla Molden wrote:
There is also a ticket (#579) to add an implementation of the Bluestein algorithm for doing prime order fft's. This could also be used for zoom type fft's. There is lots of fft stuff to be done. I wonder if some of it shouldn't go in Scipy? I think David added some dcts to Scipy.
I am not changing or adding algorithms for now. This is just to prevent NumPy from locking up the interpreter while doing FFTs.

The loops that are worth multithreading are done in C in fftpack_litemodule.c, not in Python in fftpack.py. I have added OpenMP pragmas around them. When NumPy gets a build process that supports OpenMP, they will execute in parallel. On GCC 4.4 this means compiling with -fopenmp and linking -lgomp -lpthread (that goes for mingw/cygwin as well).

The init function seems to be thread safe. cffti and rffti work on arrays created in the callers (fftpack_cffti and fftpack_rffti); no global objects are touched.

I'm attaching a version of fftpack_litemodule.c that fixes most of what I mentioned.

Sturla Molden
On Sat, Mar 14, 2009 at 7:02 PM, Sturla Molden wrote:
On Sat, Mar 14, 2009 at 3:58 PM, Sturla Molden wrote:
There is also a ticket (#579) to add an implementation of the Bluestein algorithm for doing prime order fft's. This could also be used for zoom type fft's. There is lots of fft stuff to be done. I wonder if some of it shouldn't go in Scipy? I think David added some dcts to Scipy.
I am not changing or adding algorithms for now. This is just to prevent NumPy from locking up the interpreter while doing FFTs.
Well, I was hoping to get you sucked into doing some work here ;)
The loops that are worth multithreading are done in C in fftpack_litemodule.c, not in Python in fftpack.py. I have added OpenMP pragmas around them. When NumPy gets a build process that supports OpenMP, they will execute in parallel. On GCC 4.4 this means compiling with -fopenmp and linking -lgomp -lpthread (that goes for mingw/cygwin as well).
The init function seems to be thread safe. cffti and rffti work on arrays created in the callers (fftpack_cffti and fftpack_rffti), no global objects are touched.
I'm attaching a version of fftpack_litemodule.c that fixes most of what I mentioned.
Can you put it somewhere for review? I don't think this should go into 1.3 at this late date, but 1.4 is a good chance. Hopefully we will get the next release out a bit faster than this one.
Chuck
Give it a shot. Note that the fft transforms also use int instead of intp, which limits the maximum transform size to 32 bits. Fixing that is somewhere on my todo list but I would be happy to leave it to you ;) Although I expect transforms > 2GB aren't all that common.
By the way... when looking at fftpack.c there are two things that would likely improve the performance.

1) If we used ISO C (aka C99), arrays could be restricted, thus allowing more aggressive optimization. Now the compiler has to assume aliasing between function arguments. But as the C code is translated from Fortran, this is not the case.

2) In C, indexing arrays with unsigned integers is much more efficient (cf. AMD's optimization guide). I think the use of signed integers as array indices is inherited from Fortran77 FFTPACK. We should probably index the arrays with unsigned longs.

Sturla Molden
On Sat, Mar 14, 2009 at 7:23 PM, Sturla Molden wrote:
Give it a shot. Note that the fft transforms also use int instead of intp, which limits the maximum transform size to 32 bits. Fixing that is somewhere on my todo list but I would be happy to leave it to you ;) Although I expect transforms > 2GB aren't all that common.
By the way... When looking at fftpack.c there are two things that would likely improve the performance.
1) If we used ISO C (aka C99), arrays could be restricted, thus allowing more aggressive optimization. Now the compiler has to assume aliasing between function arguments. But as the C code is translated from Fortran, this is not the case.
We can't count on C99 at this point. Maybe David will add something so we can use c99 when it is available.
2) In C, indexing arrays with unsigned integers are much more efficient (cf. AMDs optimization guide). I think the use of signed integers as array indices are inherited from Fortran77 FFTPACK. We should probably index the arrays with unsigned longs.
I don't have a problem with this, although I'm not sure what npy type is appropriate without looking. Were you thinking of size_t? I was tempted by that. But why is it more efficient? I haven't seen any special instructions at the assembly level, so unless there is some sort of global optimization that isn't obvious, I don't know where the advantage is.

I always figured that a really good optimizer should derive the FFT if you just give it the DFT code ;)

Chuck
On Sat, Mar 14, 2009 at 7:23 PM, Sturla Molden wrote:
We can't count on C99 at this point. Maybe David will add something so we can use c99 when it is available.
Ok, but most GNU compilers have a __restrict__ extension for C89 and C++ that we could use. And MSVC has a compiler pragma in VS2003 and a __restrict extension in VS2005 and later versions. So we could define a macro RESTRICT to be restrict in ISO C99, __restrict__ in GCC 3 and 4, __restrict in recent versions of MSVC, and nothing elsewhere.
I don't have a problem with this, although I'm not sure what npy type is appropriate without looking. Were you thinking of size_t? I was tempted by that. But why is it more efficient? I haven't seen any special instructions at the assembly level, so unless there is some sort of global optimization that isn't obvious, I don't know where the advantage is.
It may be that my memory serves me badly. I thought I read it here, but it does not show examples of different assembly code being generated. So I think I'll just leave it for now and experiment with this later.

http://support.amd.com/us/Processor_TechDocs/22007.pdf

Is an npy_intp 64 bit on 64 bit systems?

Sturla Molden
On Sat, Mar 14, 2009 at 8:26 PM, Sturla Molden wrote:
On Sat, Mar 14, 2009 at 7:23 PM, Sturla Molden wrote:
We can't count on C99 at this point. Maybe David will add something so we can use c99 when it is available.
Ok, but most GNU compilers have a __restrict__ extension for C89 and C++ that we could use. And MSVC has a compiler pragma in VS2003 and a __restrict extension in VS2005 and later versions. So we could define a macro RESTRICT to be restrict in ISO C99, __restrict__ in GCC 3 and 4, __restrict in recent versions of MSVC, and nothing elsewhere.
I don't have a problem with this, although I'm not sure what npy type is appropriate without looking. Were you thinking of size_t? I was tempted by that. But why is it more efficient? I haven't seen any special instructions at the assembly level, so unless there is some sort of global optimization that isn't obvious, I don't know where the advantage is.
It may be that my memory serves me badly. I thought I read it here, but it does not show examples of different assembly code being generated. So I think I'll just leave it for now and experiment with this later.
http://support.amd.com/us/Processor_TechDocs/22007.pdf
Is an npy_intp 64 bit on 64 bit systems?
Yes, it is the same size as a pointer, but it is signed...
Chuck
Ok, but most GNU compilers have a __restrict__ extension for C89 and C++ that we could use. And MSVC has a compiler pragma in VS2003 and a __restrict extension in VS2005 and later versions. So we could define a macro RESTRICT to be restrict in ISO C99, __restrict__ in GCC 3 and 4, __restrict in recent versions of MSVC, and nothing elsewhere.
I know it's ugly, but something like this:

#define RESTRICT
#define INLINE

/* use GNU extension if possible */
#ifdef __GNUC__
#if (__GNUC__ >= 3)
#undef RESTRICT
#undef INLINE
#define RESTRICT __restrict__
#define INLINE __inline__
#endif
#endif

/* use MSVC extensions if possible */
#ifdef MSVC
#if (MSVC_VER >= 1400)
#define RESTRICT __restrict
#define INLINE inline
#endif
#endif

#ifdef __cplusplus
extern "C" {
#undef INLINE
#define INLINE inline
#else
/* use C99 if possible */
#if (__STDC_VERSION__ >= 199901L)
#undef RESTRICT
#undef INLINE
#define RESTRICT restrict
#define INLINE inline
#endif
#endif

#ifdef DOUBLE
typedef double Treal;
#else
typedef float Treal;
#endif

typedef Treal *RESTRICT Vreal; /* V as in "vector" */

S.M.
On Sat, Mar 14, 2009 at 8:52 PM, Sturla Molden wrote:
Ok, but most GNU compilers have a __restrict__ extension for C89 and C++ that we could use. And MSVC has a compiler pragma in VS2003 and a __restrict extension in VS2005 and later versions. So we could define a macro RESTRICT to be restrict in ISO C99, __restrict__ in GCC 3 and 4, __restrict in recent versions of MSVC, and nothing elsewhere.
I know it's ugly, but something like this:
Can't help but be ugly when dealing with all the compilers out there.
#define RESTRICT
#define INLINE

/* use GNU extension if possible */
#ifdef __GNUC__
#if (__GNUC__ >= 3)
#undef RESTRICT
#undef INLINE
#define RESTRICT __restrict__
#define INLINE __inline__
#endif
#endif

/* use MSVC extensions if possible */
#ifdef MSVC
#if (MSVC_VER >= 1400)
#define RESTRICT __restrict
#define INLINE inline
#endif
#endif
I think MSVC uses _inline
#ifdef __cplusplus
extern "C" {
#undef INLINE
#define INLINE inline
#else
/* use C99 if possible */
#if (__STDC_VERSION__ >= 199901L)
#undef RESTRICT
#undef INLINE
#define RESTRICT restrict
#define INLINE inline
#endif
#endif
What does this last bit do? We implicitly assume that ieee floating point is available.
#ifdef DOUBLE
typedef double Treal;
#else
typedef float Treal;
#endif

typedef Treal *RESTRICT Vreal; /* V as in "vector" */
I'm not sure about the names. I would prefer to keep the declarations along the lines of

double * NPY_RESTRICT ptmp;

as it is easier to read and understand without going to the macro definition. Note that David has a quite involved check for the inline keyword implementation, and I expect he would want to do the same for the restrict keyword. I think using lots of #if defined(xxx) might be easier, but I leave that stuff alone. It's David's headache.

Chuck
I think MSVC uses _inline
No, MSVC uses a double underscore. That is, __restrict for variable names and __declspec(restrict) for function return values.
#if (__STDC_VERSION__ >= 199901L)
#undef RESTRICT
#undef INLINE
#define RESTRICT restrict
#define INLINE inline
#endif
#endif
What does this last bit do? We implicitly assume that ieee floating point is available.
It uses the restrict keyword in C99. The last test will pass on any C99 compiler.
#ifdef DOUBLE
typedef double Treal;
#else
typedef float Treal;
#endif

typedef Treal *RESTRICT Vreal; /* V as in "vector" */
I'm not sure about the names. I would prefer to keep the declarations along the lines of
double * NPY_RESTRICT ptmp;
Yes, but then I would have to go in and edit all function bodies in fftpack.c as well. fftpack.c currently uses "Treal array[]" as the naming convention, so I just changed that to "Vreal array".

I changed all ints to long for 64 bit support.

Well, it compiles and runs ok on my computer now. I'll open a ticket for the FFT. I'll attach the C files to the ticket.

Sturla Molden
On Sat, Mar 14, 2009 at 10:44 PM, Sturla Molden wrote:
I think MSVC uses _inline
No, MSVC uses a double underscore. That is, __restrict for variable names and __declspec(restrict) for function return values.
Yes, but MSVC uses _inline for inline.
#if (__STDC_VERSION__ >= 199901L)
#undef RESTRICT
#undef INLINE
#define RESTRICT restrict
#define INLINE inline
#endif
#endif
What does this last bit do? We implicitly assume that ieee floating point is available.
It uses the restrict keyword in C99. The last test will pass on any C99 compiler.
#ifdef DOUBLE
typedef double Treal;
#else
typedef float Treal;
#endif

typedef Treal *RESTRICT Vreal; /* V as in "vector" */
I'm not sure about the names. I would prefer to keep the declarations along the lines of
double * NPY_RESTRICT ptmp;
So use a local define.
Yes, but then I would have to go in and edit all function bodies in fftpack.c as well. fftpack.c currently uses "Treal array[]" as naming convention, so I just changed that to "Vreal array".
I changed all ints to long for 64 bit support.
Long is 32 bits on 64 bit Windows. You need long long there. That's why npy_intp is preferred.
Chuck
On Sun, Mar 15, 2009 at 12:46 PM, Charles R Harris wrote:
As it is easier to read and understand without going to the macro definition. Note that David has a quite involved check for the inline keyword implementation and I expect he would want to do the same for the restrict keyword. I think using lots of #if defined(xxx) might be easier
It is easier, but less maintainable and less portable. With checks, we can deal with new platforms more easily. And it is more robust. Things depending on the platform should be the exception, not the norm.

cheers,
David
On Sat, Mar 14, 2009 at 8:26 PM, Sturla Molden wrote:
On Sat, Mar 14, 2009 at 7:23 PM, Sturla Molden wrote:
We can't count on C99 at this point. Maybe David will add something so we can use c99 when it is available.
Ok, but most GNU compilers have a __restrict__ extension for C89 and C++ that we could use. And MSVC has a compiler pragma in VS2003 and a __restrict extension in VS2005 and later versions. So we could define a macro RESTRICT to be restrict in ISO C99, __restrict__ in GCC 3 and 4, __restrict in recent versions of MSVC, and nothing elsewhere.
I don't have a problem with this, although I'm not sure what npy type is appropriate without looking. Were you thinking of size_t? I was tempted by that. But why is it more efficient? I haven't seen any special instructions at the assembly level, so unless there is some sort of global optimization that isn't obvious, I don't know where the advantage is.
It may be that my memory serves me badly. I thought I read it here, but it does not show examples of different assembly code being generated. So I think I'll just leave it for now and experiment with this later.
I suspect the biggest gains can be made from careful attention to cache issues. I had a prototype block based fft -- using a different algorithm than the usual -- that outperformed fftw at that time.
Chuck
On Sun, Mar 15, 2009 at 5:16 AM, Sturla Molden wrote:
1) I have noticed that fftpack_litemodule.c does not release the GIL around calls to functions in fftpack.c. I cannot see any obvious reason for this. As far as I can tell, the functions in fftpack.c are re-entrant.
2) If fftpack_lite did release the GIL, it would allow functions in numpy.fft to use multithreading for multiple FFTs in parallel (threading.Thread is OK, no special compilation needed).
Both are fine to modify for 1.3.
3) Is there any reason numpy.fft does not have dct? If not, I'd suggest addition of numpy.fft.dct and numpy.fft.idct.
numpy.fft is only here for compatibility with older array packages, AFAIK. So we should not add new features. I would much prefer to see those kinds of things in scipy.fftpack (which already has DCT I, II and III). Adding dct to numpy.fft will only add to the confusion, I think. There is already enough duplication between numpy and scipy; let's not add more.
4) Regarding ticket #400: Cython now makes this easy. NumPy's FFTs should be exposed to C extensions without calling back to Python.
Agreed - that's one area where we could do much more, but this has to be done carefully. I am a bit worried about doing this kind of thing at the last minute. 1 and 2 are OK for the trunk, but not 4 (because it impacts the public API), IMO.

cheers,
David
1) I have noticed that fftpack_litemodule.c does not release the GIL around calls to functions in fftpack.c. I cannot see any obvious reason for this. As far as I can tell, the functions in fftpack.c are re-entrant.
2) If fftpack_lite did release the GIL, it would allow functions in numpy.fft to use multithreading for multiple FFTs in parallel (threading.Thread is OK, no special compilation needed).
Both are fine to modify for 1.3.
There is a version of fftpack_litemodule.c, fftpack.c and fftpack.h that does this attached to ticket #1055. The two important changes are releasing the GIL and using npy_intp for 64 bit support.

Minor changes:

There is a restrict qualifier in fftpack.c. If it is not compiled with C99, it tries to use similar GNU or MS extensions.

There are some OpenMP pragmas in fftpack_litemodule.c. If you don't compile with OpenMP support, they do nothing. If you do compile with OpenMP, they will make certain FFTs run in parallel. I can comment them out if you prefer.

Sturla Molden
Sturla Molden wrote:
There is a version of fftpack_litemodule.c, fftpack.c and fftpack.h that does this attached to ticket #1055. The two important changes are releasing the GIL and using npy_intp for 64 bit support.
Would it be possible to make the changes as a patch (svn diff) - this makes things easier to review.
Minor changes:
There is a restrict qualifier in fftpack.c. If it is not compiled with C99, it tries to use similar GNU or MS extensions.
There are some OpenMP pragmas in fftpack_litemodule.c. If you don't compile with OpenMP support, they do nothing. If you do compile with OpenMP, they will make certain FFTs run in parallel. I can comment them out if you prefer.
Yes, I would be more comfortable without them (for 1.3). This is typically the kind of small change which can be a PITA to deal with just before a release, because it breaks some platforms in non-obvious ways.

For the restrict keyword support, I will add a distutils check to avoid the compiler-specifics (again, after 1.3).

cheers,
David
Mon, 16 Mar 2009 00:33:28 +0900, David Cournapeau wrote:
Sturla Molden wrote:
There is a version of fftpack_litemodule.c, fftpack.c and fftpack.h that does this attached to ticket #1055. The two important changes are releasing the GIL and using npy_intp for 64 bit support.
Would it be possible to make the changes as a patch (svn diff) - this makes things easier to review.
Also, you could post the patch on the http://codereview.appspot.com site. Then it would be easier both to review and to keep track of its revisions. (Attachments, especially whole-file ones, sent on the mailing list are IMHO significantly more cumbersome for all parties concerned.) In practice the code review tool is also easier to use than sending SVN diffs to the mailing list.

If you are working on an SVN checkout, I recommend using the upload tool http://codereview.appspot.com/static/upload.py to upload the patch to the code review site. Just do

    python upload.py

on the SVN checkout containing your changes. (You'll need a Google account for this, though.) When revising the patch after the initial upload, specify the Codereview site issue number to the upload:

    python upload.py -i 12345

so that the new version of the patch is marked as an improved version of the old one.

-- Pauli Virtanen
Mon, 16 Mar 2009 00:33:28 +0900, David Cournapeau wrote:
Also, you could post the patch on the http://codereview.appspot.com site. Then it would be easier both to review and to keep track of its revisions.
I have posted the files here:

http://projects.scipy.org/numpy/ticket/1055

Sturla
Sun, 15 Mar 2009 17:48:51 +0100, Sturla Molden wrote:
Mon, 16 Mar 2009 00:33:28 +0900, David Cournapeau wrote:
Also, you could post the patch on the http://codereview.appspot.com site. Then it would be easier to both review and to keep track of its revisions
I have posted the files here:
Well, that's nearly as good. (Though submitting a single svn diff containing all the changes would have been a bit easier to handle than separate patches for each file. But a small nitpick only.)

But I wonder if there is a way to improve the behavior of Trac with attachments/patches; there seem to be some warts currently:

- Can a non-admin user delete or mark some attachments obsolete?
- Trac doesn't want to show all patches in HTML, and doesn't recognize the .diff suffix. Maybe this can be fixed.
- Inline comments in patches would be nice, as in codereview.

-- Pauli Virtanen
Well, that's nearly as good. (Though submitting a single svn diff containing all the changes would have been a bit easier to handle than separate patches for each file. But a small nitpick only.)
The problem is I am really bad at using these tools. I have TortoiseSVN installed, but no idea how to use it. :(

I cannot delete any file attachment in trac, but I can overwrite the files I've posted.

S.M.
On Mon, Mar 16, 2009 at 2:43 AM, Sturla Molden wrote:
Well, that's nearly as good. (Though submitting a single svn diff containing all the changes would have been a bit easier to handle than separate patches for each file. But a small nitpick only.)
The problem is I am really bad at using these tools. I have TortoiseSVN installed, but no idea how to use it. :(
You can use the command line version: svn diff gives exactly what you need.

Another thing is to separate different issues into different patches - Treal -> double is different from npy_intp for indexing, which is different from the threading issue. It really makes life easier when reviewing code.

I have a git branch with those changes, but I don't think I will include it for 1.3. I don't have the time to make sure the fftpack code really is thread-safe, and I don't want to merge the code without at least one person reviewing it.

cheers,
David
David Cournapeau wrote:
You can use the command line version: svn diff gives exactly what you need.
http://www.sliksvn.com/en/download is a good source of the command line client for Windows.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R           (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception

Chris.Barker@noaa.gov
Would it be possible to make the changes as a patch (svn diff) - this makes things easier to review.
I've added diff files to ticket #1055.
Yes, I would be more comfortable without them (for 1.3). This is typically the kind of small change which can be a PITA to deal with just before a release, because it breaks some platforms in non-obvious ways.
Ok, they are commented out.
For the restrict keyword support, I will add a distutils check to avoid the compiler-specifics (again after 1.3).
I've added a header file npy_restrict.h that defines a NPY_RESTRICT symbol.

Best regards,
Sturla
participants (7)

- Charles R Harris
- Christopher Barker
- David Cournapeau
- David Cournapeau
- josef.pktd@gmail.com
- Pauli Virtanen
- Sturla Molden