
Hello scipy, I'm in the process of adding new tests for sparse matrices with dtype=bool. While I'm at it, I thought it would be a good idea to add tests for other dtypes; currently all the tests are for int64 and float64. There is some canonical data defined in test_base.py. I use this to make data with all the supported dtypes, then change the tests that use this canonical data to iterate over it for the different dtypes (including bool). Does this sound OK? It will make running test_base.py a bit slower. You can see my work here: https://github.com/cowlicks/scipy/commits/official-bool-support
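
A minimal sketch of the canonical-data idea (dat is close to the matrix test_base.py actually uses; supported_dtypes and dat_dtypes are illustrative names, not necessarily what the branch uses):

    import numpy as np

    # canonical test matrix, as in test_base.py
    dat = np.matrix([[1, 0, 0, 2], [3, 0, 1, 0], [0, 2, 0, 0]], 'd')

    # one copy of the canonical data per supported dtype, so tests can
    # iterate over dat_dtypes instead of hardcoding a single dtype
    supported_dtypes = [np.bool_, np.int8, np.int32, np.int64,
                        np.float32, np.float64, np.complex128]
    dat_dtypes = dict((dtype, dat.astype(dtype)) for dtype in supported_dtypes)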

On Fri, May 31, 2013 at 11:49 PM, Blake Griffith <blake.a.griffith@gmail.com> wrote: [clip]
That makes sense to me. Whether it's a problem that the tests run slower depends on what you mean by "a bit". If they add more than a few hundred milliseconds, then you should think about which parts of those tests have to be labeled @slow.

Ralf

On Sun, Jun 2, 2013 at 3:02 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote: [clip]
I've been going through on a per-test basis, changing each test to cover every dtype instead of just one, so each modified test takes ~14 times as long (once for each supported dtype)... I think this will slow the test suite way too much. Unfortunately, with the way I am adding these tests, using @slow would prevent the whole test from being run. Ideally I would decorate just the new extra-dtype tests with @slow, and the test would still run for the standard int64/float64 dtypes. So I need to change the way I am doing this.
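
One possible way out, sketched under the assumption that numpy.testing's dec.slow decorator is used (the class and the _check_something helper are hypothetical, just to show the split):

    import numpy as np
    from numpy.testing import dec

    class TestSomething(object):
        def _check_something(self, dtype):
            pass  # the real assertions, parametrized on dtype

        def test_something(self):
            # always runs: the standard dtypes
            for dtype in [np.int64, np.float64]:
                self._check_something(dtype)

        @dec.slow
        def test_something_other_dtypes(self):
            # runs only when slow tests are enabled
            for dtype in [np.bool_, np.int8, np.float32, np.complex128]:
                self._check_something(dtype)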

On Mon, Jun 3, 2013 at 1:32 AM, Blake Griffith <blake.a.griffith@gmail.com> wrote: [clip]
Right now test_base.py already takes ~7% of the total scipy.test() time, so a 14x increase for that file is too much. Maybe there's something that can be done to make it run faster to start with. I'll try to have a look at what you changed so far tonight.

Ralf

On Mon, Jun 3, 2013 at 1:20 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote: [clip]
I think rewriting this test suite with parametrized dtypes and data would be worth the effort. I have a week before I start the next part of my GSoC timeline, which I think is enough time, and it will also make my life easier for the rest of the summer. Does this sound too ambitious to do in a week?

On Mon, Jun 3, 2013 at 6:04 PM, Blake Griffith <blake.a.griffith@gmail.com> wrote: [clip]
That doesn't sound too ambitious. I'm not sure yet that it's necessary, though. What you added so far in your official-bool-support branch didn't cost a large amount of time yet.

The main problem I see is that you're making the changes in _TestCommon, which is inherited by the tests for all 7 sparse matrix formats. So if you put a for-loop over all 15 supported dtypes in a test, you now have 7*15 = 105 versions of that test being executed. I don't think that's necessary for a lot of operations - probably testing all dtypes only for CSR would already help a lot.

Also, try to measure the impact. There are some nose plugins that are perhaps better, but this is a good start:

    sparse.test(extra_argv=['--with-profile'])

You'll see that _TestCommon.setUp is called 613 times, for example, and it does more than it needs to.

I suggest first getting clear on where exactly it would help to test all dtypes, then estimating how much time it would take.

Cheers,
Ralf
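
For a rough baseline outside the profiler, a throwaway timing snippet (not from the thread, just the obvious way to measure):

    import time
    import scipy.sparse as sparse

    t0 = time.time()
    sparse.test()   # runs scipy/sparse/tests, including test_base.py
    print("scipy.sparse tests: %.1f s" % (time.time() - t0))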

Ralf Gommers <ralf.gommers <at> gmail.com> writes: [clip]
I suggest first getting clear on where exactly it would help to test all dtypes, then estimating how much time it would take.
I would maybe worry about the test runtime only after refactoring, if it turns out in the end that the tests are too slow. However, planning ahead: if the set of dtypes to check is specified as a class attribute, say self.supported_dtypes, it should be relatively straightforward to limit e.g. the full dtype tests to CSR only.

-- Pauli Virtanen
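
A minimal sketch of that class-attribute idea (the structure and names are illustrative, not the actual test_base.py code):

    import numpy as np
    from numpy.testing import assert_array_equal
    from scipy.sparse import csr_matrix, lil_matrix

    class _TestCommon(object):
        # default: only the two historical dtypes
        supported_dtypes = [np.int64, np.float64]

        def test_todense_roundtrip(self):
            for dtype in self.supported_dtypes:
                mat = np.array([[1, 2], [3, 4]], dtype=dtype)
                assert_array_equal(self.spmatrix(mat).todense(), mat)

    class TestCSR(_TestCommon):
        spmatrix = csr_matrix
        # only the CSR tests iterate over the full dtype set
        supported_dtypes = [np.bool_, np.int8, np.int32, np.int64,
                            np.float32, np.float64, np.complex128]

    class TestLIL(_TestCommon):
        spmatrix = lil_matrix   # inherits the short default list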

Blake Griffith <blake.a.griffith <at> gmail.com> writes: [clip]
I think rewriting this test suite with parametrized dtypes and data would be worth the effort. I have a week before I start the next part of my GSoC timeline, which I think is enough time.
Does this sound too ambitious to do in a week?
Extending the dtypes covered is certainly doable in a week; the test_base.py file is not that large --- depending on how you approach the problem. Of course, if you include the time it takes to fix the bugs uncovered, that may take longer :)

It may be easiest to *not* try to do anything very ambitious, but instead refactor it test-by-test in those cases where multiple-dtype testing makes sense. The point here is that I think we shouldn't spend more time on cleaning the tests up than necessary, so the simplest solution that gets the test coverage extended is probably the best one.

I'd do the dtype extension like this: change

    def test_something(self):
        mat = np.array([[1, 2], [3, 4]])
        spmat = self.spmatrix(mat)
        assert_array_equal(spmat.todense(), mat)

to

    def test_something(self):
        def check(dtype):
            mat = np.array([[1, 2], [3, 4]], dtype=dtype)
            spmat = self.spmatrix(mat)
            assert_array_equal(spmat.todense(), mat)

        for dtype in SUPPORTED_DTYPES:
            yield check, dtype

where SUPPORTED_DTYPES is defined somewhere above, as a global or as a class attribute. Plus: ensure that the test class doesn't inherit from TestCase; the inheritance needs to be removed in the class factory function `sparse_test_class`.

I'd avoid refactoring the test matrices into class-level ones, and would just cast the test matrices to an appropriate dtype inside the check() method.

The above is of course an opinion formed without having started on it myself (I did refactor the file once already, though for a different aspect).

-- Pauli Virtanen
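
And SUPPORTED_DTYPES would just be a module-level list defined once; one plausible definition (the exact set is an assumption here - the thread mentions ~14-15 supported dtypes):

    import numpy as np

    # assumed set; the real list is whatever scipy.sparse actually supports
    SUPPORTED_DTYPES = [np.bool_,
                        np.int8, np.uint8, np.int16, np.uint16,
                        np.int32, np.uint32, np.int64, np.uint64,
                        np.float32, np.float64,
                        np.complex64, np.complex128]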
participants (3):
- Blake Griffith
- Pauli Virtanen
- Ralf Gommers