Testing Infrastructure: Datasets for ART, Orion, FLASH, etc ...
Hi all,

Today at UCSC, Nathan, Chris (Moody) and I sat down and went through what we wanted to accomplish with testing. This comes back to the age-old dichotomy between unit testing and answer testing. But what it really comes down to, now that we've had the opportunity to think about it, is the difference between testing components and functionality versus testing frontends. So the idea here is:

Unit tests => Cover individual units of the code, using either manually inserted data values or randomly generated "parameter files". Stephen and I have written a bunch in the last couple of days. We have nearly 500, and they take < 1 minute to run.

Frontend/answer tests => Cover a large portion of high-level functionality that touches a lot of the code, but do so by running things like projections and profiles on actual data from actual simulation codes, and comparing the results against stored reference values. Currently we have ~550 answer tests; they run every 30 minutes on moving7_0010 (which comes with yt) and once a day on JHK-DD0030 (on yt-project.org/data/ as IsolatedGalaxy). We do not have automated FLASH testing.

The next steps are:

1) Get a bunch of non-proprietary datasets, both small *and* medium, for each code base we want to test. This data must be non-proprietary! Small datasets can be trivially small; for medium, I'd prefer 0.5 - 5 GB on disk. I would think GasSloshing and WindTunnel could work for FLASH, but we still need ART data (from Chris Moody), GDF or Piernik data (from Kacper), Orion data (if possible), and Nyx data (if possible). I will handle adding RAMSES data in the 3.0 branch.

2) Get a mechanism for running answer tests that isn't "Matt's desktop." I've emailed Shining Panda about this, but if they can't provide us with a FLOSS license, I think we can identify some funding to do this.

3) Have a mechanism to display and collate results. Shining Panda would do this if we were on their systems.

4) Make it much easier to flag individual tests as needing updates. I think the Data Hub will be the end place for this, but it's lower priority.

5) Migrate answer testing onto the unit testing framework, as most of what we've done there re-implements things the unit testing frameworks already provide. That would let us handle test discovery much more easily, which is a huge plus.

Ultimately, the end product of all of this is a single test run that does test discovery, loads up a bunch of different data outputs, runs answer tests on all of them, runs the unit tests, and so on. I think the infrastructure just needs the last 25% to finish it up.

So: those of you who have access to datasets of types other than FLASH or Enzo, can you provide non-proprietary medium-size and small-size datasets? I'd like at least two for every code base.

And: those of you who want to help out, would you be interested in looking at the answer_testing framework with me? I'm happy to discuss it over email or IRC. Converting it to the numpy testing format will make it much easier to maintain in the long run, and much easier to have a single testing system that works for everything.

-Matt
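A minimal sketch of the answer-testing idea described above: compute once against a trusted version, store the result, and compare later runs to it. The helper names, file name, and tolerance here are invented for illustration; this is not yt's actual answer_testing API.

    import numpy as np

    REFERENCE = "projection_reference.npy"  # hypothetical on-disk store

    def project(data):
        # Stand-in for a real yt projection: collapse one axis.
        return data.sum(axis=2)

    def store_reference(data):
        # Run once against a trusted version to freeze the "answer".
        np.save(REFERENCE, project(data))

    def check_against_reference(data, rtol=1e-7):
        # Later runs recompute and compare against the stored answer.
        expected = np.load(REFERENCE)
        np.testing.assert_allclose(project(data), expected, rtol=rtol)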
Hey Matt,

I would like to provide the data for Nyx. Not sure what sort of output would be useful, though.

I knew of some of the tests you and Anthony added, but there are 500 unit tests now? Isn't that a bit strange?

- Casey
In this terminology, each assert statement is a test. It's quite easy to make dozens of new tests inside a couple of nested for loops.
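A minimal sketch of that multiplication, in the nose-style yield form the unit tests use; the patch counts, field counts, and sizes below are invented. Each yielded tuple is collected as a separate test:

    import numpy as np
    from numpy.testing import assert_equal

    def check_sum(arr, expected):
        # nose counts each yielded check as one test.
        assert_equal(arr.sum(), expected)

    def test_sums():
        # 4 patch counts x 3 field counts = 12 tests from one generator.
        for nprocs in [1, 2, 4, 8]:
            for nfields in [1, 2, 3]:
                arr = np.ones(16 * nprocs * nfields)
                yield check_sum, arr, 16.0 * nprocs * nfields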
What is the number of tests of unique functionality?
Thanks for the clarification, Nathan. I see how that can build up very quickly.

- Casey
Hi Britton,

On Fri, Oct 12, 2012 at 3:06 PM, Britton Smith <brittonsmith@gmail.com> wrote:
> What is the number of tests of unique functionality?
I'm not sure I know what you mean? For the most part, we're testing many aspects of individual components. As an example, we now have a whole bunch of tests that address different aspects (each of which is relatively tricky and sensitive to changes in the code base) of covering grids, projections, profiles, and so on. So I guess in a sense, we're really well-testing about 5 different pieces of the code in the unit tests. In the answer tests we test a much broader section of the code, but it takes longer and requires reference data.

-Matt
Hi Matt,

Your response here and the one to Casey answered my question. It wasn't clear to me that testing the same function with different configurations counted as separate tests. I understand now. Thanks!

Britton
Hi Casey,

On Fri, Oct 12, 2012 at 3:02 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
> I would like to provide the data for Nyx. Not sure what sort of output would be useful though.
Two outputs, one small and one slightly less small, that are self-contained. Cosmology, with particles, would be ideal.
> So I knew of some of the tests you and Anthony added, but there are 500 unit tests now? Isn't that a bit strange?
Well, the tests I added also check random data being supplied to projections and profiles in several different configurations:

* Lazy reading on/off
* 1, 2, 4, 8 grid patches of random data

Plus, I've also checked several different aspects. So if you have two different aspects of something that's getting checked for 4 different processor layouts, that's 8 already. As an example, here's the covering_grid test:

https://bitbucket.org/yt_analysis/yt/src/2d91e2e7f12a/yt/data_objects/tests/...

You can see that they multiply fast -- this one tests a bunch of sub-aspects of the covering grid, and each yield inside the iterator adds a new test. Quickly adds up!

-Matt
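The yield-per-configuration pattern Matt describes looks roughly like this; a sketch only, with made-up helpers standing in for the real fake-data machinery, not the linked covering_grid test. Two checked aspects across four layouts gives the eight tests mentioned above:

    import numpy as np
    from numpy.testing import assert_equal

    def fake_patches(nprocs):
        # Stand-in for a randomly generated "parameter file"
        # decomposed into nprocs patches of random data.
        np.random.seed(0x4d3d3d3)
        return [np.random.random((8, 8, 8)) for _ in range(nprocs)]

    def check_cell_count(grids, expected):
        assert_equal(sum(g.size for g in grids), expected)

    def check_bounds(grids):
        # np.random.random draws from [0, 1).
        for g in grids:
            assert g.min() >= 0.0 and g.max() < 1.0

    def test_layouts():
        # Two aspects checked for four layouts = eight tests.
        for nprocs in (1, 2, 4, 8):
            grids = fake_patches(nprocs)
            yield check_cell_count, grids, nprocs * 8 ** 3
            yield check_bounds, grids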
[snip]

> I've emailed Shining Panda about this, but if they don't have the ability to provide us with a FLOSS license, I think we can identify some funding to do this.

[snip]

Okay, we're laughably far back in their queue. I'm inclined to think that if a couple of people look at Shining Panda and think it looks good, we should just upgrade to the paid plan so we don't have to wait -- we can fix this problem now instead of at some distant date. What do y'all think?

http://shiningpanda-ci.com

-Matt
$144/year is well worth it, and it's probably good karma to support them since they have to pay for the infrastructure.

+1 (although it's not my money ;)
Hi Nathan et al.,

We just got approved for a FLOSS license on Shining Panda. As it currently stands, we can't mount EBS volumes, so answer testing will not be part of this initiative right now. Kacper has suggested we explore other options for answer testing, and I think he's right -- particularly because we may end up using a lot of memory and putting strain on an individual node, and that's probably bad karma. So I think we should keep the two somewhat separately tested for now, and continue creating unit tests that exercise either randomly created in-memory data or isolated functionality.

I can add users and administrators to the SP account:

https://www.shiningpanda-ci.com/docs/dashboard/multiusers.html

We can also identify different repos to pull from and test. I think we definitely want to test the main yt repository and the main yt-3.0 repository (once it has tests finalized). We can also add a few "semi-trusted" repositories -- for instance, Sam does a lot of development in his fork, so he may want it added to the buildbots; I do a lot as well, so I may want mine added. With a "user" role you can trigger a test run any time you want; with an administrator role you can do the same and also add new repos to test.

Would anyone like to be added as a "user"? I think this would also need to coincide with having your personal repository added as a testing location. For administrator, I think we should restrict it to people who have been really involved in the pull request and code mentoring process.

Kacper and I are going to hash out some answer testing ideas on IRC before coming back to the list. But clearly we need to ensure the infrastructure we develop covers both answer testing AND unit testing.

-Matt
I forgot to point at the URL for the Shining Panda instance:

https://jenkins.shiningpanda-ci.com/yt-project/

New repos we pull from and test will show up there.

-Matt
Hi Matt and all,

This is great! I am interested in getting involved and am willing to be a user or administrator, whichever makes the most sense. I am also going to try to start writing some tests this week.

Britton
Hi Matt,

I'd be happy to provide some Orion datasets for testing.

Best,
Andrew
Awesome! And awesome to Casey, too. Here's what I'll do: I will spin up an EC2 instance and add you both as users on it. If you each send me (off-list) a throwaway public SSH key, I'll add it and send back the IP address.
Participants (5):

- Andrew Myers
- Britton Smith
- Casey W. Stark
- Matthew Turk
- Nathan Goldbaum