Hi all,

I'm really excited about this, but I would like to hear feedback as well as solicit help with it. There are a lot of new tests we can write, particularly for frontend bits that have bitten us in the past. We can also use Nose to measure performance over time, which would be a nice way of checking for regressions or improvements.

As I note in the PR, I'd like to get a discussion going about this -- any feedback would be very, very welcome. Does this meet our needs for answer testing? Would you be willing to write tests for a given frontend? What else could be added or improved?

I'd also like to suggest that we have a Hangout or IRC meeting to get some builds set up and actually try this out on a couple of different machines. My best times would be Tuesday at 4PM EST or Wednesday at 2PM EST.

-Matt

---------- Forwarded message ----------
From: Matthew Turk <pullrequests-noreply@bitbucket.org>
Date: Thu, Oct 18, 2012 at 10:28 PM
Subject: [yt_analysis/yt] Answer testing plugin for Nose (pull request #308)

A new pull request has been opened by Matthew Turk.

MatthewTurk/yt has changes to be pulled into yt_analysis/yt.

https://bitbucket.org/yt_analysis/yt/pull-request/308/answer-testing-plugin-...

Title: Answer testing plugin for Nose

This pull request includes an answer testing plugin for Nose, as well as a mechanism by which the plugin can be used to upload new results and to compare locally generated results against a gold standard stored in Amazon S3.

## How does Answer Testing work now?

Currently, answer testing in yt works by running a completely home-grown test runner, discoverer, and storage system. It operates on a single parameter file at a time, and there is little flexibility in how the parameter files are tested; for instance, you cannot select fields based on the code that generated the pf. This catches many but not all errors, and it can only test Enzo and FLASH. When a new set of "reliable" tests has been identified, it is tarred up and uploaded. No one ever really used these tests, and it's difficult to run them unless you're on Matt's machine.

## What does this do?

There are two ways in which this can function:

* Pull down results for a given parameter file and compare the locally-created results against them
* Run new results and upload those to S3

These are not meant to coexist. In fact, the ideal mode of operation is that when the answer tests are changed *intentionally*, new gold standards are generated and pushed to S3 by one of a trusted set of users. (New users can be added, with the privileges necessary to push a new set of tests.)

This adds a new config option to `~/.yt/config` in the `[yt]` section: `test_data_dir`, which is where parameter files (such as "IsolatedGalaxy" and "DD0010" from yt's distribution) can be found. When the nosetests are run, any parameter files found in that directory will be used as answer testing input.

`yt/frontends/enzo/tests/test_outputs.py` contains the Enzo frontend tests that rely on parameter files. Note that right now, the standard AMR tests are quite extensive and generate a lot of data; I am still in the process of creating new tests to replicate the old answer tests, and of slimming them down for big datasets.
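To give a sense of what a test module for another frontend might look like, here is a minimal sketch in plain nose style. The FLASH module path, the dataset name, and the specific checks are hypothetical placeholders -- real tests would mirror whatever the Enzo module does and yield proper answer-test comparisons:

```python
# Hypothetical sketch of a frontend test module, e.g.
# yt/frontends/flash/tests/test_outputs.py (names are placeholders).
import os

from yt.config import ytcfg
from yt.mods import load

# Placeholder FLASH output expected to live under test_data_dir.
_pf_name = "GasSloshing/sloshing_nomag2_hdf5_plt_cnt_0300"

def test_flash_load():
    # Nose discovers any test_* function in a tests/ module when it is
    # pointed at frontends/flash/, just as with frontends/enzo/.
    pf = load(os.path.join(ytcfg.get("yt", "test_data_dir"), _pf_name))
    assert pf is not None
    # Only sanity checks here; a real answer test would produce results
    # that the --with-answer-testing plugin stores or compares.
    assert pf.domain_dimensions.prod() > 0
    assert len(pf.h.field_list) > 0
```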
To run a comparison, you must first run "develop" so that the new nose plugin becomes available. Then, in the yt directory:

`nosetests --with-answer-testing frontends/enzo/ --answer-compare=gold001`

To run a set of tests and *store* them:

`nosetests --with-answer-testing frontends/enzo/ --answer-store --answer-name=gold001`

We can now not only run answer tests, but we also no longer have to manage the uploads, manually or otherwise; yt will do this for us, using boto. Down the road we can swap out Amazon for any OpenStack-compliant cloud provider, such as SDSC's cloud. Additionally, we can now add answer testing of small data to Shining Panda, and in the future we can add lower-frequency answer testing of large data as well.

## What's Next?

Because there's a lot to take in, I'd like to suggest this PR not be accepted as-is. There are a few items that need to be done first:

* The developer community needs to be brought in on this; I would like to suggest either a Hangout or an IRC meeting to discuss how this works. I'd also encourage others to pull this PR, run the nosetests command that compares data, and figure out whether they like how it looks.
* The old tests all need to be replicated. This means things like projections turned into pixel buffers, and field statistics (without storing the fields themselves).
* Tests need to be added for other frontends. I am currently working with the other frontend maintainers to get data, but once we have it, we need to add tests as is done for Enzo. This means FLASH, Nyx, and Orion, as well as any others that would like to be on the testing suite.

I'd like to encourage specific comments on lines of code to be left here, as well as comments on the actual structure of the code, but I'll be forwarding this PR to yt-dev and asking for broader comments there. I think that having a single, integrated testing system that can test a subset of parameter files (as well as auto-discover them) will be extremely valuable for ensuring maintainability. I'm really excited about this.

Changes to be pulled:

--
This is an issue notification from bitbucket.org. You are receiving this either because you are participating in the pull request, or you are following it.
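For context on the storage half of the workflow described above ("yt will do this for us, using boto"), a push to S3 with boto could be as simple as the sketch below. The bucket name, key layout, and helper function are invented for illustration and are not the plugin's actual code:

```python
# Rough illustration only; the real plugin's storage layout may differ.
import boto

def push_result(answer_name, local_path):
    # boto reads AWS credentials from the environment or ~/.boto, which is
    # how a trusted user would be authorized to push new gold standards.
    conn = boto.connect_s3()
    bucket = conn.get_bucket("yt-answer-store")  # hypothetical bucket name
    key = bucket.new_key("%s/%s" % (answer_name, local_path))
    key.set_contents_from_filename(local_path)
    # Making results world-readable lets anyone pull them down to compare.
    key.set_acl("public-read")

# e.g. push_result("gold001", "enzo_isolated_galaxy_results.pkl")
```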