Hi all,

I'm really excited about this, but I would like to hear feedback as well as solicit help with it. There are a lot of new tests we can write, particularly for frontend bits that have bitten us in the past. We can also use Nose to measure performance over time, which would be a nice way of checking for regressions or improvements.

As I note in the PR, I'd like to get a discussion going about this -- any feedback would be very, very welcome. Does this meet our needs for answer testing? Would you be willing to write tests for a given frontend? What else could be added or improved?

I'd also like to suggest that we have a Hangout or IRC meeting to get some builds set up and actually try this out on a couple of different machines. My best times would be Tuesday at 4PM EST or Wednesday at 2PM EST.

-Matt

---------- Forwarded message ----------
From: Matthew Turk <pullrequests-noreply@bitbucket.org>
Date: Thu, Oct 18, 2012 at 10:28 PM
Subject: [yt_analysis/yt] Answer testing plugin for Nose (pull request #308)

A new pull request has been opened by Matthew Turk.

MatthewTurk/yt has changes to be pulled into yt_analysis/yt.

https://bitbucket.org/yt_analysis/yt/pull-request/308/answer-testing-plugin-...

Title: Answer testing plugin for Nose

This pull request includes an answer testing plugin for Nose, as well as a mechanism by which the plugin can be used to upload new results and to compare locally generated results against a gold standard stored in Amazon S3.

## How does Answer Testing work now?

Currently, answer testing in yt works by running a completely home-grown test runner, discoverer, and storage system. It operates on a single parameter file at a time, and there is little flexibility in how the parameter files are tested; for instance, you cannot select fields based on the code that generated the pf. This catches many but not all errors, and it can only test Enzo and FLASH. When a new set of "reliable" tests has been identified, it is tarred up and uploaded. No one ever really used these tests, and it's difficult to run them unless you're on Matt's machine.

## What does this do?

There are two ways in which this can function:

* Pull down results for a given parameter file and compare the locally-created results against them
* Run new results and upload those to S3

These are not meant to coexist. In fact, the ideal mode of operation is that when the answer tests are changed *intentionally*, new gold standards are generated and pushed to S3 by one of a trusted set of users. (New users can be added, with the privileges necessary to push a new set of tests.)

This adds a new config option to `~/.yt/config` in the `[yt]` section: `test_data_dir`, which is where parameter files (such as "IsolatedGalaxy" and "DD0010" from yt's distribution) can be found. When the nosetests are run, any parameter files found in that directory will be used as answer testing input.

`yt/frontends/enzo/tests/test_outputs.py` contains the Enzo frontend tests that rely on parameter files. Note that right now, the standard AMR tests are quite extensive and generate a lot of data; I am still in the process of creating new tests to replicate the old answer tests, and of slimming them down for big datasets.
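To give a sense of what a test module for another frontend might look like, here is a minimal sketch in plain nose style. The FLASH module path, the dataset name, and the specific checks are hypothetical placeholders -- real tests would mirror whatever the Enzo module does and yield proper answer-test comparisons:

```python
# Hypothetical sketch of a frontend test module, e.g.
# yt/frontends/flash/tests/test_outputs.py (names are placeholders).
import os

from yt.config import ytcfg
from yt.mods import load

# Placeholder FLASH output expected to live under test_data_dir.
_pf_name = "GasSloshing/sloshing_nomag2_hdf5_plt_cnt_0300"

def test_flash_load():
    # Nose discovers any test_* function in a tests/ module when it is
    # pointed at frontends/flash/, just as with frontends/enzo/.
    pf = load(os.path.join(ytcfg.get("yt", "test_data_dir"), _pf_name))
    assert pf is not None
    # Only sanity checks here; a real answer test would produce results
    # that the --with-answer-testing plugin stores or compares.
    assert pf.domain_dimensions.prod() > 0
    assert len(pf.h.field_list) > 0
```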
To run a comparison, you must first run "develop" so that the new nose plugin becomes available. Then, in the yt directory:

`nosetests --with-answer-testing frontends/enzo/ --answer-compare=gold001`

To run a set of tests and *store* them:

`nosetests --with-answer-testing frontends/enzo/ --answer-store --answer-name=gold001`

We can now not only run answer tests, but we also no longer have to manage the uploads, manually or otherwise; yt will do this for us, using boto. Down the road we can swap out Amazon for any OpenStack-compliant cloud provider, such as SDSC's cloud. Additionally, we can now add answer testing of small data to Shining Panda, and in the future we can add lower-frequency answer testing of large data as well.

## What's Next?

Because there's a lot to take in, I'd like to suggest this PR not be accepted as-is. There are a few items that need to be done first:

* The developer community needs to be brought in on this; I would like to suggest either a Hangout or an IRC meeting to discuss how this works. I'd also encourage others to pull this PR, run the nosetests command that compares data, and figure out whether they like how it looks.
* The old tests all need to be replicated. This means things like projections turned into pixel buffers, and field statistics (without storing the fields themselves).
* Tests need to be added for other frontends. I am currently working with the other frontend maintainers to get data, but once we have it, we need to add tests as is done for Enzo. This means FLASH, Nyx, and Orion, as well as any others that would like to be on the testing suite.

I'd like to encourage specific comments on lines of code to be left here, as well as comments on the actual structure of the code, but I'll be forwarding this PR to yt-dev and asking for broader comments there. I think that having a single, integrated testing system that can test a subset of parameter files (as well as auto-discover them) will be extremely valuable for ensuring maintainability. I'm really excited about this.

Changes to be pulled:

--
This is an issue notification from bitbucket.org. You are receiving this either because you are participating in the pull request, or you are following it.
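For context on the storage half of the workflow described above ("yt will do this for us, using boto"), a push to S3 with boto could be as simple as the sketch below. The bucket name, key layout, and helper function are invented for illustration and are not the plugin's actual code:

```python
# Rough illustration only; the real plugin's storage layout may differ.
import boto

def push_result(answer_name, local_path):
    # boto reads AWS credentials from the environment or ~/.boto, which is
    # how a trusted user would be authorized to push new gold standards.
    conn = boto.connect_s3()
    bucket = conn.get_bucket("yt-answer-store")  # hypothetical bucket name
    key = bucket.new_key("%s/%s" % (answer_name, local_path))
    key.set_contents_from_filename(local_path)
    # Making results world-readable lets anyone pull them down to compare.
    key.set_acl("public-read")

# e.g. push_result("gold001", "enzo_isolated_galaxy_results.pkl")
```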