[Neuroimaging] [DIPY] Setting up a platform for offline end-to-end quality assurance for DIPY

Ariel Rokem arokem at gmail.com
Thu Mar 3 13:29:06 EST 2016

Hi Eleftherios,

I have resources to run this kind of thing on AWS, or some other cloud
provider. I see many advantages to doing this on the cloud and using
something like Docker for deployment (e.g., portability and reproducibility
in other people's hands, as well as relatively easy scaling in ours). Data
can then also consistently be pulled from the HCP S3 buckets (see for
example the beginning of the notebook here:
https://github.com/arokem/end-to-end/blob/master/end-to-end.ipynb). Once we
have automated all that, it will also be relatively easy to transfer these
ideas to the other use-cases you mentioned.

But we'd need to do some math to see how much this would actually cost. Do
you have a sense of the requirements? For example, how often would you want
to run the pipeline? Every time a PR happens? That's happening quite often
these days ;-) I don't believe we need a really large machine to run
persistently. We might want a small machine running persistently,
monitoring GitHub for us, and then waking up the big beast when there's a
lot of work to do. That might reduce costs.
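
A rough sketch of that math, comparing a small always-on watcher plus an
on-demand big machine against a big machine that runs all the time. The
24-hour run time, the 100 GB of disk and the once/twice-a-week schedule
come from the quoted message below; every price is a made-up placeholder
(not a real AWS quote), and the function names are hypothetical.

```python
# Back-of-the-envelope cloud cost estimate.
# ASSUMPTION: all rates used in the example are illustrative placeholders,
# not actual prices from AWS or any other provider.
HOURS_PER_WEEK = 24 * 7
WEEKS_PER_MONTH = 52 / 12


def monthly_cost(runs_per_week, hours_per_run, big_hourly, small_hourly,
                 disk_gb, disk_gb_monthly):
    """Monthly cost of a small always-on watcher plus a big machine
    that is only billed while a QA run is in progress."""
    watcher = small_hourly * HOURS_PER_WEEK * WEEKS_PER_MONTH
    compute = big_hourly * hours_per_run * runs_per_week * WEEKS_PER_MONTH
    storage = disk_gb * disk_gb_monthly
    return watcher + compute + storage


def always_on_cost(big_hourly, disk_gb, disk_gb_monthly):
    """Monthly cost of keeping the big machine running persistently."""
    compute = big_hourly * HOURS_PER_WEEK * WEEKS_PER_MONTH
    return compute + disk_gb * disk_gb_monthly


if __name__ == "__main__":
    # Placeholder rates: replace with real quotes before drawing conclusions.
    rates = dict(big_hourly=1.00, disk_gb=100, disk_gb_monthly=0.10)
    for runs in (1, 2):
        cost = monthly_cost(runs, 24, small_hourly=0.02, **rates)
        print(f"{runs} run(s)/week, on demand: {cost:.2f} per month")
    print(f"big machine always on: {always_on_cost(**rates):.2f} per month")
```

With real prices plugged in, the same two functions would also show how the
picture changes if the pipeline ran on every PR rather than weekly.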



On Thu, Mar 3, 2016 at 8:24 AM, Eleftherios Garyfallidis <
garyfallidis at gmail.com> wrote:

> Dear Matthew, Maxime, Ariel and all,
> Mr. Dumont and I have started creating some workflows which can be run
> from the command line. These are made to work with large real datasets.
> I think it would be great if we could use a different type of testing from
> what we are using right now. Most of the testing we use is actually fast
> testing of functions and we should definitely continue having that.
> But I think we also need end-to-end offline testing where we actually
> test with big whole-brain datasets and then collect some automatic
> quality assurance reports. That way we cover most unexpected issues.
> Now, the problem with having such a platform is that it needs computing
> power and some disk space. It may need a decent computer to run for 24
> hours, for example, and let's say around 100 GBytes of free disk space.
> Then it will also need to send some automated reports to say whether all
> is good or not.
> Ariel has suggested using the cloud and Docker, but I am afraid that it
> will be too expensive for our pockets right now unless someone can
> donate to the project.
> An alternative idea would be to go gradually and set up one of the
> computers in Sherbrooke or in Berkeley or in Seattle to do such a job. I
> think this QA should run once or twice a week rather than every day.
> Now there are other platforms that need to run relatively frequently. One
> is the examples for the documentation and then there is Omar's validation
> framework which actually needs a large cluster. We can deal with those at a
> later stage.
> The easiest way forward with the workflows that I see right now is that
> Mr. Dumont adds a script in dipy/tools that will run all the workflows, as
> we do with make_examples.py, which runs all the examples. We first try this
> platform in Sherbrooke and then we need to figure out a way to send
> automated reports to the core developers or to the Berkeley builders and so
> on. Maybe sending a PDF or HTML of the output screenshots would also be a
> good idea.
> Let me know what you think.
> Cheers,
> Eleftherios
> _______________________________________________
> Neuroimaging mailing list
> Neuroimaging at python.org
> https://mail.python.org/mailman/listinfo/neuroimaging