[Neuroimaging] ANN: openneuro-py, a new app for downloading OpenNeuro datasets

Yaroslav Halchenko lists at onerussian.com
Tue Dec 15 11:52:50 EST 2020


On Tue, 15 Dec 2020, Christopher Markiewicz wrote:

> Hi all,

> FWIW almost all public datasets have been pushed to GitHub and can be accessed via datalad (exceptions being tracked on these issues: https://github.com/OpenNeuroOrg/openneuro/issues/1741 and https://github.com/OpenNeuroOrg/openneuro/issues/1743).

>     datalad install https://github.com/OpenNeuroDatasets/ds00WXYZ.git

> Datalad makes it pretty straightforward to download only the portions of the data you want.

FWIW, datalad also comes with a Python API covering all of its
functionality, so an analogous command that fetches specific subjects (or
any path you like) would be something like

	$> python3 -c 'import datalad.api as dl; ds = dl.install("https://github.com/OpenNeuroDatasets/ds000001.git"); ds.get([f"sub-{i:02d}" for i in [1,2,3]], jobs=5)'

The same approach works for HCP, many INDI, and other datasets.  Explore more
on http://datasets.datalad.org/ and learn more about datalad at
http://handbook.datalad.org/
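
The one-liner above can also be spelled out as a small script.  This is only
a sketch: subject_paths() and fetch_subjects() are hypothetical helper names
(not part of datalad), and it assumes datalad, git and git-annex are already
installed.  The only datalad calls used are install() and get(), as in the
one-liner.

```python
def subject_paths(ids, width=2):
    """Build BIDS-style subject directory names, e.g. 1 -> 'sub-01'."""
    return [f"sub-{i:0{width}d}" for i in ids]


def fetch_subjects(url, ids, jobs=5):
    """Install a dataset from `url` and fetch only the listed subjects.

    Hypothetical helper: installing clones only the lightweight git
    repository; file content is downloaded by get() for the given paths.
    """
    # Imported here so the path helper above is usable without datalad.
    import datalad.api as dl

    ds = dl.install(source=url)               # clone without file content
    ds.get(subject_paths(ids), jobs=jobs)     # download selected subjects
    return ds
```

To mirror the one-liner, call
fetch_subjects("https://github.com/OpenNeuroDatasets/ds000001.git", [1, 2, 3]).
Any other path within the dataset could be passed to get() the same way.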

> This isn't to deprecate Richard's tool.

Agree!  DataLad uses git and git-annex, which might be a bit heavy as a
dependency for some use cases.  We are working hard though to ensure that
datalad with all its dependencies is easy to install (Windows remains somewhat
of an issue, but is OK for the "downloader" part ;))

BUT 

- see https://github.com/nidata/nidata  - a similar concept exercised in the
  past (back then it was openfmri), which also covers more data sources.
  Unfortunately development stopped.  If the goal is a pure downloader, it
  might be worth somehow joining forces with that prior effort.

- with datalad you get not just a "downloader" but overall content management
  not only for "source data" but for results as well.  See e.g.

  https://github.com/ReproNim/containers/#a-typical-workflow

  for an example of a prototypical computational workflow, where source data
  and results are versioned and "reproducible".

Sorry for the shameless DataLad plug and the grains of salt in my follow-up.  I
don't want to sound negative or unsupportive, but it is also hard to be
unbiased with my DataLad hat on.  And if the project is to be
created/maintained beyond "an exercise", those points might be worth
considering.

Cheers,
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
WWW:   http://www.linkedin.com/in/yarik        


