[scikit-learn] Hiring: Cloud and HPC Engineer at NIMH in Bethesda, MD

Thomas, Adam (NIH/NIMH) [E] adamt at nih.gov
Wed Aug 17 14:53:29 EDT 2016


The National Institute of Mental Health (NIMH) is the lead federal
agency for research on mental disorders. NIMH is one of the 27
Institutes and Centers that make up the National Institutes of Health
(NIH), the primary federal agency responsible for biomedical research in
the US. NIH is part of the U.S. Department of Health and Human Services
(HHS). The NIH is a highly rated employer on glassdoor.com and offers
very competitive salary and benefits packages.

The Data Science and Sharing Team (DSST) is a new group created to
develop and support data sharing and other data-intensive scientific
projects within the NIMH Intramural Research Program (IRP). Working
closely with the Office of Data Science, the DSST aims to make the NIMH
IRP a leader in the open science and data sharing practices mandated by
the Open Data Policy released by the White House on 9 May 2013. We are
building a team to make that happen.

What you’ll do…


You will work with a team of researchers and developers to build and
deploy neuroimaging data processing pipelines for investigators within
the NIMH IRP. You will collaborate with and contribute to other projects
throughout the world that are building standards and tools for open and
reproducible neuroscience (e.g., NiPy, BIDS, Binder, RStudio). You'll
have the resources of the NIH HPC Cluster at your disposal, as well as
additional capacity in the AWS cloud. All tools and code will be open
source and freely distributed.


You will work to bolster data science skills within the NIMH IRP by
teaching courses to scientists on best data practices (e.g., Software
Carpentry and Data Carpentry), as well as on accessing and using specific
neuroimaging repositories (e.g., the Human Connectome Project, OpenfMRI,
UK Biobank).


There is no point in building tools for open science if no one uses them.
Part of the job of the DSST is to measure data sharing and open science
practices within the NIMH IRP and progress toward their adoption. This
will include bibliometrics for scientific publications from the NIMH IRP
and other measures of data sharing and secondary data utilization. You
will provide crucial systems-level support to the team in gauging this
progress.

Who you are…


You should be very comfortable on the command line and have a rock-solid
handle on one or more Unix-based operating systems. You should have some
experience with distributed, high-performance computing tools such as
Spark, OpenStack, Docker/Singularity, and batch processing systems such
as SLURM and SGE. You should also have experience coding in modern
languages currently used in data-intensive scientific computing, such as
Python, R, and JavaScript, as well as interfacing with a variety of


Ideally we would like to see a recent degree (BS, MS, or PhD) in a STEM
field, but if you can prove you have an equivalent amount of expertise
with your publications, projects, or GitHub/Kaggle ranking, we’re all
ears. We are also interviewing students and part-time candidates if
you’re still working on your degree.


Data science is moving fast – we’re looking for someone who can move
faster. You should be a self-learner and a self-starter. Please provide
some examples of projects you have worked on independently.

How to apply…

Email your resume, a cover letter, and a code sample that demonstrates
that you fit all three of the above to:

DATASCI-JOBSEARCH at mail.nih.gov

The National Institutes of Health is an equal opportunity employer.
