I’m happy to announce new releases of FaST-LMM and PySnpTools. (These releases have been my “work” since I retired last summer.)

The new releases update both packages to work with the newest versions of Pandas, NumPy, and scikit-learn.

The new FaST-LMM release includes single_snp_scale, which lets FaST-LMM use a cluster to scale to 1 million individuals. See Kadie and Heckerman, bioRxiv 2018, for background. Similar tools would require 100,000 computers to scale this far, but FaST-LMM needs “only” a cluster of 100 computers. (The code can run on any cluster, but before it can run on a particular cluster, we must create a module that details how to automate batch jobs and move files on that cluster.)

The new PySnpTools release adds support for cluster-sized data, including:

FaST-LMM and PySnpTools were originally developed and open-sourced at Microsoft Research. Active development is now based at https://fastlmm.github.io/.

Roadmap:

I plan to continue working on FaST-LMM and PySnpTools. We’d like to run a giant job on real, rather than synthetic, data. We’d also like to compare FaST-LMM with other fast methods that we suspect sacrifice accuracy. I’d like to port both packages from Python 2 to Python 3. (More to-dos: analyze multiple traits in one run, analyze pairs of DNA locations using the single-DNA-location tools, …)

Contacts:

Email the developers at fastlmm-dev@python.org.

Join the user discussion and announcement list (or use the web sign-up).

Yours,

Carl

Carl Kadie, Ph.D.

FaST-LMM Team