I’m happy to announce new releases of FaST-LMM and PySnpTools. (These releases have been my “work” since I retired last summer.)
The new releases update both packages to work with the newest versions of Pandas, NumPy, and scikit-learn.
The new FaST-LMM release includes single_snp_scale, which allows FaST-LMM to use a cluster and scale to 1 million individuals. See Kadie and Heckerman, bioRxiv 2018 for background. Similar tools would require 100,000 computers to reach this scale, but FaST-LMM needs “only” a cluster of 100 computers. (The code can run on any cluster, but to run on a particular cluster we must create a module detailing how to automate batch jobs and move files.)
The new PySnpTools release adds support for cluster-sized data, including:
FaST-LMM and PySnpTools were originally developed and open sourced at Microsoft Research. Active development is now based at https://fastlmm.github.io/.
Roadmap:
I plan to continue working on FaST-LMM and PySnpTools. We’d like to run a giant job on real, rather than synthetic, data. We’d like to compare it to other fast methods that we suspect sacrifice accuracy. I’d like to port it from Python 2 to Python 3. (More to-dos: analyze multiple traits in one run, analyze pairs of DNA locations using the single-DNA-location tools, …)
Contacts:
Email the developers at fastlmm-dev@python.org.
Join the user discussion and announcement list (or use web sign up).
Yours,
Carl
Carl Kadie, Ph.D.
FaST-LMM Team