I’m happy to announce a new releases of FaST-LMM<https://pypi.org/project/fastlmm/> and PySnpTools<https://pypi.org/project/pysnptools/>. (This release been my “work” since I retired last summer.)
The new releases updates both packages to work with the newest version of Pandas, Numpy, and Scikit-learn.
The new FaST-LMM release includes single_snp_scale, which allows FaST-LMM to use a cluster and scale to 1 million individuals. See Kadie and Heckerman, bioRxiv 2018<https://www.biorxiv.org/content/10.1101/154682v2> for background. Similar tools would require 100,000 computers to scale this much, but FaST-LMM needs “only” a cluster of 100 computers. (The code can run on any cluster but to run on a particular cluster we must create a module detailing how to automate batch jobs and move files.)
The new PySnpTools release adds support for cluster-sized data. Including:
* snpreader.SnpGen<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.SnpGen>: Generate synthetic SNP data on the fly.
* snpreader.SnpMemMap<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.SnpMemMap>: Support larger in-memory data via on-disk memory mapping.
* snpreader.DistributedBed<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.DistributedBed>: Split Bed<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.Bed>-like data into multiple files for more efficient cluster use
* util.mapreduce1<https://fastlmm.github.io/PySnpTools/#module-pysnptools.util.mapreduce1>: Run loops in parallel on multiple processes, threads, or clusters
* util.filecache<https://fastlmm.github.io/PySnpTools/#module-pysnptools.util.filecache>: Automatically copy files to and from any remote storage.
FaST-LMM and PySnpTools were originally developed and open sourced at Microsoft Research. Active development has now based at https://fastlmm.github.io/.
Roadmap:
I plan to continue working on FaST-LMM and PySnpTools. We’d like to run a giant job on real, rather than synthetic, data. We like to compare it other fast methods that we suspect sacrifice accuracy. I’d like to port it from Python 2 to Python 3. (More todo’s: analyze multiple traits in one run, analyze pairs of DNA locations using the single-DNA-location tools, …)
Contacts:
Email the developers at fastlmm-dev(a)python.org<mailto:fastlmm-dev@python.org>.
Join<mailto:fastlmm-user-join@python.org?subject=Subscribe> the user discussion and announcement list (or use web sign up<https://mail.python.org/mailman3/lists/fastlmm-user.python.org>).
Yours,
Carl
Carl Kadie, Ph.D.
FaST-LMM Team