fastlmm-user


fastlmm-user@python.org


New releases of FaST-LMM and PySnpTools
by Carl KADIE July 3, 2021

I’m happy to announce new releases of FaST-LMM<https://pypi.org/project/fastlmm/> and PySnpTools<https://pypi.org/project/pysnptools/>. (These releases have been my “work” since I retired last summer.) The new releases update both packages to work with the newest versions of Pandas, NumPy, and scikit-learn.

The new FaST-LMM release includes single_snp_scale, which lets FaST-LMM use a cluster to scale to 1 million individuals. See Kadie and Heckerman, bioRxiv 2018<https://www.biorxiv.org/content/10.1101/154682v2> for background. Similar tools would require 100,000 computers to scale this far, but FaST-LMM needs “only” a cluster of 100 computers. (The code can run on any cluster, but to run on a particular cluster we must first create a module that specifies how to automate batch jobs and move files.)

The new PySnpTools release adds support for cluster-sized data, including:

* snpreader.SnpGen<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.SnpGen>: Generate synthetic SNP data on the fly.
* snpreader.SnpMemMap<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.SnpMemMap>: Support larger in-memory data via on-disk memory mapping.
* snpreader.DistributedBed<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.DistributedBed>: Split Bed<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.Bed>-like data into multiple files for more efficient cluster use.
* util.mapreduce1<https://fastlmm.github.io/PySnpTools/#module-pysnptools.util.mapreduce1>: Run loops in parallel on multiple processes, threads, or clusters.
* util.filecache<https://fastlmm.github.io/PySnpTools/#module-pysnptools.util.filecache>: Automatically copy files to and from any remote storage.

FaST-LMM and PySnpTools were originally developed and open sourced at Microsoft Research. Active development is now based at https://fastlmm.github.io/.

Roadmap: I plan to continue working on FaST-LMM and PySnpTools. We’d like to run a giant job on real, rather than synthetic, data.
We’d like to compare it to other fast methods that we suspect sacrifice accuracy. I’d like to port it from Python 2 to Python 3. (More to-dos: analyze multiple traits in one run, analyze pairs of DNA locations using the single-DNA-location tools, …)

Contacts: Email the developers at fastlmm-dev@python.org<mailto:fastlmm-dev@python.org>. Join<mailto:fastlmm-user-join@python.org?subject=Subscribe> the user discussion and announcement list (or use web sign up<https://mail.python.org/mailman3/lists/fastlmm-user.python.org>).

Yours,
Carl

Carl Kadie, Ph.D.
FaST-LMM Team