I’m happy to announce new releases of FaST-LMM<https://pypi.org/project/fastlmm/> and PySnpTools<https://pypi.org/project/pysnptools/>. (These releases have been my “work” since I retired last summer.)
These releases update both packages to work with the newest versions of Pandas, NumPy, and scikit-learn.
The new FaST-LMM release includes single_snp_scale, which lets FaST-LMM use a cluster to scale to 1 million individuals. See Kadie and Heckerman, bioRxiv 2018<https://www.biorxiv.org/content/10.1101/154682v2> for background. Similar tools would require 100,000 computers to scale this far, but FaST-LMM needs “only” a cluster of 100 computers. (The code can run on any cluster, but to run on a particular cluster we must first create a module that specifies how to automate batch jobs and move files on it.)
The new PySnpTools release adds support for cluster-sized data, including:
* snpreader.SnpGen<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.SnpGen>: Generate synthetic SNP data on the fly.
* snpreader.SnpMemMap<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.SnpMemMap>: Support larger in-memory data via on-disk memory mapping.
* snpreader.DistributedBed<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.DistributedBed>: Split Bed<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.Bed>-like data into multiple files for more efficient cluster use.
* util.mapreduce1<https://fastlmm.github.io/PySnpTools/#module-pysnptools.util.mapreduce1>: Run loops in parallel on multiple processes, threads, or clusters.
* util.filecache<https://fastlmm.github.io/PySnpTools/#module-pysnptools.util.filecache>: Automatically copy files to and from any remote storage.
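The idea behind on-disk memory mapping (the approach SnpMemMap is described as using) can be illustrated with plain NumPy. This is a minimal sketch, assuming a float32 individuals-by-SNPs matrix with hypothetical sizes; it calls `numpy.memmap` directly and does not reproduce SnpMemMap's own file format or API.

```python
import os
import tempfile

import numpy as np

# Hypothetical sizes for illustration: individuals (rows) x SNPs (columns).
iid_count, sid_count = 1_000, 2_000
path = os.path.join(tempfile.mkdtemp(), "genotypes.dat")

# Fortran (column-major) order keeps each SNP's values contiguous on disk,
# so reading a few SNPs touches only a small part of the file.
mm = np.memmap(path, dtype=np.float32, mode="w+",
               shape=(iid_count, sid_count), order="F")

# Fill the matrix one block of SNPs at a time so it never has to sit in RAM at once.
rng = np.random.default_rng(0)
block = 500
for start in range(0, sid_count, block):
    stop = min(start + block, sid_count)
    # Synthetic allele counts 0, 1, or 2.
    mm[:, start:stop] = rng.integers(0, 3, size=(iid_count, stop - start))
mm.flush()
del mm  # close the writable map

# Reopen read-only and copy just the SNPs we need into memory.
ro = np.memmap(path, dtype=np.float32, mode="r",
               shape=(iid_count, sid_count), order="F")
subset = np.array(ro[:, 10:20])
print(subset.shape)  # (1000, 10)
```

Only the requested columns are copied into an ordinary in-memory array; the rest of the matrix stays on disk until it is accessed.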
FaST-LMM and PySnpTools were originally developed and open-sourced at Microsoft Research. Active development is now based at https://fastlmm.github.io/.
Roadmap:
I plan to continue working on FaST-LMM and PySnpTools. We’d like to run a giant job on real, rather than synthetic, data. We’d like to compare it to other fast methods that we suspect sacrifice accuracy. I’d like to port it from Python 2 to Python 3. (More to-dos: analyze multiple traits in one run, analyze pairs of DNA locations using the single-DNA-location tools, …)
Contacts:
Email the developers at fastlmm-dev(a)python.org<mailto:fastlmm-dev@python.org>.
Join<mailto:fastlmm-user-join@python.org?subject=Subscribe> the user discussion and announcement list (or use web sign up<https://mail.python.org/mailman3/lists/fastlmm-user.python.org>).
Yours,
Carl
Carl Kadie, Ph.D.
FaST-LMM Team
Greetings,
We've just released a new version of PySnpTools<https://pypi.org/project/pysnptools/> with support for (unphased, diploid, biallelic) BGEN files<https://www.well.ox.ac.uk/~gav/bgen_format/> and, more generally, distributions over allele counts. It also computes expected values on the fly, which lets the current version of FaST-LMM work directly with BGEN data. Here is a sample:
[Screenshot: sample of PySnpTools reading BGEN data]
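For readers new to BGEN probabilities: each (individual, SNP) entry is a distribution over how many copies of an allele the individual carries (0, 1, or 2), and its expectation collapses it to a single dosage value. A minimal NumPy sketch of that arithmetic, assuming the three probabilities are stored in the order P(0), P(1), P(2) along the last axis and using hypothetical data; it is not PySnpTools' implementation or API.

```python
import numpy as np

# Hypothetical data: 2 individuals x 3 SNPs, each entry a distribution over
# 0, 1, or 2 copies of the counted allele (assumed order: P(0), P(1), P(2)).
probs = np.array(
    [
        [[1.0, 0.0, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],
        [[0.25, 0.5, 0.25], [0.0, 0.0, 1.0], [0.9, 0.1, 0.0]],
    ]
)

# Each distribution must sum to 1.
assert np.allclose(probs.sum(axis=-1), 1.0)

# Expected allele count (dosage) = 0*P(0) + 1*P(1) + 2*P(2).
dosage = probs @ np.array([0.0, 1.0, 2.0])

print(dosage.shape)  # (2, 3)
# dosage is approximately [[0, 1, 1.8], [1, 2, 0.1]]
```

Which allele is "counted" is a convention of the file and reader; flipping it maps each dosage d to 2 - d.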
The PySnpTools BGEN reader is built on Danilo Horta's Bgen-reader-py package.
On top of Danilo's tools, it adds substantial speed-ups (for example, 40 times faster). Currently, on my machine, with 1,000 individuals, the PySnpTools BGEN reader can read about 4,000 SNPs per second, that is, about 4 million 3-value distributions per second. With 250,000 individuals, it still reads about 4 million 3-value distributions per second. (Danilo and I are working together to put these speed-ups in the next version of Bgen-reader-py, too.)
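The two throughput figures are linked by simple arithmetic: each SNP yields one 3-value distribution per individual, so at a constant rate of distributions per second, the number of SNPs read per second falls as the number of individuals grows. A back-of-the-envelope check using only the approximate rate quoted above:

```python
# Approximate rate quoted above, from one machine.
DISTRIBUTIONS_PER_SECOND = 4_000_000


def snps_per_second(iid_count: int) -> float:
    """Each SNP contributes one 3-value distribution per individual."""
    return DISTRIBUTIONS_PER_SECOND / iid_count


print(snps_per_second(1_000))    # 4000.0
print(snps_per_second(250_000))  # 16.0
```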
PySnpTools starts reading with almost no start-up time and reads only the subset of data requested. This Jupyter notebook shows its use on BGEN data<https://nbviewer.jupyter.org/github/fastlmm/PySnpTools/blob/master/doc/ipyn…> and the use of other features, such as memory-mapped files for processing distribution data with very little memory.
This release of PySnpTools also updates all examples to automatically download the example files they need.
Carl
Carl Kadie, Ph.D.
FaST-LMM & PySnpTools Team
(Microsoft Research, retired)
Join the FaST-LMM user discussion and announcement list via email<mailto:fastlmm-user-join@python.org?subject=Subscribe> (or use web sign up<https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.pyth…>)