I’m happy to announce new releases of FaST-LMM<https://pypi.org/project/fastlmm/> and PySnpTools<https://pypi.org/project/pysnptools/>. (These releases have been my “work” since I retired last summer.)
The new releases update both packages to work with the newest versions of Pandas, NumPy, and scikit-learn.
The new FaST-LMM release includes single_snp_scale, which allows FaST-LMM to use a cluster and scale to 1 million individuals. See Kadie and Heckerman, bioRxiv 2018<https://www.biorxiv.org/content/10.1101/154682v2> for background. Similar tools would require 100,000 computers to scale this much, but FaST-LMM needs “only” a cluster of 100 computers. (The code can run on any cluster, but to run on a particular cluster we must create a module detailing how to automate batch jobs and move files.)
The new PySnpTools release adds support for cluster-sized data, including:
* snpreader.SnpGen<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.SnpGen>: Generate synthetic SNP data on the fly.
* snpreader.SnpMemMap<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.SnpMemMap>: Support larger in-memory data via on-disk memory mapping.
* snpreader.DistributedBed<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.DistributedBed>: Split Bed<https://fastlmm.github.io/PySnpTools/#pysnptools.snpreader.Bed>-like data into multiple files for more efficient cluster use.
* util.mapreduce1<https://fastlmm.github.io/PySnpTools/#module-pysnptools.util.mapreduce1>: Run loops in parallel on multiple processes, threads, or clusters.
* util.filecache<https://fastlmm.github.io/PySnpTools/#module-pysnptools.util.filecache>: Automatically copy files to and from any remote storage.
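As a rough illustration of the "run loops in parallel" idea behind util.mapreduce1, here is a minimal sketch using only the Python standard library. It does not use the PySnpTools API: the function name map_reduce_sketch and its parameters are hypothetical, and it only covers threads on one machine, not processes or clusters.

```python
from concurrent.futures import ThreadPoolExecutor


def map_reduce_sketch(input_seq, mapper, reducer=list, max_workers=4):
    """Apply mapper to each input on a thread pool, then reduce the results.

    Hypothetical helper for illustration only; it is not part of PySnpTools.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        results = pool.map(mapper, input_seq)
        return reducer(results)


# Sum of squares of 0..9, with each square computed as a separate task.
total = map_reduce_sketch(range(10), mapper=lambda x: x * x, reducer=sum)
print(total)  # 285
```

The loop body becomes the mapper and the step that combines the per-item results becomes the reducer, which is what makes the same loop easy to hand to a different backend.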
FaST-LMM and PySnpTools were originally developed and open sourced at Microsoft Research. Active development is now based at https://fastlmm.github.io/.
Roadmap:
I plan to continue working on FaST-LMM and PySnpTools. We’d like to run a giant job on real, rather than synthetic, data. We’d like to compare it to other fast methods that we suspect sacrifice accuracy. I’d like to port it from Python 2 to Python 3. (More to-dos: analyze multiple traits in one run, analyze pairs of DNA locations using the single-DNA-location tools, …)
Contacts:
Email the developers at fastlmm-dev(a)python.org<mailto:fastlmm-dev@python.org>.
Join<mailto:fastlmm-user-join@python.org?subject=Subscribe> the user discussion and announcement list (or use web sign up<https://mail.python.org/mailman3/lists/fastlmm-user.python.org>).
Yours,
Carl
Carl Kadie, Ph.D.
FaST-LMM Team
Hi again,
I have two questions about using the similarity matrix in single_snp:
1) Which format do I have to provide if I use the npz format? My npz matrix
throws the error below.
2) If I don't provide K0, how is the similarity matrix calculated and is it
possible to store the matrix for other runs?
Thanks a lot
Stefanie
Error:
Traceback (most recent call last):
File "C:/Users//PycharmProjects/BCC_Experiments/lmm/lmm01.py", line 27,
in <module>
results_df = single_snp(bed_fn, pheno_fn, K0=k)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\fastlmm\association\single_snp.py",
line 246, in single_snp
runner = runner)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\util\mapreduce1\mapreduce.py",
line 202, in map_reduce
result = runner.run(dist)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\util\mapreduce1\runner\local.py",
line 48, in run
result = _run_all_in_memory(distributable)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\util\mapreduce1\runner\__init__.py",
line 30, in _run_all_in_memory
return work.reduce(result_sequence)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\util\mapreduce1\mapreduce.py",
line 77, in reduce
return self.reducer(output_seq)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\fastlmm\association\single_snp.py",
line 228, in reducer_closure
frame = pd.concat(frame_sequence)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pandas\core\reshape\concat.py",
line 295, in concat
sort=sort,
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pandas\core\reshape\concat.py",
line 339, in __init__
objs = list(objs)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\util\mapreduce1\runner\__init__.py",
line 14, in work_sequence_to_result_sequence
result = work()
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\util\mapreduce1\mapreduce.py",
line 65, in <lambda>
yield lambda i=i, input_arg=input_arg: self.dowork(i, input_arg) # the
'i=i',etc is need to get around a strangeness in Python
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\util\mapreduce1\mapreduce.py",
line 92, in dowork
result = _run_all_in_memory(work)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\util\mapreduce1\runner\__init__.py",
line 25, in _run_all_in_memory
return work()
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\util\mapreduce1\mapreduce.py",
line 91, in <lambda>
work = lambda : self.mapper(input_arg)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\fastlmm\association\single_snp.py",
line 211, in nested_closure
K0_chrom = _K_per_chrom(K0 or G0 or test_snps, chrom, test_snps.iid)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\fastlmm\association\single_snp.py",
line 301, in _K_per_chrom
return SnpKernel(K_all.snpreader[:,K_all.pos[:,0] !=
chrom],K_all.standardizer)
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\kernelreader\snpkernel.py",
line 150, in pos
return self.snpreader.pos
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\snpreader\snpreader.py",
line 404, in pos
return self.col_property
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\pstreader\pstnpz.py",
line 67, in col_property
self._run_once()
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\pysnptools\pstreader\pstnpz.py",
line 82, in _run_once
self._row = data['row']
File
"C:\Users\\AppData\Local\Continuum\anaconda3\envs\gwas_flow\lib\site-packages\numpy\lib\npyio.py",
line 259, in __getitem__
raise KeyError("%s is not a file in the archive" % key)
KeyError: 'row is not a file in the archive'
Process finished with exit code 1
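
The KeyError above means the reader in pstnpz.py asked the .npz archive for an array named 'row' and did not find one. An .npz file is a zip archive of .npy members, so one way to check what a file actually contains is to list its member names. The sketch below uses only the standard library; npz_keys and missing_keys are hypothetical helper names, and 'row' is the only required name confirmed by the traceback (any other names the reader needs are not shown here).

```python
import zipfile


def npz_keys(path):
    """Return the array names stored in an .npz file, sorted.

    Each array is stored as a zip member called '<name>.npy'.
    """
    with zipfile.ZipFile(path) as archive:
        return sorted(
            name[: -len(".npy")] if name.endswith(".npy") else name
            for name in archive.namelist()
        )


def missing_keys(path, required=("row",)):
    """Return the required names that are absent from the archive.

    Only 'row' is known from the traceback; pass a longer tuple if the
    reader's documentation lists more names.
    """
    present = set(npz_keys(path))
    return [key for key in required if key not in present]
```

For example, an archive written with positional arguments to numpy.savez stores its arrays under default names such as arr_0, so missing_keys would report ['row'] for it.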