[pypy-dev] pypy1.6 slow on string-heavy ops
Jacob Biesinger
jake.biesinger at gmail.com
Thu Aug 18 23:35:12 CEST 2011
Hi all,
New to the list and fairly new to pypy. First of all, congrats on the new
1.6 release-- the growing support for numpy is very exciting (go, fight,
win, take state!).
So I snagged the 1.6 release to test if it would be faster on the kind of
code I often write: Bioinformatics. In this snippet, the point is to check
the mappability of the genome-- if a particular substring appears more than
once in the genome, the region is called unmappable.
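To make the definition concrete, here is a toy sketch. The helper name `window_counts` and the example string are made up for illustration and are not part of the script below. It counts every window of a given size and reports the ones seen more than once.

```python
def window_counts(dna, windowsize):
    """Count each window of length `windowsize` in `dna`, skipping windows that contain 'N'."""
    counts = {}
    # + 1 so that the final window of the sequence is included
    for i in range(len(dna) - windowsize + 1):
        window = dna[i:i + windowsize]
        if 'N' not in window:
            counts[window] = counts.get(window, 0) + 1
    return counts

# Hypothetical toy sequence: 'ACG' occurs twice, so it would be unmappable.
counts = window_counts('ACGTACGA', 3)
unmappable = sorted(w for w, n in counts.items() if n > 1)
```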
Machine Specs:
64bit Ubuntu 11.04
119048 CPython 2.7.1 pystones/second
416667 pypy1.6 pystones/second
The CPython version of the following code takes a bit more than a minute to
run on the 21st chromosome of the human reference genome, but the pypy
version has been going for 27+ minutes and hasn't yet finished the first
step of loading the genome as a dict of strings.
Am I using some construct that's particularly difficult for pypy? Am I
missing something?
hg18 chrom 21 is available at
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/chr21.fa.gz
import sys
def slide_dna(dna, windowsize):
    # + 1 so that the final window of the sequence is included
    for i in xrange(len(dna) - windowsize + 1):
        slice = dna[i:i+windowsize]
        if 'N' not in slice:
            yield slice
fasta_file = sys.argv[1] # should be *.fa
print 'loading dna from', fasta_file
chroms = {}
dna = None
for l in open(fasta_file):
    if l.startswith('>'): # new chromosome
        if dna is not None:
            chroms[chrom] = dna
        chrom = l.strip().replace('>', '')
        dna = ''
    else:
        dna += l.rstrip()
if dna is not None:
    chroms[chrom] = dna
for length in [15]:#, 25, 35, 45, 55, 65, 75]:
    print 'now on', length
    mappable = 0
    repeat = 0
    s = {}
    for dna in chroms.itervalues():
        for slice in slide_dna(dna, length):
            try:
                s[slice] += 1
            except KeyError:
                s[slice] = 1
    print 'built, now counting'
    for dna in chroms.itervalues():
        for slice in slide_dna(dna, length):
            if s[slice] == 1:
                mappable += 1
            else:
                repeat += 1
    print 'for substring length %s, mappable: %s, repeat: %s' % (length,
        mappable, repeat)
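One variation that may be worth trying, as a sketch only: the loading step builds each chromosome with repeated `dna += l.rstrip()`. CPython can often do that kind of in-place concatenation cheaply, but an interpreter without that optimization has to copy the whole string on every line, which is quadratic in the chromosome length. Whether that is what is slow here is an assumption, not something measured. The function name `load_fasta` is hypothetical; it collects the lines in a list and joins them once per chromosome.

```python
def load_fasta(lines):
    """Return {chromosome name: sequence} from an iterable of FASTA lines."""
    chroms = {}
    chrom = None
    pieces = []  # sequence lines of the current chromosome
    for l in lines:
        if l.startswith('>'):  # new chromosome
            if chrom is not None:
                chroms[chrom] = ''.join(pieces)
            chrom = l.strip().replace('>', '')
            pieces = []
        else:
            pieces.append(l.rstrip())
    if chrom is not None:
        chroms[chrom] = ''.join(pieces)
    return chroms
```

It would be called as `chroms = load_fasta(open(fasta_file))` in place of the loading loop. Unlike the loop above, sequence lines that appear before the first `>` header are silently dropped rather than raising an error.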
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine