Wow, thanks for the quick response. The performance is *much, much* better with the suggested list-join. CPython still beats PyPy, but only by a narrow margin:<div><br></div><div><font face="'courier new', monospace">pypy1.6: 1m33.142s</font></div>
<div><font face="'courier new', monospace">CPython 2.7.1: 1m12.092s</font></div><div><br></div><div>Thanks for the advice. I had forgotten about string immutability and its associated costs. And keep up the good work on PyPy! I look forward to the day I can replace CPython with PyPy in more interesting scientific workflows (end of plug for SciPy integration).</div>
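<div><br></div><div>For the archives, here is a minimal sketch of the two approaches discussed in this thread. The function names and the <font face="'courier new', monospace">pieces</font> test data are made up for illustration (they are not taken from the actual script), and the timings will differ between interpreters and machines.</div>

```python
import timeit


def build_by_concat(pieces):
    """Append each piece to a growing string (the slower approach in general)."""
    out = ''
    for piece in pieces:
        out += piece
    return out


def build_by_join(pieces):
    """Collect the pieces in a list and join them once at the end."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return ''.join(parts)


# Hypothetical stand-in for the sequence lines of a FASTA file.
pieces = ['ACGT' * 15] * 20000

# Both approaches must produce the same string.
assert build_by_concat(pieces) == build_by_join(pieces)

for func in (build_by_concat, build_by_join):
    seconds = timeit.timeit(lambda: func(pieces), number=5)
    print('%s: %.3fs' % (func.__name__, seconds))
```

<div>Running the same file under both interpreters should show whether the gap between the two functions is larger on one of them.</div>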
<div><br></div><div>A bit OT: The recent release of IPython added some powerful multiprocessing features using ZeroMQ. I've only glanced at PyPy's extensive threading optimizations (e.g., greenlets). Does PyPy's JIT work across thread/process boundaries?</div>
<div>--<br>Jake Biesinger<br>Graduate Student<br>Xie Lab, UC Irvine<br>
<br><br><div class="gmail_quote">On Thu, Aug 18, 2011 at 4:01 PM, Justin Peel <span dir="ltr"><<a href="mailto:peelpy@gmail.com">peelpy@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Yes, I just looked at it. For cases like this where there is<br>
effectively only one reference to the string being appended to, it<br>
just resizes the string in-place and copies in the string being<br>
appended which gives it O(N) performance. It is a hack that is<br>
available only because of the reference counting that CPython employs<br>
for memory management.<br>
<br>
For reference, the hack is in Python/ceval.c in the string_concatenate function.<br>
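<br>
A rough sketch of the effect, assuming CPython and a string held in a local variable. The function names and the test data are hypothetical, and the exact behaviour may differ between CPython versions. Keeping a second reference to the string alive should prevent the in-place resize, so each append has to copy the whole string again.<br>

```python
import timeit


def append_in_place(pieces):
    """Only one reference to `out` exists, so CPython may resize it in place."""
    out = ''
    for piece in pieces:
        out += piece
    return out


def append_with_alias(pieces):
    """A second reference (`alias`) is alive at each `+=`, so a copy is needed."""
    out = ''
    for piece in pieces:
        alias = out  # extra reference: the string can no longer be resized in place
        out += piece
    return out


# Hypothetical test data: 5000 short chunks of sequence.
pieces = ['ACGT' * 15] * 5000

# The results are identical; only the amount of copying differs.
assert append_in_place(pieces) == append_with_alias(pieces)

for func in (append_in_place, append_with_alias):
    seconds = timeit.timeit(lambda: func(pieces), number=3)
    print('%s: %.3fs' % (func.__name__, seconds))
```

On an interpreter without reference counting, both functions would be expected to take the slower path.<br>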
<div><div></div><div class="h5"><br>
On Thu, Aug 18, 2011 at 4:50 PM, Aaron DeVore <<a href="mailto:aaron.devore@gmail.com">aaron.devore@gmail.com</a>> wrote:<br>
> Python 2.4 introduced a change that helps improve performance of<br>
> string concatenation, according to its release notes. I don't know<br>
> anything beyond that.<br>
><br>
> -Aaron DeVore<br>
><br>
> On Thu, Aug 18, 2011 at 3:31 PM, Justin Peel <<a href="mailto:peelpy@gmail.com">peelpy@gmail.com</a>> wrote:<br>
>> Yes, Vincent's way is the better way to go. To elaborate more on the<br>
>> problem, repeated string appending is O(N^2) in general, while appending to a list and then<br>
>> joining is an O(N) operation. Why CPython is faster than PyPy when doing<br>
>> it the less efficient way is something that I'm not fully sure about, but<br>
>> I believe that it might have to do with the differing memory<br>
>> management strategies.<br>
>><br>
>> On Thu, Aug 18, 2011 at 4:24 PM, Vincent Legoll<br>
>> <<a href="mailto:vincent.legoll@gmail.com">vincent.legoll@gmail.com</a>> wrote:<br>
>>> Hello,<br>
>>><br>
>>> Try this:<br>
>>><br>
>>> import sys<br>
>>><br>
>>> fasta_file = sys.argv[1] # should be *.fa<br>
>>> print 'loading dna from', fasta_file<br>
>>> chroms = {}<br>
>>> dna = []<br>
>>> for l in open(fasta_file):<br>
>>>     if l.startswith('>'): # new chromosome<br>
>>>         if len(dna) > 0:<br>
>>>             chroms[chrom] = ''.join(dna)<br>
>>>         chrom = l.strip().replace('>', '')<br>
>>>         dna = []<br>
>>>     else:<br>
>>>         dna.append(l.rstrip())<br>
>>> if len(dna) > 0:<br>
>>>     chroms[chrom] = ''.join(dna)<br>
>>><br>
>>> --<br>
>>> Vincent Legoll<br>
>>> _______________________________________________<br>
>>> pypy-dev mailing list<br>
>>> <a href="mailto:pypy-dev@python.org">pypy-dev@python.org</a><br>
>>> <a href="http://mail.python.org/mailman/listinfo/pypy-dev" target="_blank">http://mail.python.org/mailman/listinfo/pypy-dev</a><br>
>>><br>
>><br>
><br>
</div></div></blockquote></div><br></div>