[pypy-dev] pypy1.6 slow on string-heavy ops

Jacob Biesinger jake.biesinger at gmail.com
Fri Aug 19 02:10:29 CEST 2011


Wow, thanks for the quick response.  The performance is *much, much* better
with the suggested list-join.  CPython still beats PyPy, but only by a
narrow margin:

pypy1.6:             1m33.142s
CPython 2.7.1:       1m12.092s

Thanks for the advice; I had forgotten about string immutability and its
associated costs.  And keep up the good work on PyPy!  I look forward to the
day I can replace CPython with PyPy in more interesting scientific workflows.
</end plug for scipy integration>
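
For anyone reading this in the archive, here is a minimal sketch of why the
list-join helps.  It is not the script from this thread: the function names
and the 60-character chunk size are made up for illustration, and the cost
functions assume the worst case in which every += copies the whole string
(CPython can sometimes avoid that copy).

```python
def concat_copy_cost(n_chunks, chunk_len):
    """Characters copied if every append rebuilds the whole string."""
    # The k-th append produces a string of k * chunk_len characters,
    # all of which are copied into the new object.
    return sum(k * chunk_len for k in range(1, n_chunks + 1))


def join_copy_cost(n_chunks, chunk_len):
    """Characters copied when all chunks are joined in a single pass."""
    return n_chunks * chunk_len


def build_by_concat(chunks):
    # Repeated +=: may allocate a new string on every iteration.
    result = ''
    for chunk in chunks:
        result += chunk
    return result


def build_by_join(chunks):
    # Collect the pieces in a list and join once at the end.
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return ''.join(parts)


if __name__ == '__main__':
    chunks = ['ACGT' * 15] * 1000  # 1000 made-up lines of 60 bases each
    # Both approaches build the same string; only the copying differs.
    assert build_by_concat(chunks) == build_by_join(chunks)
    print(concat_copy_cost(1000, 60))
    print(join_copy_cost(1000, 60))
```

Under that worst-case assumption the copy count grows with the square of the
number of lines for +=, but only linearly for the join.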

A bit OT:  The recent release of IPython added some powerful multiprocessing
features using ZeroMQ.  I've only glanced at PyPy's extensive threading
optimizations (e.g., greenlets).  Does PyPy's JIT work across thread/process
boundaries?
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine


On Thu, Aug 18, 2011 at 4:01 PM, Justin Peel <peelpy at gmail.com> wrote:

> Yes, I just looked at it. For cases like this, where there is
> effectively only one reference to the string being appended to, CPython
> just resizes the string in place and copies in the string being
> appended, which gives it O(N) performance. It is a hack that is
> available only because of the reference counting that CPython employs
> for memory management.
>
> For reference, the hack is in Python/ceval.c in the string_concatenate
> function.
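
A rough way to observe this, assuming the CPython build in use still applies
the same trick to its string type: compare appending when the loop variable
is the only reference with appending while a second name also points at the
string.  The function names are hypothetical and no particular timings are
claimed; on an implementation without reference counting the two may well
perform alike.

```python
import timeit


def append_sole_reference(chunks):
    # Only the local name `s` refers to the string, so CPython is free
    # to try resizing it in place.
    s = ''
    for chunk in chunks:
        s += chunk
    return s


def append_with_alias(chunks):
    # `alias` holds a second reference during each +=, so the old string
    # cannot be resized in place and a fresh one has to be built.
    s = ''
    for chunk in chunks:
        alias = s
        s += chunk
    return s


if __name__ == '__main__':
    chunks = ['ACGT' * 15] * 5000  # made-up input: 5000 lines of 60 bases
    for fn in (append_sole_reference, append_with_alias):
        elapsed = timeit.timeit(lambda: fn(chunks), number=1)
        print(fn.__name__, elapsed)
```

Both functions return the same string; only the amount of copying differs.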
>
> On Thu, Aug 18, 2011 at 4:50 PM, Aaron DeVore <aaron.devore at gmail.com>
> wrote:
> > Python 2.4 introduced a change that helps improve performance of
> > string concatenation, according to its release notes. I don't know
> > anything beyond that.
> >
> > -Aaron DeVore
> >
> > On Thu, Aug 18, 2011 at 3:31 PM, Justin Peel <peelpy at gmail.com> wrote:
> >> Yes, Vincent's way is the better way to go. To elaborate more on the
> >> problem, repeated string appending is O(N^2) overall, while appending
> >> to a list and then joining is an O(N) operation. Why CPython is faster
> >> than PyPy at doing it the less efficient way is something that I'm not
> >> fully sure about, but I believe that it might have to do with the
> >> differing memory management strategies.
> >>
> >> On Thu, Aug 18, 2011 at 4:24 PM, Vincent Legoll
> >> <vincent.legoll at gmail.com> wrote:
> >>> Hello,
> >>>
> >>> Try this:
> >>>
> >>> import sys
> >>>
> >>> fasta_file = sys.argv[1]  # should be *.fa
> >>> print 'loading dna from', fasta_file
> >>> chroms = {}  # chromosome name -> full sequence
> >>> dna = []  # sequence lines collected for the current chromosome
> >>> for l in open(fasta_file):
> >>>     if l.startswith('>'):  # new chromosome
> >>>         if len(dna) > 0:
> >>>             # join once per chromosome instead of appending per line
> >>>             chroms[chrom] = ''.join(dna)
> >>>         chrom = l.strip().replace('>', '')
> >>>         dna = []
> >>>     else:
> >>>         dna.append(l.rstrip())
> >>> if len(dna) > 0:  # store the last chromosome in the file
> >>>     chroms[chrom] = ''.join(dna)
> >>>
> >>> --
> >>> Vincent Legoll
> >>> _______________________________________________
> >>> pypy-dev mailing list
> >>> pypy-dev at python.org
> >>> http://mail.python.org/mailman/listinfo/pypy-dev