[pypy-dev] Program slower on Pypy 7.3.3 (3.7.9) than CPython 3.9.

Dan Stromberg strombrg at gmail.com
Wed Mar 17 13:56:55 EDT 2021


On Tue, Mar 16, 2021 at 2:27 AM Carl Friedrich Bolz-Tereick <cfbolz at gmx.de>
wrote:

> On 3/15/21 11:16 PM, Dan Stromberg wrote:
> >
> > And it's open source, though many of the inputs are licensed.
> >
> > The code is at https://stromberg.dnsalias.org/~strombrg/music-pipeline/
> > (https://stromberg.dnsalias.org/svn/music-pipeline/trunk/)
> >
> > It appears to be more than 10x slower.
> >
> > I haven't profiled it yet.  I believe it's probably the "Blocklisting
> > files..." part that's slow.  That part is O(n*m), and seems to take
> > forever.  It's heavy on regular expressions.
> >
> > Are regular expressions expected to be slow on Pypy3?
>
> Hi Dan,
>
> Interesting problem! Single regular expressions are reasonably fast on
> PyPy, being jitted. But I don't think we looked into the problem of
> "what if you have thousands of them" before. Your reproducer is hitting
> a known, hard-to-fix corner case of the JIT: it actually produces a
> linear search over the existing regular expressions for every match
> call, with catastrophic consequences. It's on my mid-term plans to work
> on this problem, but not next week.
>

Here's another SSCCE that surprised me a little.  I create and del the
compiled regexes one at a time, but it's still slow:
https://stromberg.dnsalias.org/svn/regex-fodder/trunk/regex-fodder-3
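
For readers who cannot fetch the linked file, the general shape of such a
test might look roughly like the sketch below. This is not the contents of
regex-fodder-3; the function name, patterns, and filenames are hypothetical
and only illustrate compiling, using, and deleting one regex at a time.

```python
import re


def count_matches_one_at_a_time(patterns, filenames):
    """Compile, use, and delete each regex in turn; return one count per pattern."""
    counts = []
    for pattern in patterns:
        compiled = re.compile(pattern)
        # Still O(n*m): every pattern is tried against every filename.
        counts.append(sum(1 for name in filenames if compiled.match(name)))
        # Drop our own reference before compiling the next pattern.
        del compiled
    return counts
```

Only one compiled pattern is referenced by this code at any moment, yet the
total number of match calls is unchanged.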


> Here's a fun workaround, that improves the performance of both CPython
> (by about 2x for me) and pypy (by 10x or so): turn the many regular
> expressions into a single one:
>
>      regex_strings = [f"(?:{one_regex()})" for repno in range(2_046)]
>      regex_compiled = re.compile("|".join(regex_strings))
>
> then you replace the match calls with a single one:
>
>      for filename in filenames:
>          if regex_compiled.match(filename):
>              matches += 1
>
> I believe you can try the same approach for your full program?
>

I'm familiar with the technique, as well as that of creating a single, big
trie regex.  For this application, though, I need to check at the end whether
each regex was matched exactly once, to catch typos that would otherwise
cause things to get missed.  Thanks much for the suggestion and more!
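
One possible way to combine the two ideas, as a sketch only: wrap each
pattern in a named group and read `lastgroup` from the match to see which
alternative matched. The helper names and the `p<index>` group naming are
assumptions made for illustration. It also assumes the individual patterns
contain no named groups or numbered backreferences of their own. Note that
an alternation reports only the first alternative that matches, so a
filename that matches several patterns is counted for just one of them;
this is therefore not an exact replacement for independent per-regex counts.

```python
import re
from collections import Counter


def count_matches_per_pattern(patterns, filenames):
    """Return {pattern index: number of filenames attributed to that pattern}."""
    # One named group per pattern, so the match tells us which alternative won.
    combined = re.compile(
        "|".join(f"(?P<p{i}>{p})" for i, p in enumerate(patterns))
    )
    counts = Counter()
    for filename in filenames:
        m = combined.match(filename)
        if m is not None:
            # lastgroup is e.g. "p17"; strip the "p" prefix to get the index.
            counts[int(m.lastgroup[1:])] += 1
    return {i: counts[i] for i in range(len(patterns))}


def patterns_not_matched_once(patterns, filenames):
    """Return the patterns whose attributed match count is not exactly 1."""
    counts = count_matches_per_pattern(patterns, filenames)
    return [patterns[i] for i, n in counts.items() if n != 1]
```

A pattern containing a typo would then show up in the result of
`patterns_not_matched_once` with a count of zero.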

-- 
Dan Stromberg