Re: [Speed] Performance comparison of regular expression engines

On 06.03.16 09:14, Maciej Fijalkowski wrote:
> Any chance you can rerun this on pypy?
Results on PyPy 2.2.1 (I'm not sure I could build the latest PyPy on my computer):
                                                  re  str.find
Twain                                    5     5.469     3.852
(?i)Twain                               10     8.646
[a-z]shing                             165     17.24
Huck[a-zA-Z]+|Saw[a-zA-Z]+              52     7.763
\b\w+nn\b                               32       101
[a-q][^u-z]{13}x                       445     167.6
Tom|Sawyer|Huckleberry|Finn            314     8.583
(?i)Tom|Sawyer|Huckleberry|Finn        477      16.3
.{0,2}(Tom|Sawyer|Huckleberry|Finn)    314     270.9
.{2,4}(Tom|Sawyer|Huckleberry|Finn)    237       262
Tom.{10,25}river|river.{10,25}Tom        1     8.461
[a-zA-Z]+ing                         10079       348
\s[a-zA-Z]{0,12}ing\s                 7160     115.8
([A-Za-z]awyer|[A-Za-z]inn)\s           50     16.62
["'][^"']{0,30}[?!\.]["']             1618     14.45
Alternative regular expression engines need extension modules and don't work on PyPy for me.
For comparison, results on CPython 2.7.11+:
                                                  re     regex       re2      pcre  str.find
Twain                                    5     4.423     2.699     8.045      93.4     4.181
(?i)Twain                               10     50.07     3.563     20.35     185.6
[a-z]shing                             165     98.68     6.365     23.71      2886
Huck[a-zA-Z]+|Saw[a-zA-Z]+              52     58.97     50.26     19.52      1016
\b\w+nn\b                               32     130.1     416.5     18.38     740.7
[a-q][^u-z]{13}x                       445     406.6     7.935      5886      7137
Tom|Sawyer|Huckleberry|Finn            314     53.09      59.1     20.33      5377
(?i)Tom|Sawyer|Huckleberry|Finn        477     281.2     338.5     23.77      7895
.{0,2}(Tom|Sawyer|Huckleberry|Finn)    314     419.5      1142     20.69      6423
.{2,4}(Tom|Sawyer|Huckleberry|Finn)    237     410.9      1013     18.99      5224
Tom.{10,25}river|river.{10,25}Tom        1     63.17     58.31     18.94     260.2
[a-zA-Z]+ing                         10079     203.8     363.8     43.78 1.583e+05
\s[a-zA-Z]{0,12}ing\s                 7160     127.1     26.65     34.23 1.114e+05
([A-Za-z]awyer|[A-Za-z]inn)\s           50     147.6     412.4     21.57      1172
["'][^"']{0,30}[?!\.]["']             1618     85.88     86.55     22.22 2.576e+04
And on Jython 2.5.3 with JRE 7:
                                                  re  str.find
Twain                                    5        34         3
(?i)Twain                               10       251
[a-z]shing                             165       564
Huck[a-zA-Z]+|Saw[a-zA-Z]+              52       281
\b\w+nn\b                               32       510
[a-q][^u-z]{13}x                       445      1786
Tom|Sawyer|Huckleberry|Finn            314       102
(?i)Tom|Sawyer|Huckleberry|Finn        477      1232
.{0,2}(Tom|Sawyer|Huckleberry|Finn)    314      1345
.{2,4}(Tom|Sawyer|Huckleberry|Finn)    237      1353
Tom.{10,25}river|river.{10,25}Tom        1       305
[a-zA-Z]+ing                         10079      1211
\s[a-zA-Z]{0,12}ing\s                 7160       571
([A-Za-z]awyer|[A-Za-z]inn)\s           50       676
["'][^"']{0,30}[?!\.]["']             1618       431
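
The script that produced these numbers is not included in this message. As a rough illustration only, one row of such a table might be measured along the lines below. This is a minimal sketch, not the original benchmark: the helper names (count_with_re, count_with_find, best_time_ms), the sample text, and the choice of milliseconds are all assumptions, and it further assumes that the integer column is a match count.

```python
import re
from timeit import default_timer  # portable wall-clock timer

def count_with_re(pattern, text):
    """Count non-overlapping matches of a regex (hypothetical helper)."""
    return sum(1 for _ in re.compile(pattern).finditer(text))

def count_with_find(needle, text):
    """Count non-overlapping occurrences of a literal via str.find."""
    count = 0
    pos = text.find(needle)
    while pos != -1:
        count += 1
        pos = text.find(needle, pos + len(needle))
    return count

def best_time_ms(func, args, repeat=5):
    """Run func(*args) several times; return (result, best time in ms)."""
    best = None
    result = None
    for _ in range(repeat):
        start = default_timer()
        result = func(*args)
        elapsed = (default_timer() - start) * 1000.0
        if best is None or elapsed < best:
            best = elapsed
    return result, best

if __name__ == "__main__":
    # Stand-in corpus; the text used for the tables above is not given here.
    text = "Mark Twain wrote about Tom Sawyer and Huckleberry Finn. " * 10000
    for label, func, arg in [("re", count_with_re, r"Twain"),
                             ("str.find", count_with_find, "Twain")]:
        matches, ms = best_time_ms(func, (arg, text))
        print("%-10s matches=%d best=%.3f ms" % (label, matches, ms))
```

Taking the best of several runs reduces noise from other processes; on a JIT-based interpreter such as PyPy, the first runs also include warm-up time, so the repeat count matters more there.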

This is really difficult to read. Can you tell me which columns I am looking at?
On Sun, Mar 6, 2016 at 11:21 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
> [...]
Speed mailing list Speed@python.org https://mail.python.org/mailman/listinfo/speed