On 07.03.16 19:19, Brett Cannon wrote:
Are you thinking about turning all of this into a benchmark for the benchmark suite?
That was my purpose. I first wrote a benchmark for the benchmark suite, then became interested in more detailed results and in a comparison with alternative engines.
There are several open questions about a benchmark for the benchmark suite.
The input data is a public 20 MB text (8 MB as a ZIP file). Should we download it every time (perhaps with caching) or add it to the repository?
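
For the download-with-caching option, a minimal sketch of what I have in mind (the URL and cache path below are placeholders, not the actual data location):

import os
import urllib.request

DATA_URL = "https://example.com/benchmark-text.zip"          # placeholder URL
CACHE_PATH = os.path.join(os.path.expanduser("~"), ".cache",
                          "re_bench_text.zip")               # placeholder cache file

def get_input_data():
    # Download the input text once and reuse the cached copy afterwards.
    if not os.path.exists(CACHE_PATH):
        os.makedirs(os.path.dirname(CACHE_PATH), exist_ok=True)
        urllib.request.urlretrieve(DATA_URL, CACHE_PATH)
    return CACHE_PATH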
One iteration of all searches on the full text takes 29 seconds on my computer. Isn't that too long? In any case, I first want to optimize some bottlenecks in the re module.
Do we need one benchmark that reports the accumulated time of all searches, or separate microbenchmarks for every pattern?
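
To illustrate the two options (and roughly how the 29-second figure is obtained), here is a sketch; the pattern list and input file are placeholders, not the real benchmark data:

import re
import time

PATTERNS = [r"Twain", r"[a-z]shing", r"Huck[a-zA-Z]+|Saw[a-zA-Z]+"]  # placeholder patterns
TEXT = open("input.txt", encoding="utf-8").read()                    # placeholder input file

def accumulated_time():
    # One benchmark: total time of all searches.
    start = time.perf_counter()
    for pattern in PATTERNS:
        re.findall(pattern, TEXT)
    return time.perf_counter() - start

def per_pattern_times():
    # Separate microbenchmarks: one timing per pattern.
    times = {}
    for pattern in PATTERNS:
        start = time.perf_counter()
        re.findall(pattern, TEXT)
        times[pattern] = time.perf_counter() - start
    return times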
It would be nice to use the same benchmark for comparing different regular expression engines. This requires changing perf.py. Maybe we could use the same interface to compare ElementTree with lxml and json with simplejson.
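
Outside of perf.py, such a comparison can already be prototyped; a rough sketch that runs the same searches with the stdlib re module and the third-party regex module (assuming regex is installed; patterns and input file are placeholders):

import importlib
import time

TEXT = open("input.txt", encoding="utf-8").read()   # placeholder input file
PATTERNS = [r"Twain", r"[a-z]shing"]                # placeholder patterns

def bench(module_name):
    # Time all searches with the given engine, e.g. "re" or "regex".
    engine = importlib.import_module(module_name)
    start = time.perf_counter()
    for pattern in PATTERNS:
        engine.findall(pattern, TEXT)
    return time.perf_counter() - start

for name in ("re", "regex"):
    try:
        print(name, bench(name))
    except ImportError:
        print(name, "not installed")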
The patterns are ASCII-only and the text is mostly ASCII. It would be nice to add a non-ASCII pattern and non-ASCII text, but this would increase the run time.
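
A non-ASCII case could look like this (a hypothetical Cyrillic pattern and text, not part of the current data):

import re

text = "Пришёл, увидел, победил. " * 1000   # hypothetical non-ASCII text
pattern = r"\bувидел\b"                     # hypothetical non-ASCII pattern
print(len(re.findall(pattern, text)))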