On 13.07.2010 15:35, Antoine Pitrou wrote:
> On Tue, 13 Jul 2010 15:20:23 +0100 Michael Foord <fuzzyman@voidspace.org.uk> wrote:
>> On 13/07/2010 15:17, Reid Kleckner wrote:
>>> On Mon, Jul 12, 2010 at 2:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
>>>> MRAB's module offers a superset of re's features rather than a subset though, so once it has had more of a chance to bake on PyPI it may be worth another look.
>>> I feel like the new module is designed to replace the current re module, and shouldn't need to spend time in PyPI. A faster regex library isn't going to motivate users to add external dependencies to their projects.
>> If the backwards compatibility issues can be addressed and MRAB is willing to remain as maintainer then the advantages seem well worth it to me.
> To me as well. The code needs a full review before integrating, though.

FWIW, I've now run the Pygments test suite (Pygments has about 2500 regular expressions that are exercised there) and only had two problems:

* Scoped flags: A few lexers use (?s) and similar flags at the end of the expression, which has no effect in regex currently.
* POSIX character classes: One regex used a class '[][:xyz]', so the [: was seen as the start of a character class. I'm not sure how common this is, as most people seem to escape brackets in character classes. Also, it gives a clear error on regex.compile(), not "mysterious" failures.

Timings (seconds to run the test suite):

re      26.689  26.015  26.008
regex   26.066  25.797  25.865

So, I thought there wasn't a difference in performance for this use case (which is compiling a lot of regexes and matching most of them only a few times in comparison).

However, I found that looking at the regex caching is very important in this case: re._MAXCACHE is by default set to 100, and regex._MAXCACHE to 1024. When I set re._MAXCACHE to 1024 before running the test suite, I get times around 18 (!) seconds for re.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
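
P.S. For anyone who wants to poke at the caching effect without the Pygments suite, here is a rough sketch of how one might compare the two cache sizes. It is not what I ran: the helper names, the synthetic patterns and the pattern count are made up for illustration, and it relies on re._MAXCACHE, which is a private CPython attribute whose default and behaviour may differ between versions. The timings it returns are only indicative.

```python
import re
import time


def time_matches(patterns, text, rounds=3):
    """Time repeated module-level re.match() calls, which go through re's cache."""
    start = time.perf_counter()
    for _ in range(rounds):
        for pattern in patterns:
            re.match(pattern, text)
    return time.perf_counter() - start


def compare_cache_sizes(n_patterns=300, small=100, large=1024):
    """Return {cache_size: seconds} for a small and a large re._MAXCACHE.

    Assumption: re._MAXCACHE is a private CPython detail and may change.
    The patterns are synthetic stand-ins, not real lexer regexes.
    """
    patterns = [r"item%d\d*" % i for i in range(n_patterns)]
    original = re._MAXCACHE
    results = {}
    try:
        for size in (small, large):
            re._MAXCACHE = size
            re.purge()  # start each measurement with an empty cache
            results[size] = time_matches(patterns, "item7 42")
    finally:
        # Always restore the interpreter-wide setting.
        re._MAXCACHE = original
        re.purge()
    return results


if __name__ == "__main__":
    for size, seconds in compare_cache_sizes().items():
        print("re._MAXCACHE=%d: %.4f s" % (size, seconds))
```

With more distinct patterns than cache slots, the module-level functions keep recompiling, which is the situation the test suite run was in with the default of 100.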