On Sun, Jul 11, 2010 at 7:19 PM, anatoly techtonik <techtonik@gmail.com> wrote:
> On Fri, Jul 9, 2010 at 6:59 PM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
>> While the re2 comparison might be interesting from an abstract standpoint, it intentionally supports a different regex language from Python so that it can run faster and use less memory. Since re2 can never replace Python's re module, it doesn't make sense to hold MRAB's new module to that standard.
>
> re2 comparison is interesting from the point of if it should be included in stdlib.

No it isn't. re2 is a wrapper for a third party engine that surrenders some functionality in the pursuit of better bounds on memory and CPU usage. It is not a drop-in replacement for re and cannot be by design:

"The one significant exception is that RE2 drops support for backreferences and generalized zero-width assertions, because they cannot be implemented efficiently."

There is no reason to have two distinct regex engines in the standard library - if someone knows enough to realise they need the performance assurances of re2, they're also likely to be able to find the Python wrappers for it.

regex is potentially interesting for the standard library as it *is* intended to be a drop-in replacement for re that trades longer compilation times (typically once per application) for faster match times (potentially many times per application).

The performance of re2 has nothing to do with the comparison between the current re module and MRAB's regex module.

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
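
To make the quoted RE2 limitation concrete, here is a minimal sketch using only the standard re module. It relies on exactly the two features the RE2 documentation says it drops: a backreference and a zero-width lookahead assertion. The pattern and function names are hypothetical, chosen for illustration only. Both patterns are compiled once at import time and then reused, which is the "compile once, match many times" usage mentioned above.

```python
import re

# Backreference: \1 must match the same text that group 1 captured.
# Compiled once at module import, then reused for every call.
DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")

# Zero-width lookahead assertion: match "re" only when a digit follows,
# without consuming the digit.
RE_BEFORE_DIGIT = re.compile(r"re(?=\d)")


def find_doubled_words(text):
    """Return each word that appears twice in a row, e.g. 'the the'."""
    return [m.group(1) for m in DOUBLED_WORD.finditer(text)]


def count_re_before_digit(text):
    """Count occurrences of 're' that are immediately followed by a digit."""
    return len(RE_BEFORE_DIGIT.findall(text))
```

Code that depends on patterns like these is the kind that cannot be moved to an engine without backreferences or lookaround, which is why such an engine cannot be a drop-in replacement for re.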