Add regex pattern literal p""
We can use this literal to represent a compiled pattern, for example:
>>> p"(?i)[a-z]".findall("a1B2c3")
['a', 'B', 'c']
>>> compiled = p"(?<=abc)def"
>>> m = compiled.search('abcdef')
>>> m.group(0)
'def'
>>> rp'\W+'.split('Words, words, words.')
['Words', 'words', 'words', '']
This allows the peephole optimizer to store the compiled pattern in the .pyc file, giving a performance optimization similar to replacing a constant set with a frozenset in the .pyc file. Then issues like [1] could be solved perfectly.

[1] Optimize base64.b16decode to use compiled regex
    https://bugs.python.org/issue35559

Two shortcomings:

1. Elevating a class in a module (re.Pattern) to language level doesn't sound very natural. It makes Python look like Perl.

2. We can't use the regex module as a drop-in replacement: import regex as re. IMHO, I would like to see the regex module adopted into the stdlib after cutting off its "full case-folding" and "fuzzy matching" features.

Related links:

[2] Chris Angelico conceived of "compiled regexes be stored in .pyc file" in March 2013.
    https://mail.python.org/pipermail/python-ideas/2013-March/020043.html

[3] Ken Hilton conceived of "Give regex operations more sugar" in June 2018.
    https://mail.python.org/pipermail/python-ideas/2018-June/051395.html
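For comparison, everything in the examples above can be written with today's stdlib, minus the .pyc persistence. A minimal sketch of the closest existing idiom, compiling once at import time and reusing the Pattern objects:

```python
import re

# Compile once at import time; reuse the Pattern object everywhere.
# This is the moral equivalent of the proposed p"" literal, except that
# compilation still happens at import time rather than being loaded
# pre-compiled from the .pyc file.
LETTER = re.compile(r"(?i)[a-z]")
LOOKBEHIND = re.compile(r"(?<=abc)def")
NONWORD = re.compile(r"\W+")

print(LETTER.findall("a1B2c3"))                    # ['a', 'B', 'c']
print(LOOKBEHIND.search("abcdef").group(0))        # 'def'
print(NONWORD.split("Words, words, words."))       # ['Words', 'words', 'words', '']
```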
On Thu, Dec 27, 2018 at 10:49 PM Ma Lin wrote:
We can use this literal to represent a compiled pattern, for example:
>>> p"(?i)[a-z]".findall("a1B2c3")
['a', 'B', 'c']
>>> compiled = p"(?<=abc)def"
>>> m = compiled.search('abcdef')
>>> m.group(0)
'def'
>>> rp'\W+'.split('Words, words, words.')
['Words', 'words', 'words', '']
This allows peephole optimizer to store compiled pattern in .pyc file, we can get performance optimization like replacing constant set by frozenset in .pyc file.
Before discussing something specific like regex literal syntax, I would love to see a way to measure that sort of performance difference. Does anyone here have MacroPy experience or something and could mock something up that would precompile and save a regex?

In theory, it would be possible to tag ANY value as "constant once evaluated" and have it saved in the pyc. It'd be good to know just how much benefit this precompilation actually grants.
[2] Chris Angelico conceived of "compiled regexes be stored in .pyc file" in March 2013. [2] https://mail.python.org/pipermail/python-ideas/2013-March/020043.html
Wow that's an old post of mine :) ChrisA
It'd be good to know just how much benefit this precompilation actually grants.
As far as I know, Pattern objects in regex module can be pickled, don't know if it's useful.
>>> import pickle
>>> import regex
>>> p = regex.compile('[a-z]')
>>> b = pickle.dumps(p)
>>> p = pickle.loads(b)
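For a self-contained illustration with the stdlib re module (which registers its pickling through the same copyreg mechanism), the round-trip looks like this:

```python
import pickle
import re

p = re.compile('[a-z]')
b = pickle.dumps(p)   # serializes the pattern string and flags
q = pickle.loads(b)   # re-compiles the pattern on load

print(q.pattern)                   # '[a-z]'
print(q.search('A1b2').group(0))   # 'b'
```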
Wow that's an old post of mine

I searched on Google before posting this; I hope there is no omission.
Ma Lin wrote on 27.12.18 at 14:15:
It'd be good to know just how much benefit this precompilation actually grants.
As far as I know, Pattern objects in regex module can be pickled, don't know if it's useful.
>>> import pickle
>>> import regex
That's from the external regex package, not the stdlib re module.
>>> p = regex.compile('[a-z]')
>>> b = pickle.dumps(p)
>>> p = pickle.loads(b)
Look a little closer:
>>> import pickle, re
>>> p = re.compile("[abc]")
>>> pickle.dumps(p)
b'\x80\x03cre\n_compile\nq\x00X\x05\x00\x00\x00[abc]q\x01K \x86q\x02Rq\x03.'
What this does, essentially, is to make the pickle loader pass the original regex pattern string into re.compile() to "unpickle" it. Meaning, it compiles the regex on the way in. Thus, there isn't much to gain from using (the current form of) regex pickling here.

I'm not saying that this can't be changed, but personally, this is exactly what I would do if I was asked to make a compiled regex picklable. Everything else would probably get you into portability hell.

Stefan
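A quick way to see this for yourself: the pickled bytes contain the original pattern text and a reference to re._compile, not anything precompiled (re._compile is an internal helper, inspected here only for illustration):

```python
import pickle
import re

data = pickle.dumps(re.compile("[abc]"))

# The source pattern travels verbatim inside the pickle, and unpickling
# calls re._compile(pattern, flags) -- i.e. a full recompilation on load.
print(b"[abc]" in data)      # True
print(b"_compile" in data)   # True
```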
Reply to Stefan Behnel and Chris Angelico. On 18-12-27 22:42, Stefan Behnel wrote:
>>> import pickle, re
>>> p = re.compile("[abc]")
>>> pickle.dumps(p)
b'\x80\x03cre\n_compile\nq\x00X\x05\x00\x00\x00[abc]q\x01K \x86q\x02Rq\x03.'
What this does, essentially, is to make the pickle loader pass the original regex pattern string into re.compile() to "unpickle" it. Meaning, it compiles the regex on the way in. Thus, there isn't much to gain from using (the current form of) regex pickling here.
Yes, the re module only pickles the pattern string and flags; it's safe for cross-version pickle/unpickle. The re module's pickle code:

    def _pickle(p):
        return _compile, (p.pattern, p.flags)

    copyreg.pickle(Pattern, _pickle, _compile)

On 18-12-28 1:27, Chris Angelico wrote:
What Stefan pointed out regarding the stdlib's "re" module is also true of the third party "regex" - unpickling just compiles from the original string.
I had followed the regex module for a year; it does pickle the compiled data. This is its code:

    def _pickle(pattern):
        return _regex.compile, pattern._pickled_data

    _copy_reg.pickle(Pattern, _pickle)

And in the _regex.c file:

    self->pickled_data = Py_BuildValue("OnOOOOOnOnn", pattern, flags,
                                       code_list, groupindex, indexgroup,
                                       named_lists, named_list_indexes,
                                       req_offset, required_chars, req_flags,
                                       public_group_count);
    if (!self->pickled_data) {
        Py_DECREF(self);
        return NULL;
    }
On Fri, Dec 28, 2018 at 12:15 AM Ma Lin wrote:
It'd be good to know just how much benefit this precompilation actually grants.
As far as I know, Pattern objects in regex module can be pickled, don't know if it's useful.
>>> import pickle
>>> import regex
>>> p = regex.compile('[a-z]')
>>> b = pickle.dumps(p)
>>> p = pickle.loads(b)
What Stefan pointed out regarding the stdlib's "re" module is also true of the third party "regex" - unpickling just compiles from the original string.

Regarding pyc files, though, pickle is less significant than marshal. And both re.compile() and regex.compile() return unmarshallable objects. Fortunately, marshal doesn't need to produce cross-compatible files, so the portability issues don't apply.

So, let's suppose that marshalling a compiled regex became possible. It would need to be (a) absolutely guaranteed to have the same effect as compiling the original text string, and (b) faster than compiling the original text string, otherwise it's useless. This is where testing would be needed: can it actually save any significant amount of time?
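As a baseline for that testing: today marshal refuses compiled patterns outright, so any experiment has to start by teaching marshal about a new type. A minimal check with the stdlib as it stands:

```python
import marshal
import re

p = re.compile("[abc]")
try:
    # marshal only handles a fixed set of core types (code objects,
    # ints, strings, tuples, ...); Pattern is not one of them.
    marshal.dumps(p)
except ValueError as exc:
    print(exc)   # marshal raises ValueError for unsupported objects
```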
Wow that's an old post of mine

I searched on Google before post this, hope there is no omission.
You're absolutely fine :) I was amused to find that a post of mine from nearly six years ago should be the most notable on the subject, is all. Good work digging it up. ChrisA
We can use this literal to represent a compiled pattern, for example:
>>> p"(?i)[a-z]".findall("a1B2c3")
['a', 'B', 'c']
There are some other advantages to this. For me the most interesting is that we can know from code easier that something is a regex. For my mutation tester mutmut I have an experimental regex mutation system but it just feels wrong to write hacky heuristics to guess if a string is a regex. And it's complicated to look at too much context (although I'm working on ways to make that type of thing radically nicer to do). It would be much nicer if I could just know based on the AST node type. I guess the same goes for static analyzers. / Anders
On 2018-12-27 11:48, Ma Lin wrote: [snip]
2, We can't use regex module as a drop-in replacement: import regex as re IMHO, I would like to see regex module be adopted into stdlib after cutting off its "full case-folding" and "fuzzy matching" features.
I think that omitting full casefolding would be a bad idea; after all, strings (in Python 3) have a .casefold method.
On Thu, Dec 27, 2018 at 05:47:46PM +0000, MRAB wrote:
On 2018-12-27 11:48, Ma Lin wrote: [snip]
2, We can't use regex module as a drop-in replacement: import regex as re IMHO, I would like to see regex module be adopted into stdlib after cutting off its "full case-folding" and "fuzzy matching" features.
I think that omitting full casefolding would be a bad idea; after all, strings (in Python 3) have a .casefold method.
And I don't understand why omitting fuzzy matching is a good idea. If you don't want fuzzy matching, don't use it in your code. But why remove it? -- Steve
Maybe this literal will encourage people to finish tasks using regex, and even lead to abuse of regex. Will this change Python's style?

What's worse, people might use mixed manners in the same project:

    one_line.split(',')
    ...
    p','.split(one_line)

Maybe it will break Python's style and reduce code readability. Is this worth it?
On Fri, Dec 28, 2018 at 1:56 AM Ma Lin wrote:
Maybe this literal will encourage people to finish tasks using regex, even lead to abuse regex, will this change Python's style?
What's worse is, people using mixed manners in the same project:
    one_line.split(',')
    ...
    p','.split(one_line)
Maybe it will break the Python's style, reduce code readability, is this worth it?
The bar for introducing a new type of literal should be very high. Do performance numbers show this change would have a large impact for a large amount of libraries and programs? In my opinion, only if this change would make 50% of programs run 50% faster then it might be worth discussing. The damage to readability, burden of changing syntax and burden of yet another language feature for newcomers to learn is too high. Cheers, Yuval
On Mon, Dec 31, 2018 at 12:48:56AM -0800, Yuval Greenfield wrote:
In my opinion, only if this change would make 50% of programs run 50% faster then it might be worth discussing.
What if it were 100% of programs 25% faster? *wink*

Generally speaking, we don't introduce new syntax as a speed optimization. The main reasons to introduce syntax are convenience and improving the expressiveness of code. That's why we usually prefer to use operators like + and == instead of functions add() and equal(). There's nothing a list comprehension can do that a for-loop can't, but list comps are often more expressive. And the class statement is just syntactic sugar for type(name, bases, dict), but much more convenient.

In this specific case, I don't think that regex literals will add much expressiveness:

    regex = re.compile(r"...")
    regex = p("...")

is not that much different.

-- Steve
    regex = re.compile(r"...")
    regex = p("...")
is not that much different.
True, but when the literal is put somewhere far from the compile() call, it becomes a problem for static analysis. Conceptually a regex is not a string but an embedded foreign language. That's why I think this discussion is worth having.

It would be nice to have a way to mark up foreign languages that had some other advantages, so people would be incentivised to do it, but just a way to mark them with comments would be fine too, I think, if it's standardized. Maybe the discussion should be expanded to cover the general case of embedded foreign languages? SQL, HTML, CSS and (obviously) regex come to mind. One could also think of C for stuff like CFFI.

/ Anders
I am a full -1 on this idea -
Two shortcomings:
1, Elevating a class in a module (re.Pattern) to language level, this sounds not very natural. This makes Python looks like Perl.
2, We can't use regex module as a drop-in replacement: import regex as re IMHO, I would like to see regex module be adopted into stdlib after cutting off its "full case-folding" and "fuzzy matching" features.
Sorry for sounding over-reactive, but yes, this could make Python look like Perl.

I think one full advantage of Python is exactly that regexps are treated fairly, with no special syntax. You call a function, or build an instance, and have the regex power, and that is it. And you can just plug in any third-party regex module, and it will work just like the one that is built into the language.

This proposal at least keeps the ' " ' quotes - so we don't end up like Javascript, which has a "squeashy regexy" thing that can sneak into code, and you are never sure when it is run, or even if it can be assigned to a variable at all.

I am quite sure that if the matter is performance, a way to pickle, or somehow store pre-compiled regexes, can be found without requiring special syntax.

And a 3rd shortcoming - flags can't be passed as parameters, and have to be built into the regexp themselves, further complicating the readability even for very simple regular expressions.

Other than that, it would not be much different from the ' f" ' strings thing, indeed.
On Thu, 27 Dec 2018 at 09:49, Ma Lin wrote:
Related links:
[2] Chris Angelico conceived of "compiled regexes be stored in .pyc file" in March 2013. [2] https://mail.python.org/pipermail/python-ideas/2013-March/020043.html
[3] Ken Hilton conceived of "Give regex operations more sugar" in June 2018. [3] https://mail.python.org/pipermail/python-ideas/2018-June/051395.html
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
On 18-12-28 22:54, Joao S. O. Bueno wrote:
Sorry for sounding over-reactive, but yes, this could make Python look like Perl.

Yes, this may introduce Perl's style irreversibly; we need to be cautious about this.

I'm thinking: if people ask these questions in their mind when reading a piece of Python code:

1. "Is this Python code?"
2. "What's the purpose of this code?"
3. "How can I modify it if I want to ... ?"

then maybe Python is on a doubtful way.

There is an interesting question: will the literal p"" ruin Python's (or other dynamic languages' like Ruby's) style? Why would this happen?

And a 3rd shortcoming - flags can't be passed as parameters, and have to be built into the regexp themselves, further complicating the readability even for very simple regular expressions.

IMO this is an advantage: it's hard to omit flags when reading/copying a regex pattern.
For regular strings one can write

    "aaa" + "bbb"

which also works for f-strings, r-strings, etc. In regular expressions, however, there is, e.g., parameter counting and references to numbered matches. How would that be dealt with in a compound p-string? Either it would have to be re-compiled or not; either way could lead to unexpected results:

    p"(\d)\1" + p"(\s)\1"

or

    p"^(\w)" + p"^(\d)"

Regular strings can be added, but the results of p-strings could not - well, they are not strings.

This brings me to the point that the key difference is that f- and r-strings actually return strings, whereas a p-string would return a different kind of object. That would seem certainly very confusing to novices - and also for the language standard as a whole.

-Alexander
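The back-reference concern can be shown concretely with today's plain string concatenation: in the combined pattern, the \1 from the second half still refers to group 1 (the digit), not to the whitespace group it referred to in isolation:

```python
import re

# Standalone, r"(\s)\1" means "a whitespace character, twice".
# After concatenation, its \1 silently points at the digit group instead.
combined = re.compile(r"(\d)\1" + r"(\s)\1")

print(bool(combined.search("11 1")))   # True: the trailing \1 re-matches the digit
print(bool(combined.search("11  ")))   # False: doubled whitespace no longer matches
```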
On Sat, Dec 29, 2018 at 04:29:32PM +1100, Alexander Heger wrote:
for regular strings one can write
"aaa" + "bbb"
which also works for f-strings, r-strings, etc.; in regular expressions, there is, e.g., parameter counting and references to numbered matches. How would that be dealt with in a compound p-string? Either it would have to re-compiled or not, either way could lead to unexpected results
What does Perl do?
p"(\d)\1" + p"(\s)\1"
Since + is used for concatenation, then that would obviously be the same as:

    p"(\d)\1(\s)\1"

Whether it gets done at compile-time or run-time depends on how smart the keyhole optimiser is. If it is smart enough to recognise regex literals, it could fold the two strings together and regex-compile them at python-compile time, otherwise it could be equivalent to:

    _t1 = re.compile(r"(\d)\1")  # compile-time
    _t2 = re.compile(r"(\s)\1")  # compile-time
    re.compile(_t1.pattern + _t2.pattern)  # run-time

Obviously that defeats the purpose of using a p"" pre-compiled regex object, but the answer to that is either:

1. Don't do that then; or
2. We better make sure the keyhole optimizer is smarter.

Or we just ban concatenation. "P-strings" aren't strings, even though they look like them.
This brings me to the point that the key difference is that f- and r- strings actually return strings,
To be precise, f-"strings" are actually code that returns a string when executed at runtime; r-strings are literal syntax for strings.
whereas p- string would return a different kind of object. That would seem certainly very confusing to novices - and also for the language standard as a whole.
Indeed. Perhaps something like \\regex\\ would be better, *if* this feature is desired. -- Steve
Steven D'Aprano wrote:
    _t1 = re.compile(r"(\d)\1")  # compile-time
    _t2 = re.compile(r"(\s)\1")  # compile-time
    re.compile(_t1.pattern + _t2.pattern)  # run-time
It would be weird if p"(\d)\1" + p"(\s)\1" worked but re.compile(r"(\d)\1") + re.compile(r"(\s)\1") didn't. -- Greg
On Sat, Dec 29, 2018 at 12:30 AM Alexander Heger
for regular strings one can write
"aaa" + "bbb"
which also works for f-strings, r-strings, etc.; in regular expressions, there is, e.g., parameter counting and references to numbered matches. How would that be dealt with in a compound p-string? Either it would have to re-compiled or not, either way could lead to unexpected results
p"(\d)\1" + p"(\s)\1"
or
p"^(\w)" + p"^(\d)"
regular strings can be added, but the results of p-strings could not - well, they are not strings.
Isn't this a feature, not a bug, of encouraging literals to be specified as patterns: addition of patterns would raise an error (as is currently the case for addition of compiled patterns in the re and regex modules)?

Currently, I find it easiest to use r-strings for patterns and call re.search() etc. without precompiling them, which means that I could accidentally concatenate two patterns together that would silently produce an unmatchable pattern. Using p-literals for most patterns would mean I have to be explicit in the exceptional case where I do want to assemble a pattern from multiple parts:

    FIRSTNAME = p"[A-Z][-A-Za-z']+"
    LASTNAME = p"[-A-Za-z']([-A-Za-z' ]+[-A-Za-z'])?"
    FULLNAME = FIRSTNAME + p' ' + LASTNAME  # error

    FIRSTNAME = r"[A-Z][-A-Za-z']+"
    LASTNAME = r"[-A-Za-z']([-A-Za-z' ]+[-A-Za-z'])?"
    FULLNAME = re.compile(FIRSTNAME + ' ' + LASTNAME)  # success

Another potential advantage is that an ill-formed p-literal (such as a mismatched parenthesis) would be caught immediately, rather than when it is first used. This could pay off, for example, if I am defining a data structure with a bunch of regexes that would get used for different input. (But there may be performance tradeoffs here.)
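For reference, the r-string "success" case above runs today exactly as written (the name patterns are illustrative only):

```python
import re

FIRSTNAME = r"[A-Z][-A-Za-z']+"
LASTNAME = r"[-A-Za-z']([-A-Za-z' ]+[-A-Za-z'])?"
# Plain string concatenation, compiled once at the end:
FULLNAME = re.compile(FIRSTNAME + ' ' + LASTNAME)

m = FULLNAME.match("Ada Lovelace")
print(m.group(0))   # 'Ada Lovelace'
```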
This brings me to the point that the key difference is that f- and r- strings actually return strings, whereas p- string would return a different kind of object. That would seem certainly very confusing to novices - and also for the language standard as a whole.
The b prefix produces a bytes literal. Is a bytes object a kind of string, more so than a regex pattern is? I could see an argument that bytes is a particular encoding of sequential character data, whereas a regex pattern represents a string *language*, i.e. an abstraction over string data. But...this distinction starts to feel very theoretical rather than practical. If novices are expected to read code with regular expressions in it, why would they have trouble understanding that the "p" prefix means "pattern"? As someone who works with text a lot, I think there's a decent practicality-beats-purity argument in favor of p-literals, which would make regex operations more easily accessible and prevent patterns from being mixed up with string data. A potential downside, though, is that it will be tempting to introduce flags as prefixes, too. Do we want to go down the road of pui"my Unicode-compatible case-insensitive pattern"? Nathan
-Alexander
I don't see a justification for baking REs into the syntax of Python. In the Python world, REs are just one tool in a toolbox containing a great many tools. What's more, it's a tool that should be used with considerable reluctance, because REs are essentially unreadable, so every time you use one you're creating a maintenance headache. This quality is quite the opposite of what one would expect from a core language feature. -- Greg
What's more, it's a tool that should be used with considerable reluctance, because REs are essentially unreadable, so every time you use one you're creating a maintenance headache.
Well, it requires some experience to read REs. I have written many, and I still need to test thoroughly even many basic ones to check that they really do what they are supposed to do. And then there is the issue that there are many different implementations; what you have to escape, etc., varies between python (raw and regular strings), emacs, grep, overleaf, ...

Never mind, my main point is that they return an object that is qualitatively different from a string, for example, in terms of concatenation. I also think it is too specialised, and time-critical constant REs can be stored in the module body, etc., if need be. I do that.

But since this is the ideas mailing list, and taking this thread on an excursion, maybe an "addition" operator could be defined for REs, such that

    re.compile(s1 + s2) == re.compile(s1) + re.compile(s2)

with the restriction that s1 and s2 are each strings that are valid REs. Even that would leave questions about how to deal with compile flags; they probably should be treated the same as if they were embedded at the beginning of each string.

-Alexander
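That operator could be prototyped today without any syntax change. A sketch (the class name and the flags restriction are illustrative, not a proposed stdlib API):

```python
import re

class AddablePattern:
    """Wraps a compiled pattern and defines '+' as source concatenation."""

    def __init__(self, pattern, flags=0):
        self._compiled = re.compile(pattern, flags)

    def __add__(self, other):
        # Recompile the concatenated sources; insist the flags agree,
        # per the restriction discussed above.
        if self._compiled.flags != other._compiled.flags:
            raise ValueError("cannot add patterns with different flags")
        return AddablePattern(self._compiled.pattern + other._compiled.pattern,
                              self._compiled.flags)

    def search(self, string):
        return self._compiled.search(string)

combined = AddablePattern("[a-z]+") + AddablePattern("[0-9]+")
print(combined.search("__ab12__").group(0))   # 'ab12'
```

Group renumbering and back-references would still need real work in any serious version; this only shows where the operator could live.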
I have a compromise idea. Here are some points:

1. Create a built-in class `pattern_str`, a subclass of `str`, dedicated to regex pattern strings.
2. Use p"" to represent `pattern_str`.

Some advantages:

1. Since it's a subclass of `str`, we can use it as a normal `str`.
2. IDE/linter/compiler can identify it as a regex pattern, something like a type hint at the language level.
3. We can still store the compiled pattern in the .pyc file *quietly*.
4. It won't introduce Perl style into Python, which avoids abusing regex to some degree.

We would still use regex in the old way:

    import re
    re.search(p"(?i)[a-z]", s)

But if re.search() finds that the pattern is a `pattern_str`, it loads the compiled pattern from the .pyc file directly.
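The first point needs almost no code. A sketch of such a subclass (the .pyc integration in point 3 is the hard part and is not shown here):

```python
import re

class pattern_str(str):
    """A str subclass that marks its contents as a regex pattern.

    Tools could dispatch on this type; re.* functions accept it
    unchanged, because an instance still is a str.
    """
    __slots__ = ()

p = pattern_str(r"(?i)[a-z]")
print(isinstance(p, str))          # True
print(re.findall(p, "a1B2c3"))     # ['a', 'B', 'c']
```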
On Thu, 27 Dec 2018 19:48:40 +0800, Ma Lin wrote:
We can use this literal to represent a compiled pattern, for example:
>>> p"(?i)[a-z]".findall("a1B2c3")
['a', 'B', 'c']
>>> compiled = p"(?<=abc)def"
>>> m = compiled.search('abcdef')
>>> m.group(0)
'def'
>>> rp'\W+'.split('Words, words, words.')
['Words', 'words', 'words', '']
This allows peephole optimizer to store compiled pattern in .pyc file, we can get performance optimization like replacing constant set by frozenset in .pyc file.
Then such issue [1] can be solved perfectly. [1] Optimize base64.b16decode to use compiled regex [1] https://bugs.python.org/issue35559
The simple solution to the perceived performance problem (not sure how much of a problem it is in real life) is to have a stdlib function that lazily-compiles a regex (*). Just like "re.compile", but lazy: you don't bear the cost of compiling when simply importing the module, but once the pattern is compiled, there is no overhead for looking up a global cache dict. No need for a dedicated literal. (*) Let's call it "re.pattern", for example. Regards Antoine.
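Such a lazily-compiling wrapper can be sketched in a few lines (the name and behaviour here illustrate the idea; this is not an existing stdlib API):

```python
import re

class lazy_pattern:
    """Defers re.compile() until the pattern is first used."""
    __slots__ = ("_pattern", "_flags", "_compiled")

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None   # no compilation cost at import time

    def __getattr__(self, name):
        # Called only for attributes not found normally (search, match,
        # findall, ...): compile on first use, then delegate.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return getattr(self._compiled, name)

WORD = lazy_pattern(r"\w+")      # module import stays cheap
print(WORD.findall("a b c"))     # ['a', 'b', 'c']  (compiled here, once)
```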
On 31.12.2018 12:23, Antoine Pitrou wrote:
On Thu, 27 Dec 2018 19:48:40 +0800, Ma Lin wrote:

We can use this literal to represent a compiled pattern, for example:
>>> p"(?i)[a-z]".findall("a1B2c3")
['a', 'B', 'c']
>>> compiled = p"(?<=abc)def"
>>> m = compiled.search('abcdef')
>>> m.group(0)
'def'
>>> rp'\W+'.split('Words, words, words.')
['Words', 'words', 'words', '']
This allows peephole optimizer to store compiled pattern in .pyc file, we can get performance optimization like replacing constant set by frozenset in .pyc file.
Then such issue [1] can be solved perfectly. [1] Optimize base64.b16decode to use compiled regex [1] https://bugs.python.org/issue35559
The simple solution to the perceived performance problem (not sure how much of a problem it is in real life) is to have a stdlib function that lazily-compiles a regex (*). Just like "re.compile", but lazy: you don't bear the cost of compiling when simply importing the module, but once the pattern is compiled, there is no overhead for looking up a global cache dict.
No need for a dedicated literal.
(*) Let's call it "re.pattern", for example.
No need for a new function :-) We already have re.search() and re.match() which deal with compilation on-the-fly and caching. Perhaps the documentation should hint at this more explicitly... https://docs.python.org/3.7/library/re.html -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 31 2018)
Python Projects, Coaching and Consulting ... http://www.egenix.com/ Python Database Interfaces ... http://products.egenix.com/ Plone/Zope Database Interfaces ... http://zope.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/
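The caching that re.search() and re.match() rely on can be observed directly (re._compile is internal, used here only to show that repeated lookups return the same cached object):

```python
import re

# Module-level functions compile on first use and cache the result.
print(re.search('[a-z]+', 'A1bc').group(0))   # 'bc'

# Subsequent lookups with the same pattern/flags hit the cache:
print(re._compile('[a-z]+', 0) is re._compile('[a-z]+', 0))   # True
```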
On 31/12/2018 at 12:31, M.-A. Lemburg wrote:
We already have re.search() and re.match() which deal with compilation on-the-fly and caching. Perhaps the documentation should hint at this more explicitly...
The complaint is that the global cache is still too costly. See measurements in https://bugs.python.org/issue35559 Regards Antoine.
On 18-12-31 19:47, Antoine Pitrou wrote:
The complaint is that the global cache is still too costly. See measurements in https://bugs.python.org/issue35559
In this issue, using a global variable `_has_non_base16_digits` [1] gives a 30% speedup. Is the re module's internal cache [2] really so bad?

If we rewrite the re module's cache in C and use a custom data structure, maybe we will get a small speedup.

[1] `_has_non_base16_digits` in PR 11287
    https://github.com/python/cpython/pull/11287/files

[2] The re module's internal cache code:
    https://github.com/python/cpython/blob/master/Lib/re.py#L268-L295

    _cache = {}  # ordered!
    _MAXCACHE = 512

    def _compile(pattern, flags):
        # internal: compile pattern
        if isinstance(flags, RegexFlag):
            flags = flags.value
        try:
            return _cache[type(pattern), pattern, flags]
        except KeyError:
            pass
        ...
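Before rewriting the cache in C, a cheap experiment is to put a C-accelerated functools.lru_cache in front of re.compile and compare (a sketch for measurement, not a proposed stdlib change):

```python
import functools
import re

@functools.lru_cache(maxsize=512)
def compile_cached(pattern, flags=0):
    """Like re's internal cache, but the lookup happens in lru_cache's
    C implementation instead of pure-Python dict handling."""
    return re.compile(pattern, flags)

a = compile_cached('[^0-9A-F]')
b = compile_cached('[^0-9A-F]')
print(a is b)   # True: the second call is a pure cache hit
```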
Ma Lin wrote on 31.12.18 at 14:02:
On 18-12-31 19:47, Antoine Pitrou wrote:
The complaint is that the global cache is still too costly. See measurements in https://bugs.python.org/issue35559
In this issue, using a global variable `_has_non_base16_digits` [1] will accelerate 30%. Is re module's internal cache [2] so bad?
If rewrite re module's cache with C and use a custom data structure, maybe we will get a small speedup.
[1] `_has_non_base16_digits` in PR11287 [1] https://github.com/python/cpython/pull/11287/files
[2] re module's internal cache code: [2] https://github.com/python/cpython/blob/master/Lib/re.py#L268-L295
    _cache = {}  # ordered!
    _MAXCACHE = 512

    def _compile(pattern, flags):
        # internal: compile pattern
        if isinstance(flags, RegexFlag):
            flags = flags.value
        try:
            return _cache[type(pattern), pattern, flags]
        except KeyError:
            pass
        ...
I wouldn't be surprised if the slowest part here was the isinstance() check. Maybe the RegexFlag class could implement "__hash__()" as "return hash(self.value)" ? Stefan
On 19-1-1 21:39, Stefan Behnel wrote:
I wouldn't be surprised if the slowest part here was the isinstance() check. Maybe the RegexFlag class could implement "__hash__()" as "return hash(self.value)" ?
Apply this patch:

     def _compile(pattern, flags):
         # internal: compile pattern
    -    if isinstance(flags, RegexFlag):
    -        flags = flags.value
    +    try:
    +        flags = int(flags)
    +    except:
    +        pass
         try:
             return _cache[type(pattern), pattern, flags]
         except KeyError:

Then run this benchmark on my Raspberry Pi 3B:

    import perf
    runner = perf.Runner()
    runner.timeit(name="compile_re",
                  stmt="re.compile(b'[^0-9A-F]')",
                  setup="import re")

Result:

    Mean +- std dev: [a] 7.71 us +- 0.09 us -> [b] 6.74 us +- 0.10 us: 1.14x faster (-13%)

Looks great.
participants (14)

- Alexander Heger
- Anders Hovmöller
- Antoine Pitrou
- Antoine Pitrou
- Chris Angelico
- Greg Ewing
- Joao S. O. Bueno
- M.-A. Lemburg
- Ma Lin
- MRAB
- Nathan Schneider
- Stefan Behnel
- Steven D'Aprano
- Yuval Greenfield