[issue2650] re.escape should not escape underscore
SilentGhost
report at bugs.python.org
Mon Mar 14 15:46:48 CET 2011
SilentGhost <ghost.adh at gmail.com> added the comment:
I think these are two different questions:
1. What to escape
2. What to do about the poor performance of re.escape when it is implemented with re.sub
In my opinion, there isn't any justifiable reason to escape non-meta characters: doing so doesn't affect matching, and escaped strings are typically just reused in a regex.
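To illustrate the point about matching, here is a minimal sketch (the pattern strings and the helper name same_result are made up for this example): an escaped underscore matches exactly what a bare underscore matches, whereas escaping a real metacharacter such as "." changes what the pattern accepts.

```python
import re

# Escaping a non-meta character such as "_" makes no difference to matching.
plain = re.compile(r"foo_bar")
escaped = re.compile(r"foo\_bar")

# Escaping a metacharacter such as "." does change what the pattern matches.
dot_any = re.compile(r"a.b")
dot_literal = re.compile(r"a\.b")

def same_result(text):
    """Return True if the plain and escaped patterns agree on text."""
    return bool(plain.fullmatch(text)) == bool(escaped.fullmatch(text))
```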
I would favour simpler and cleaner code with re.sub. I don't think that re.escape could be a performance bottleneck in any application. I did some profiling with python3.2, and it seems that the reason for the poor performance is the many abstraction layers involved in calling re.sub. However, we need to bear in mind that we're only talking about a 40 usec difference for a 100-char string (string.printable): I'd think that the strings being escaped are typically shorter.
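A rough way to reproduce this kind of measurement is sketched below. It is not the profiling run mentioned above; the helper name usec_per_call and the repeat count are assumptions chosen for illustration, and the numbers will vary by machine and Python version.

```python
import re
import string
import timeit

def usec_per_call(func, arg, number=10000):
    """Return the average time of func(arg) in microseconds."""
    total = timeit.timeit(lambda: func(arg), number=number)
    return total / number * 1e6

sample = string.printable  # the 100-character string used as input

if __name__ == "__main__":
    print("re.escape: %.2f usec per call" % usec_per_call(re.escape, sample))
```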
As a compromise, I tested this code:
_mp = {ord(i): '\\' + i for i in '][.^$*+?{}\\|()'}

def escape(pattern):
    if isinstance(pattern, str):
        return pattern.translate(_mp)
    return sub(br'([][.^$*+?{}\\|()])', br'\\\1', pattern)
which is fast (faster than existing code) for str and slow for bytes patterns.
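The snippet above is written as if it lived inside re.py, where sub is already in scope. A standalone sketch of the same idea, for trying it outside the module, might look like this (the names _META, _STR_TABLE and _BYTES_RE are introduced here for illustration only):

```python
import re

# The metacharacters escaped by the proposed function.
_META = '][.^$*+?{}\\|()'

# str path: a translation table mapping each metacharacter to "\" + itself.
_STR_TABLE = {ord(c): '\\' + c for c in _META}

# bytes path: the same character set as a precompiled bytes regex.
_BYTES_RE = re.compile(br'([][.^$*+?{}\\|()])')

def escape(pattern):
    """Escape only regex metacharacters in a str or bytes pattern."""
    if isinstance(pattern, str):
        return pattern.translate(_STR_TABLE)
    return _BYTES_RE.sub(br'\\\1', pattern)

if __name__ == "__main__":
    print(escape("a.b_c"))   # the underscore is left alone
    print(escape(b"a.b_c"))
```

The two code paths differ because bytes.translate can only map one byte to one byte, so it cannot insert the extra backslash; str.translate accepts a mapping to longer strings.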
I don't particularly like it, because of the difference between str and bytes handling, but I do think that it will be much easier to "fix" once/when/if the re module is improved.
----------
keywords: -patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2650>
_______________________________________