regular expressions and internationalization (WAS: permuting letters...)
Andrew Dalke
adalke at mindspring.com
Wed Nov 17 17:35:17 EST 2004
Steven Bethard <steven.bethard at gmail.com> writes on Fri, 12 Nov 2004
20:15:28 +0000 (UTC):
>Is there any way to match \w but not \d?
Dieter Maurer wrote:
> It is: r'(?!\d)\w'
While implementations are free to optimize this case, the current
Python implementation is slower with it than with the other solution,
r"[^\d\W]" (roughly 4.1 versus 1.8 seconds for 100 runs in the timings below):
>>> text = "Blah an123d blah901234 9spam and eggs\n" * 1000
>>> import re
>>> pat1 = re.compile(r"((?!\d)\w)+")
>>> pat2 = re.compile(r"[^\d\W]+")
>>> len(pat2.findall(text))
7000
>>> len(pat1.findall(text))
7000
>>> import timeit
>>> x = timeit.Timer(setup = "import __main__ as M",
...                  stmt = "M.pat1.findall(M.text)")
>>> x.timeit(100)
4.0506279468536377
>>> x = timeit.Timer(setup = "import __main__ as M",
...                  stmt = "M.pat2.findall(M.text)")
>>> x.timeit(100)
1.8287069797515869
>>>
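For anyone checking that the two spellings really select the same text, here is a minimal sketch. It assumes a Python 3 interpreter, where str patterns are Unicode-aware by default (older versions would need the re.UNICODE flag for the non-ASCII case). The names lookahead, char_class, ascii_text and unicode_text are mine, chosen for illustration. I also switched to a non-capturing group, because with the capturing group in pat1 findall() returns only the last character of each run; the count of matches is unaffected.

```python
import re

# Non-capturing group so findall() returns whole runs; the capturing
# group in pat1 above makes findall() return only the last character
# of each run.
lookahead = re.compile(r"(?:(?!\d)\w)+")
# Negated class: excludes digits (\d) and non-word characters (\W),
# which leaves word characters that are not digits.
char_class = re.compile(r"[^\d\W]+")

ascii_text = "Blah an123d blah901234 9spam and eggs"
# Illustrative non-ASCII input, including ARABIC-INDIC DIGIT THREE (U+0663).
unicode_text = "na\u00efve caf\u00e9 \u0663abc x_1"

ascii_runs = char_class.findall(ascii_text)
print(ascii_runs)
# Both patterns should agree on ASCII and non-ASCII input alike.
print(lookahead.findall(ascii_text) == ascii_runs)
print(lookahead.findall(unicode_text) == char_class.findall(unicode_text))
```

The seven runs found in one copy of the line are consistent with the 7000 matches reported for the 1000-line text above.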
Andrew
dalke at dalkescientific.com