Regular expression to match a #
John Machin
sjmachin at lexicon.net
Thu Aug 11 19:07:04 EDT 2005
John Machin wrote:
> Devan L wrote:
>
>> John Machin wrote:
>>
>>> Aahz wrote:
>>>
>>>> In article <42fb45d7$1 at news.eftel.com>,
>>>> John Machin <sjmachin at lexicon.net> wrote:
>>>>
>>>>
>>>>> Search for r'^something' can never be better/faster than match for
>>>>> r'something', and with a dopey implementation of search [which Python's
>>>>> re is NOT] it could be much worse. So please don't tell newbies to
>>>>> search for r'^something'.
>>>>
>>>>
>>>>
>>>> You're somehow getting mixed up in thinking that "^" is some kind of
>>>> "not" operator -- it's the start of line anchor in this context.
>>>
>>>
>>> I can't imagine where you got that idea from.
>>>
>>> If I change "[which Python's re is NOT]" to "[Python's re's search() is
>>> not dopey]", does that help you?
>>>
>>> The point was made in a context where the OP appeared to be reading a
>>> line at a time and parsing it, and re.compile(r'something').match()
>>> would do the job; re.compile(r'^something').search() will do the job too
>>> -- BECAUSE ^ means start of line anchor -- but somewhat redundantly, and
>>> very inefficiently in the failing case with dopey implementations of
>>> search() (which apply match() at offsets 0, 1, 2, .....).
>>
>>
>>
>> I don't see much difference.
>
>
> and I didn't expect that you would -- like I wrote above: "Python's re's
> search() is not dopey".
*ahem*
C:\junk>python
Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t1 = timeit.Timer('re.search("^\w"," will not work")','import re')
>>> t2 = timeit.Timer('re.match("\w"," will not work")','import re')
>>> t3 = timeit.Timer('obj(" will not work")','import re;obj=re.compile("^\w").search')
>>> t4 = timeit.Timer('obj(" will not work")','import re;obj=re.compile("\w").match')
>>> t5 = timeit.Timer('obj(" will not work qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq")','import re;obj=re.compile("^\w").search')
>>> t6 = timeit.Timer('obj(" will not work qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq")','import re;obj=re.compile("\w").match')
>>> ["%.3f" % t.timeit() for t in t1, t2, t3, t4]
['5.510', '4.835', '1.588', '1.178']
>>> ["%.3f" % t.timeit() for t in t1, t2, t3, t4]
['5.512', '4.808', '1.584', '1.170']
Observation: factoring out the compile step makes the difference much
more apparent.
>>> ["%.3f" % t.timeit() for t in t3, t4, t5, t6]
['1.578', '1.175', '2.283', '1.174']
>>> ["%.3f" % t.timeit() for t in t3, t4, t5, t6]
['1.582', '1.179', '2.284', '1.172']
>>>
Conclusion: when the anchored pattern fails, search() time grows with
the length of the searched string; match() time does not.
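For anyone who wants to repeat the measurement on a current interpreter, here is a rough Python 3 sketch of the same four precompiled timings. The helper name time_call, the labels, and the repetition count are my own choices, not part of the session above, and the numbers will almost certainly differ from the 2.4.1 figures.

```python
import timeit

SHORT = " will not work"
LONG = SHORT + " " + "q" * 34  # a longer input that also fails at offset 0

def time_call(pattern, method, text, number=100_000):
    # Hypothetical helper: time a precompiled pattern's bound method on text.
    setup = f"import re; obj = re.compile({pattern!r}).{method}; s = {text!r}"
    return timeit.timeit("obj(s)", setup=setup, number=number)

results = {
    "anchored search, short": time_call(r"^\w", "search", SHORT),
    "plain match, short": time_call(r"\w", "match", SHORT),
    "anchored search, long": time_call(r"^\w", "search", LONG),
    "plain match, long": time_call(r"\w", "match", LONG),
}

for label, seconds in results.items():
    print(f"{label:<24} {seconds:.3f}")
```

Compare the two "search" rows with each other: if the long row is slower than the short one, search() on your version still scans past offset 0 for an anchored pattern.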
Meta-conclusion: Either I have to retract my
based-on-hope-rather-than-on-experimentation assertion, or redefine "not
dopey" to mean "surely nobody would search for ^x when match x would do,
so it would be dopey to optimise re for that" :-)
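To make "dopey" concrete: the naive search described in the quoted text (apply match() at offsets 0, 1, 2, ...) might look like the sketch below. This is an illustration only, not how Python's re implements search(); the name dopey_search and the attempt counter are made up for the example.

```python
import re

def dopey_search(compiled, text):
    """Naive search: try match() at each offset in turn.

    Returns (match_or_None, number_of_attempts).
    """
    attempts = 0
    for start in range(len(text) + 1):
        attempts += 1
        # With a start offset, '^' still only matches at the real
        # beginning of the string (no re.MULTILINE here).
        m = compiled.match(text, start)
        if m is not None:
            return m, attempts
    return None, attempts

text = " will not work"
found, tries = dopey_search(re.compile(r"\w"), text)
missed, wasted = dopey_search(re.compile(r"^\w"), text)
```

With the anchored pattern every offset is tried and every one fails, so the cost of a failing search is proportional to the length of the string.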
So, back to the original point:
If re.match("something") does the job you want, don't use
re.search("^something") instead.
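A small check of the claim that the two spellings accept the same lines when each line is parsed on its own. The sample lines are invented for the example, and the equivalence assumes re.MULTILINE is not set (with it, ^ also matches after a newline and search() can succeed where match() does not).

```python
import re

lines = ["something here", " something indented", "nothing"]

matcher = re.compile(r"something").match
searcher = re.compile(r"^something").search

by_match = [matcher(line) is not None for line in lines]
by_search = [searcher(line) is not None for line in lines]

print(by_match)
print(by_search)
```

Both lists come out the same, so match() gives the same answer without the redundant anchor.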