Regular expression to match a #

John Machin sjmachin at lexicon.net
Fri Aug 12 01:13:05 CEST 2005


Devan L wrote:
> John Machin wrote:
> 
>>Devan L wrote:
>>
>>>John Machin wrote:
>>>
>>>
>>>>Aahz wrote:
>>>>
>>>>
>>>>>In article <42fb45d7$1 at news.eftel.com>,
>>>>>John Machin  <sjmachin at lexicon.net> wrote:
>>>>>
>>>>>
>>>>>
>>>>>>Search for r'^something' can never be better/faster than match for
>>>>>>r'something', and with a dopey implementation of search [which Python's
>>>>>>re is NOT] it could be much worse. So please don't tell newbies to
>>>>>>search for r'^something'.
>>>>>
>>>>>
>>>>>You're somehow getting mixed up in thinking that "^" is some kind of
>>>>>"not" operator -- it's the start of line anchor in this context.
>>>>
>>>>I can't imagine where you got that idea from.
>>>>
>>>>If I change "[which Python's re is NOT]" to "[Python's re's search() is
>>>>not dopey]", does that help you?
>>>>
>>>>The point was made in a context where the OP appeared to be reading a
>>>>line at a time and parsing it, and re.compile(r'something').match()
>>>>would do the job; re.compile(r'^something').search() will do the job too
>>>>-- BECAUSE ^ means start of line anchor -- but somewhat redundantly, and
>>>>very inefficiently in the failing case with dopey implementations of
>>>>search() (which apply match() at offsets 0, 1, 2, .....).
>>>
>>>
>>>I don't see much difference.
>>
>>and I didn't expect that you would -- like I wrote above: "Python's re's
>>search() is not dopey".
> 
> 
> Your wording makes it hard to distinguish what exactly is "dopey".
> 

"""
dopey implementations of search() (which apply match() at offsets 0, 1, 
2, .....).
"""

The "dopiness" is this: the ^ anchor means that the pattern cannot 
possibly match starting at offset 1, 2, 3, etc., but a non-optimised 
search will not recognise that and will try every offset anyway, so the 
failing case takes time dependent on the length of the string.
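
For illustration only, here is a rough sketch of what such a dopey search 
might look like. The dopey_search() function and the test line are made up 
for this example; this is not how Python's re implements search(). It 
relies on the documented behaviour that, when match() is given a starting 
position, ^ still matches only at the real beginning of the string.

```python
import re

def dopey_search(compiled, text):
    """Hypothetical naive search: apply match() at offsets 0, 1, 2, ...

    Returns (match_or_None, number_of_match_attempts).
    """
    attempts = 0
    for offset in range(len(text) + 1):
        attempts += 1
        m = compiled.match(text, offset)
        if m is not None:
            return m, attempts
    return None, attempts

anchored = re.compile(r'^something')
plain = re.compile(r'something')

line = 'x' * 1000  # a failing case: the line does not start with "something"

# match() on the unanchored pattern only ever looks at offset 0.
assert plain.match(line) is None

# The naive search does not know that ^ rules out offsets 1, 2, 3, ...
result, attempts = dopey_search(anchored, line)
assert result is None
print(attempts)  # one attempt per offset, so it grows with len(line)

# For single-line input the two forms agree on success and failure.
for s in ('something else', 'not something'):
    assert bool(plain.match(s)) == bool(anchored.search(s))
```

The answers are the same either way; the only difference is the wasted 
work in the failing case, which is why match() with the unanchored 
pattern is the safer thing to recommend.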


