Why re.match()?

kj no.email at please.post
Thu Jul 2 07:19:40 EDT 2009


In <pan.2009.07.02.04.14.35 at REMOVE.THIS.cybersource.com.au> Steven D'Aprano <steven at REMOVE.THIS.cybersource.com.au> writes:

>On Thu, 02 Jul 2009 03:49:57 +0000, kj wrote:

>> In <Xns9C3BCA27ABC36duncanbooth at 127.0.0.1> Duncan Booth
>> <duncan.booth at invalid.invalid> writes:
>>>So, for example:
>> 
>>>>>> re.compile("c").match("abcdef", 2)
>>><_sre.SRE_Match object at 0x0000000002C09B90>
>>>>>> re.compile("^c").search("abcdef", 2)
>>>>>>
>>>>>>
>> I find this unconvincing; with re.search alone one could simply do:
>> 
>>>>> re.compile("^c").search("abcdef"[2:])
>> <_sre.SRE_Match object at 0x75918>
>> 
>> No need for re.match(), at least as far as your example shows.

>Your source string "abcdef" is tiny. Consider the case where the source 
>string is 4GB of data. You want to duplicate the whole lot, minus two 
>characters. Not so easy now.

I'm sure that it is possible to find cases in which the *current*
implementation of re.search() would be inefficient, but that's
because this implementation is perverse, which, I guess, is ultimately
the point of my original post.  Why privilege the special case of
a start-of-string anchor?  What if you wanted to apply an end-anchored
pattern to some prefix of your 4GB string?  Why not have a special
re method for that?  And another for every possible special case?
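
To make the prefix case concrete, here is a small sketch (the short
string stands in for the 4GB one, and the variable names are made up
for illustration).  Compiled pattern objects accept optional pos and
endpos arguments to search(); endpos makes the search behave as if the
string ended there, so an end-anchored pattern can be tried against a
prefix either by slicing or without copying:

```python
import re

text = "abcdef"  # stand-in for a much larger string

end_c = re.compile("c$")

# End-anchored pattern applied to the prefix text[:3] by slicing.
# This copies the prefix before searching it.
by_slice = end_c.search(text[:3])

# Same check without copying: with endpos=3 the search treats the
# string as if it were only 3 characters long, so "$" can match there.
by_endpos = end_c.search(text, 0, 3)

# Against the whole string the pattern does not match at all.
whole = end_c.search(text)

print(by_slice.span(), by_endpos.span(), whole)
```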

If the concern is efficiency for such cases, then simply implement
optional offset and length parameters for re.search(), to specify
any arbitrary substring to apply the search to.  To have a special-case
re.match() method in addition to a general re.search() method is
antithetical to language minimalism, and plain old bizarre.  Maybe
there's a really good reason for it, but it has not been mentioned
yet.
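
A rough sketch of what such offset and length parameters might look
like, written as a hypothetical helper called search_in (it is not
part of the re module) on top of the pos and endpos arguments that a
compiled pattern's search() already takes.  One caveat with this
approach: "^" keeps meaning the real start of the string, so the
helper is not equivalent to slicing for start-anchored patterns.

```python
import re

def search_in(pattern, string, offset=0, length=None):
    """Hypothetical helper: search string[offset:offset+length] without copying."""
    compiled = re.compile(pattern)
    # Clamp the end of the window to the end of the string.
    end = len(string) if length is None else min(len(string), offset + length)
    return compiled.search(string, offset, end)

text = "abcdef"

# Searches only the window "cde" and finds "c" at index 2.
windowed = search_in("c", text, offset=2, length=3)

# "^" still refers to the real start of the string, so this finds nothing.
anchored = search_in("^c", text, offset=2)

# match() on a compiled pattern anchors at the given offset instead.
at_offset = re.compile("c").match(text, 2)

print(windowed.span(), anchored, at_offset.span())
```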

kj


