RegEx issues

Sat Jan 24 13:50:03 EST 2009

Mark Tolonen wrote:
> 
> "Sean Brown" <sbrown.home@[spammy] gmail.com> wrote in message
> news:glflaj$qrf$2 at nntp.motzarella.org...
>> Using python 2.4.4 on OpenSolaris 2008.11
>>
>> I have the following string created by opening a url that has the
>> following string in it:
>>
>> td[ct] = [[ ... ]];\r\n
>>
>> The ...  above is what I'm interested in extracting which is really a
>> whole bunch of text. So I think the regex \[\[(.*)\]\]; should do it.
>> The problem is it appears that python is escaping the \ in the regex
>> because I see this:
>>>>> reg = '\[\[(.*)\]\];'
>>>>> reg
>> '\\[\\[(.*)\\]\\];'
>>
>> Now to me this looks like it would match the string - \[\[ ... \]\];
> 
> You are viewing the repr of the string
> 
>>>> reg='\[\[(.*)\]\];'
>>>> reg
> '\\[\\[(.*)\\]\\];'
>>>> print reg
> \[\[(.*)\]\];        <== these are the chars passed to regex
> 
> The backslashes tell the regex engine that the [ characters are literal.
> 
>>
>> Which obviously doesn't match anything because there are no literal \ in
>> the above string. Leaving the \ out of the \[\[ above has re.compile
>> throw an error because [ is a special regex character. Which is why it
>> needs to be escaped in the first place.
>>
>> I am either doing something really wrong, which is very possible, or I've
>> missed something obvious. Either way, I thought I'd ask why this isn't
>> working and why it seems to be changing my regex to something else.
> 
> Did you try it?
> 
>>>> s='td[ct] = [[blah blah]];\r\n'
>>>> re.search(reg,s).group(1)
> 'blah blah'
> 
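To make the repr point concrete, here is a small sketch. It is written in
Python 3 syntax (the thread uses 2.4, where print is a statement), and it
spells the pattern as a raw string, which is my own choice rather than
anything in the original post. It only shows that the doubled backslashes
belong to the repr, not to the string that reaches the regex engine.

```python
import re

# Raw string: the backslashes reach the regex engine unchanged.
reg = r'\[\[(.*)\]\];'

# The same 13 characters written with explicit doubled backslashes.
assert reg == '\\[\\[(.*)\\]\\];'
assert len(reg) == 13

print(repr(reg))  # repr doubles each backslash for display only
print(reg)        # the characters actually passed to re

# Sample string from the thread.
s = 'td[ct] = [[blah blah]];\r\n'
match = re.search(reg, s)
captured = match.group(1)
print(captured)
```

The raw-string form is not required for this particular pattern, but it
avoids having to think about which backslash sequences Python itself
interprets.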
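Beware, though, that by default regex quantifiers are greedy, so if there's
a chance that two groups [[ ... ]] [[ ... ]] can appear on the same line
then the (.*) in the above pattern will capture everything from the first
[[ to the last ]], i.e.

  ... ]] [[ ...

The non-greedy form (.*?) stops at the first closing ]] instead.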
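A quick sketch of the difference, again in Python 3 syntax. The sample line
with two td[...] assignments is made up for illustration and does not come
from the original poster's data.

```python
import re

# Hypothetical line containing two [[ ... ]]; groups.
s = 'td[a] = [[one]]; td[b] = [[two]];\r\n'

# Greedy: (.*) runs to the last "]];" on the line.
greedy = re.search(r'\[\[(.*)\]\];', s).group(1)

# Non-greedy: (.*?) stops at the first "]];" it can, so each
# group is captured separately.
lazy = re.findall(r'\[\[(.*?)\]\];', s)

print(greedy)
print(lazy)
```

If the real data can never have two such groups on one line, the greedy
pattern is fine as it stands.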

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/