RegEx issues
Steve Holden
steve at holdenweb.com
Sat Jan 24 13:50:03 EST 2009
Mark Tolonen wrote:
>
> "Sean Brown" <sbrown.home@[spammy] gmail.com> wrote in message
> news:glflaj$qrf$2 at nntp.motzarella.org...
>> Using python 2.4.4 on OpenSolaris 2008.11
>>
>> I have the following string created by opening a url that has the
>> following string in it:
>>
>> td[ct] = [[ ... ]];\r\n
>>
>> The ... above is what I'm interested in extracting which is really a
>> whole bunch of text. So I think the regex \[\[(.*)\]\]; should do it.
>> The problem is it appears that python is escaping the \ in the regex
>> because I see this:
>>>>> reg = '\[\[(.*)\]\];'
>>>>> reg
>> '\\[\\[(.*)\\]\\];'
>>
>> Now to me that looks like it would match the string - \[\[ ... \]\];
>
> You are viewing the repr of the string, not the string itself:
>
>>>> reg='\[\[(.*)\]\];'
>>>> reg
> '\\[\\[(.*)\\]\\];'
>>>> print reg
> \[\[(.*)\]\]; <== these are the chars passed to regex
>
> The backslashes tell the regex engine that the [ characters are literal.
>
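To make Mark's point concrete, here is a minimal sketch of the difference between a string and its repr. It is written for Python 3 (print is a function there, unlike the 2.4 session quoted above), and the names shown_by_repr and pattern_length are only illustrative. It also uses a raw string literal, which is the usual way to write regex patterns because the backslashes are kept exactly as typed.

```python
import re

# A raw string literal keeps every backslash exactly as typed.
reg = r'\[\[(.*)\]\];'

# repr() renders the string as a Python literal, so each backslash
# appears doubled. Nothing in the string itself has changed.
shown_by_repr = repr(reg)

# The string really contains single backslashes: 13 characters in all.
pattern_length = len(reg)

print(shown_by_repr)   # the doubled-backslash form the interpreter echoes
print(reg)             # the characters actually handed to re
print(pattern_length)

# The pattern compiles and matches the sample from the original post.
match = re.search(reg, 'td[ct] = [[blah blah]];\r\n')
print(match.group(1))
```

The doubled backslashes exist only in the echoed representation; re.compile receives the single-backslash form.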
>>
>> Which obviously doesn't match anything because there are no literal \ in
>> the above string. Leaving the \ out of the \[\[ above has re.compile
>> throw an error because [ is a special regex character. Which is why it
>> needs to be escaped in the first place.
>>
>> I am either doing something really wrong, which is very possible, or I've
>> missed something obvious. Either way, I thought I'd ask why this isn't
>> working and why it seems to be changing my regex to something else.
>
> Did you try it?
>
>>>> s='td[ct] = [[blah blah]];\r\n'
>>>> re.search(reg,s).group(1)
> 'blah blah'
>
Beware, though, that by default regex quantifiers are greedy, so if there's
a chance that two groups, [[ ... ]] [[ ... ]], can appear on the same line
then the (.*) in the above pattern will capture everything between the
first [[ and the last ]], including
... ]] [[ ...
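
A minimal sketch of the greedy behaviour and one possible way around it, assuming a made-up input line with two bracketed groups. The sample string s and the names greedy, lazy and all_groups are illustrative and not from the original post. The non-greedy variant also drops the trailing ; from the pattern, since in this sample only the last group is followed by one.

```python
import re

# Hypothetical sample line with two [[ ... ]] groups (an assumption,
# not data from the original post).
s = 'td[ct] = [[first]] [[second]];\r\n'

# Greedy: .* grabs as much as it can, then backtracks to the last "]];".
greedy = re.search(r'\[\[(.*)\]\];', s).group(1)

# Non-greedy: .*? stops at the first "]]" it can reach.
lazy = re.search(r'\[\[(.*?)\]\]', s).group(1)

# findall with the non-greedy pattern returns each group separately.
all_groups = re.findall(r'\[\[(.*?)\]\]', s)

print(greedy)
print(lazy)
print(all_groups)
```

Whether the non-greedy form is the right fix depends on the real data: if the text inside the brackets can itself contain ]], neither pattern is enough on its own.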
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/