Trouble with regular expressions

MRAB google at mrabarnett.plus.com
Sat Feb 7 18:15:05 EST 2009


John Machin wrote:
> On Feb 8, 1:37 am, MRAB <goo... at mrabarnett.plus.com> wrote:
>> LaundroMat wrote:
>>> Hi,
>>> I'm quite new to regular expressions, and I wonder if anyone here
>>> could help me out.
>>> I'm looking to split strings that ideally look like this: "Update: New
>>> item (Household)" into a group.
>>> This expression works ok: '^(Update:)?(.*)(\(.*\))$' - it returns
>>> ("Update", "New item", "(Household)")
>>> Some strings will look like this however: "Update: New item (item)
>>> (Household)". The expression above still does its job, as it returns
>>> ("Update", "New item (item)", "(Household)").
> 
> Not quite true; it actually returns
>     ('Update:', ' New item (item) ', '(Household)')
> However ignoring the difference in whitespace, the OP's intention is
> clear. Yours returns
>     ('Update:', ' New item ', '(item) (Household)')
> 
The OP said it works OK, which I took to mean that the OP was OK with
the extra whitespace, which can be easily stripped off. Close enough!
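
For anyone following along, here is a minimal sketch of the two results being compared. The names OP_PATTERN, LAZY_PATTERN and strip_groups are made up for illustration, and match() is an assumption, since the OP never said which function was in use.

```python
import re

# The OP's original pattern and the lazy/optional pattern suggested in this thread.
OP_PATTERN = r'^(Update:)?(.*)(\(.*\))$'
LAZY_PATTERN = r'^(Update:)?(.*?)(?:(\(.*\)))?$'

text = "Update: New item (item) (Household)"

op_groups = re.match(OP_PATTERN, text).groups()
lazy_groups = re.match(LAZY_PATTERN, text).groups()


def strip_groups(groups):
    """Strip surrounding whitespace from each group, leaving None untouched."""
    return tuple(g.strip() if g is not None else None for g in groups)


print(op_groups)                # ('Update:', ' New item (item) ', '(Household)')
print(lazy_groups)              # ('Update:', ' New item ', '(item) (Household)')
print(strip_groups(op_groups))  # ('Update:', 'New item (item)', '(Household)')
```

Stripping takes care of the whitespace. It does not change where the lazy pattern splits the two parenthesised parts.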
> 
>>> It does not work however when there is no text in parentheses (eg
>>> "Update: new item"). How can I get the expression to return a tuple
>>> such as ("Update:", "new item", None)?
>> You need to make the last group optional and also make the middle group
>> lazy: r'^(Update:)?(.*?)(?:(\(.*\)))?$'.
> 
> Why do you perpetuate the redundant ^ anchor?
> 
The OP didn't say whether search() or match() was being used. With the ^
it doesn't matter.
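
A small sketch of what I mean, assuming no flags such as re.MULTILINE are passed. ANCHORED and UNANCHORED are illustrative names, and the two-line string is a made-up input chosen only to show where search() and match() can disagree once the ^ is dropped.

```python
import re

ANCHORED = r'^(Update:)?(.*?)(?:(\(.*\)))?$'
UNANCHORED = r'(Update:)?(.*?)(?:(\(.*\)))?$'

single = "Update: new item"
multi = "x\nUpdate: new item"  # hypothetical input with a leading extra line

# With the ^, search() and match() agree on every input.
same_on_single = (re.search(ANCHORED, single).groups()
                  == re.match(ANCHORED, single).groups())
anchored_search = re.search(ANCHORED, multi)   # None
anchored_match = re.match(ANCHORED, multi)     # None

# Without the ^, search() may start the match later in the string,
# while match() still insists on starting at position 0.
unanchored_search = re.search(UNANCHORED, multi)
unanchored_match = re.match(UNANCHORED, multi)  # None

print(same_on_single)              # True
print(unanchored_search.groups())  # ('Update:', ' new item', None)
print(unanchored_match)            # None
```

On a single-line string the ^ makes no difference to match(), which is why it is redundant there.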

>> (?:...) is the non-capturing version of (...).
> 
> Why do you use
>     (?:(subpattern))?
> instead of just plain
>     (subpattern)?
> ?
> 
Oops, you're right. I was distracted by the \( and \)! :-)
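
A quick check that dropping the (?:...) wrapper changes nothing, sketched with illustrative names (WRAPPED, PLAIN, samples). The third pattern, LAST_ONLY, is not something anyone proposed in this thread. It is only a guess at one way to keep the middle group lazy and still capture just the final parenthesised part, assuming that part never contains nested parentheses.

```python
import re

WRAPPED = r'^(Update:)?(.*?)(?:(\(.*\)))?$'
PLAIN = r'^(Update:)?(.*?)(\(.*\))?$'
# Assumption, not from the thread: [^()]* keeps the last group from
# spanning more than one parenthesised part.
LAST_ONLY = r'^(Update:)?(.*?)(\([^()]*\))?$'

samples = [
    "Update: New item (Household)",
    "Update: New item (item) (Household)",
    "Update: new item",
]

# The wrapped and plain forms capture exactly the same groups.
equivalent = all(
    re.match(WRAPPED, s).groups() == re.match(PLAIN, s).groups()
    for s in samples
)
print(equivalent)  # True

for s in samples:
    print(re.match(LAST_ONLY, s).groups())
# ('Update:', ' New item ', '(Household)')
# ('Update:', ' New item (item) ', '(Household)')
# ('Update:', ' new item', None)
```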


