Trouble with regular expressions

John Machin sjmachin at lexicon.net
Sat Feb 7 09:18:07 EST 2009


On Feb 7, 11:18 pm, LaundroMat <Laun... at gmail.com> wrote:
> Hi,
>
> I'm quite new to regular expressions, and I wonder if anyone here
> could help me out.
>
> I'm looking to split strings that ideally look like this: "Update: New
> item (Household)" into groups.
> This expression works OK: '^(Update:)?(.*)(\(.*\))$' - it returns
> ("Update:", "New item", "(Household)")
>
> Some strings will look like this, however: "Update: New item (item)
> (Household)". The expression above still does its job, as it returns
> ("Update:", "New item (item)", "(Household)").
>
> It does not work, however, when there is no text in parentheses (e.g.
> "Update: new item"). How can I get the expression to return a tuple
> such as ("Update:", "new item", None)?
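
For reference, a quick check of why the quoted pattern gives up on that
last string: its third group, (\(.*\)), is not optional, so when the
input contains no parentheses the whole match fails. A minimal sketch
(the helper name try_original is made up for illustration, it is not
from the original question):

```python
import re

# The pattern from the question, written as a raw string.
original = re.compile(r'^(Update:)?(.*)(\(.*\))$')

def try_original(text):
    """Return the groups for text, or None when the pattern does not match."""
    m = original.match(text)
    return m.groups() if m else None

for s in ("Update: New item (Household)",
          "Update: New item (item) (Household)",
          "Update: new item"):
    print(repr(s), "->", try_original(s))
```

Note that the middle group keeps the surrounding spaces, so the groups
are not quite as tidy as described in the question.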

I don't see how it can be done without some post-matching adjustment.
Try this:

C:\junk>type mathieu.py
import re

tests = [
    "Update: New item (Household)",
    "Update: New item (item) (Household)",
    "Update: new item",
    "minimal",
    "parenthesis (plague) (has) (struck)",
    ]

regex = re.compile(r"""
    (Update:)?      # optional prefix
    \s*             # ignore whitespace
    ([^()]*)        # any non-parentheses stuff
    (\([^()]*\))?   # optional (blahblah)
    \s*             # ignore whitespace
    (\([^()]*\))?   # another optional (blahblah)
    $
    """, re.VERBOSE)

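for i, test in enumerate(tests):
    print("Test #%d: %r" % (i, test))
    m = regex.match(test)
    if not m:
        print("No match")
    else:
        g = m.groups()
        print(g)
        # Post-matching adjustment: if both (blahblah) slots matched,
        # fold the first one back into the item text.
        if g[3] is not None:
            x = (g[0], g[1] + g[2], g[3])
        else:
            x = g[:3]
        print(x)
        print()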

C:\junk>mathieu.py
Test #0: 'Update: New item (Household)'
('Update:', 'New item ', '(Household)', None)
('Update:', 'New item ', '(Household)')

Test #1: 'Update: New item (item) (Household)'
('Update:', 'New item ', '(item)', '(Household)')
('Update:', 'New item (item)', '(Household)')

Test #2: 'Update: new item'
('Update:', 'new item', None, None)
('Update:', 'new item', None)

Test #3: 'minimal'
(None, 'minimal', None, None)
(None, 'minimal', None)

Test #4: 'parenthesis (plague) (has) (struck)'
No match
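
The pattern above only has two (blahblah) slots, which is why Test #4
falls through. If more than two can turn up, the same post-matching
idea can be stretched a bit. What follows is only a sketch, assuming
that the last parenthesised chunk is always the one wanted in the third
position; the names PATTERN, CHUNK and split_item are made up for
illustration. Unlike the script above, it also strips trailing
whitespace from the item text.

```python
import re

PATTERN = re.compile(r"""
    (Update:)?              # optional prefix
    \s*                     # ignore whitespace
    ([^()]*)                # any non-parentheses stuff
    ((?:\([^()]*\)\s*)*)    # zero or more (blahblah) chunks
    $
    """, re.VERBOSE)

CHUNK = re.compile(r"\([^()]*\)")

def split_item(text):
    """Return (prefix, item, last_chunk), or None if text does not match."""
    m = PATTERN.match(text)
    if not m:
        return None
    prefix, body, tail = m.groups()
    chunks = list(CHUNK.finditer(tail))
    if not chunks:
        return (prefix, body.rstrip(), None)
    last = chunks[-1]
    # Everything in the tail before the last chunk goes back onto the item.
    return (prefix, (body + tail[:last.start()]).rstrip(), last.group())

for s in ("Update: New item (item) (Household)",
          "Update: new item",
          "parenthesis (plague) (has) (struck)"):
    print(split_item(s))
```

Unbalanced or nested parentheses still produce None here, the same as
"No match" above.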

HTH,
John



More information about the Python-list mailing list