Regular Expressions...

Ben Finney bignose+hates-spam at
Thu Jan 8 02:41:14 CET 2009

"Ken D'Ambrosio" <ken at> writes:

> Hi, all.  As a recovering Perl guy, I have to admit I don't quite "get"
> the re module.  For example, I'd like to do a few things (I'm going to use
> phone numbers, 'cause that's what I'm currently dealing with):
> 12345678900 -- How would I:
> - Get just the area code?
> - Get just the seven-digit number?
> In Perl, I'd do something like
> m/^1(...)(.......)/;

Wouldn't that be better as:

    m/^1(\d{3})(\d{7})$/;

I'll assume that more-precise expression in what follows.

> and then I'd have the numbers in $1 and $2, respectively.  But the Python
> stuff simply isn't clicking for me.

In general, where a set of data is likely to be iterated, the Pythonic
way to present it is via a single iterable (instead of, in your Perl
example, separate variables).

Then, for those (generally less frequent) cases where you do want the
separate items, you can bind them in a single statement:

    (foo, bar, baz) = some_sequence

The same unpacking works with any iterable, such as a generator:

    (foo, bar, baz) = (item for item in some_sequence)

For example:

    >>> (foo, bar, baz) = [1, 2, 3]
    >>> foo
    1
    >>> bar
    2
    >>> baz
    3

So, the match returned by the various ‘re’ module match functions is
an object which allows access to the grouped matches as a sequence.
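
A minimal sketch of that, using a made-up date pattern rather than
anything from your problem (the names here are only illustrative):
‘groups()’ returns all the grouped matches as a tuple, and ‘group(n)’
fetches a single one, numbered from 1 as in Perl.

```python
import re

# Illustrative pattern only: three groups for year, month, day.
date_match = re.match(r'^(\d{4})-(\d{2})-(\d{2})$', '2009-01-08')

# All groups at once, as a tuple that can be unpacked or iterated.
(year, month, day) = date_match.groups()

# A single group by number; group(0) is the whole matched text.
first_group = date_match.group(1)
whole_match = date_match.group(0)

print(year, month, day, first_group, whole_match)
```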

> If anyone could supply concrete examples of how to do the problem,
> above, that would be terrific.

Assuming the following:

    >>> import re
    >>> phone_number_regex = r'^1(\d{3})(\d{7})$'

Trivial one-shot example:

    >>> phone_number = '12345678900'
    >>> (area_code, local_number) = re.match(phone_number_regex, phone_number).groups()
    >>> area_code
    '234'
    >>> local_number
    '5678900'
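
One caveat with the one-liner above: ‘re.match’ returns None when the
string doesn't match, so calling ‘groups()’ directly on the result
raises AttributeError for bad input. A rough sketch of guarding
against that follows; the function name ‘split_phone_number’ is
hypothetical, just for illustration.

```python
import re

phone_number_regex = r'^1(\d{3})(\d{7})$'

def split_phone_number(phone_number):
    """Return (area_code, local_number), or None if the text doesn't match."""
    match = re.match(phone_number_regex, phone_number)
    if match is None:
        # No match: report that to the caller instead of failing on .groups()
        return None
    return match.groups()

print(split_phone_number('12345678900'))
print(split_phone_number('not a number'))
```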

More explicit example, showing the various steps and assuming you want
to re-use the various values in multiple statements:

    >>> phone_number_pattern = re.compile(phone_number_regex)
    >>> phone_number_pattern
    <_sre.SRE_Pattern object at 0xf7f8c598>

    >>> phone_number = '12345678900'
    >>> phone_number_match = phone_number_pattern.match(phone_number)
    >>> phone_number_match
    <_sre.SRE_Match object at 0xf7f52338>

    >>> (area_code, local_number) = phone_number_match.groups()
    >>> area_code
    '234'
    >>> local_number
    '5678900'

Python regular expressions also allow naming each group, for later
access to the matches via a dict:

    >>> phone_number_regex = r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})'
    >>> phone_number_pattern = re.compile(phone_number_regex)
    >>> phone_number_match = phone_number_pattern.match(phone_number)
    >>> phone_number_groups = phone_number_match.groupdict()
    >>> phone_number_groups['area_code']
    '234'
    >>> phone_number_groups['local_number']
    '5678900'
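
If you only want one or two of the named groups, you can also skip
the dict and pass the group name straight to ‘group()’. A short
sketch, reusing the names from above but written as a standalone
script (I've added the trailing ‘$’ here so the pattern rejects extra
digits):

```python
import re

phone_number_pattern = re.compile(
    r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})$')
phone_number_match = phone_number_pattern.match('12345678900')

# Named groups can be fetched by name ...
area_code = phone_number_match.group('area_code')
local_number = phone_number_match.group('local_number')

# ... and are still available by position as well.
by_position = phone_number_match.groups()

print(area_code, local_number, by_position)
```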

 \       “… one of the main causes of the fall of the Roman Empire was |
  `\        that, lacking zero, they had no way to indicate successful |
_o__)                  termination of their C programs.” —Robert Firth |
Ben Finney
