[Tutor] regexp: a bit lost

Fri Oct 1 07:33:27 CEST 2010

On Fri, 1 Oct 2010 12:45:38 pm Alex Hall wrote:
> Hi, once again...
> I have a regexp that I am trying to use to make sure a line matches
> the format: [c*]n [c*]n n
> where c* is (optionally) 0 or more non-numeric characters and n is
> any numeric character. The spacing should not matter. These should
> pass: v1 v2   5
> 2 someword7 3
>
> while these should not:
> word 2  3
> 1 2
>
> Here is my test:
> s=re.search(r"[\d+\s+\d+\s+\d]", l)

Try this instead:

re.search(r'\d+\s+\D*\d+\s+\d', l)

This searches for:
    one or more digits
    at least one whitespace char (space, tab, etc)
    zero or more non-digits
    at least one digit
    at least one whitespace
    one digit (re.search does not check what follows it)
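
As a quick check, here is a minimal sketch that runs that pattern over the
four sample lines from the question. The names pattern, should_pass and
should_fail are mine, chosen only for illustration.

```python
import re

# Pattern from the reply above; the sample lines are the ones in the question.
pattern = re.compile(r'\d+\s+\D*\d+\s+\d')

should_pass = ["v1 v2   5", "2 someword7 3"]
should_fail = ["word 2  3", "1 2"]

for line in should_pass + should_fail:
    # search() scans the whole line, so the leading "v" in "v1" is skipped.
    matched = pattern.search(line) is not None
    print(repr(line), "matches" if matched else "does not match")
```

Because re.search scans for a match anywhere in the line, any non-digit
prefix on the first item (the "v" in "v1") is simply skipped over.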

> However:
> 1. this seems to pass with *any* string, even when l is a single
> character. This causes many problems
[...]

I'm sure it does.

You don't have to convince us that if the regular expression is broken, 
the rest of your code has a problem. That's a given. It's enough to 
know that the regex doesn't do what you need it to do.
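
For what it's worth, the likely reason the original pattern matches so
much is the square brackets: they turn the whole thing into a character
class, which matches a single character that is a digit, whitespace, or a
literal "+". A small sketch to illustrate (the names broken and simplified
are just for this example):

```python
import re

# The pattern from the question: the square brackets make it a character class.
broken = r"[\d+\s+\d+\s+\d]"

# The same class with the repeated entries removed: a digit, "+", or whitespace.
simplified = r"[\d+\s]"

for text in ["7", "+", " ", "word 2  3", "abc"]:
    # Both patterns agree on every input, since they are the same class.
    print(repr(text), bool(re.search(broken, text)), bool(re.search(simplified, text)))
```

So it does not quite match *any* string, but it matches any string that
contains at least one digit, space, or plus sign.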

> 3. Once I get the above working, I will need a way of pulling the
> characters out of the string and sticking them somewhere. For
> example, if the string were
> v9 v10 15
> I would want an array:
> n=[9, 10, 15]

Modify the regex to be this:

r'(\d+)\s+\D*(\d+)\s+(\d)'

and then query the groups of the match object that is returned:

>>> import re
>>> mo = re.search(r'(\d+)\s+\D*(\d+)\s+(\d)', 'spam42   eggs23    9')
>>> mo.groups()
('42', '23', '9')

Don't forget that mo will be None if the regex doesn't match, and don't 
forget that the items returned are strings.
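
Putting those two cautions together, a rough sketch of a helper might look
like this. The function name extract_numbers is hypothetical, and I have
assumed the last number can have more than one digit, so the final group
uses \d+ rather than \d. With the single \d, 'v9 v10 15' would give '1'
for the last item, not '15'.

```python
import re

# Assumption: the last number may have several digits, so the final group
# uses \d+ here instead of the single \d in the pattern above.
NUMBERS = re.compile(r'(\d+)\s+\D*(\d+)\s+(\d+)')

def extract_numbers(line):
    """Return the three numbers in line as ints, or None if there is no match."""
    mo = NUMBERS.search(line)
    if mo is None:
        return None
    # groups() returns strings, so convert each one explicitly.
    return [int(group) for group in mo.groups()]

print(extract_numbers('v9 v10 15'))
print(extract_numbers('1 2'))
```

Returning None for a non-matching line mirrors what re.search itself
does, so the caller can test for it the same way.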

-- 
Steven D'Aprano