[Tutor] regexp: a bit lost
Steven D'Aprano
steve at pearwood.info
Fri Oct 1 07:33:27 CEST 2010
On Fri, 1 Oct 2010 12:45:38 pm Alex Hall wrote:
> Hi, once again...
> I have a regexp that I am trying to use to make sure a line matches
> the format: [c*]n [c*]n n
> where c* is (optionally) 0 or more non-numeric characters and n is
> any numeric character. The spacing should not matter. These should
> pass: v1 v2 5
> 2 someword7 3
>
> while these should not:
> word 2 3
> 1 2
>
> Here is my test:
> s=re.search(r"[\d+\s+\d+\s+\d]", l)
Try this instead:

    re.search(r'\d+\s+\D*\d+\s+\d', l)

This searches for:

    one or more digits
    at least one whitespace char (space, tab, etc)
    zero or more non-digits
    at least one digit
    at least one whitespace
    exactly one digit
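
As a quick sanity check, the pattern can be run against the four sample lines from the question. This is only a sketch; the helper name matches_format is made up for illustration:

```python
import re

# Pattern from the reply above, compiled once for reuse.
PATTERN = re.compile(r'\d+\s+\D*\d+\s+\d')

def matches_format(line):
    """Return True if the pattern is found anywhere in line (hypothetical helper)."""
    return PATTERN.search(line) is not None

# The first two lines should pass, the last two should not.
for line in ['v1 v2 5', '2 someword7 3', 'word 2 3', '1 2']:
    print(repr(line), matches_format(line))
```

Note that re.search looks for the pattern anywhere in the string, so extra text before or after a valid run of numbers will not cause a rejection.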
> However:
> 1. this seems to pass with *any* string, even when l is a single
> character. This causes many problems
[...]
I'm sure it does.
You don't have to convince us that if the regular expression is broken,
the rest of your code has a problem. That's a given. It's enough to
know that the regex doesn't do what you need it to do.
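
For what it's worth, the likely reason the original pattern passes with almost any string is the square brackets. They turn the whole expression into a character class, so it matches any single character that is a digit, whitespace, or a literal '+'. A small sketch, with sample strings invented for illustration:

```python
import re

# Inside [...], \d, \s and + are alternatives for ONE character,
# not a sequence. The repeated entries add nothing.
original = re.compile(r"[\d+\s+\d+\s+\d]")
equivalent = re.compile(r"[\d\s+]")

# Print whether each pattern finds a match in each sample string.
for s in ['7', ' ', '+', 'word 2', 'x\n', 'word']:
    print(repr(s), bool(original.search(s)), bool(equivalent.search(s)))
```

Only the last string contains no digit, whitespace, or '+', so it is the only one that fails. A line read from a file usually still carries its trailing newline, which counts as whitespace, and that may be why even a single-character line appeared to match.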
> 3. Once I get the above working, I will need a way of pulling the
> characters out of the string and sticking them somewhere. For
> example, if the string were
> v9 v10 15
> I would want an array:
> n=[9, 10, 15]
Modify the regex to be this:

    r'(\d+)\s+\D*(\d+)\s+(\d)'

and then query the groups of the match object that is returned:

>>> mo = re.search(r'(\d+)\s+\D*(\d+)\s+(\d)', 'spam42 eggs23 9')
>>> mo.groups()
('42', '23', '9')
Don't forget that mo will be None if the regex doesn't match, and don't
forget that the items returned are strings.
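
Putting those two reminders together, a sketch of turning a line into a list of ints might look like the following. The function name extract_numbers is made up, and the last group is widened to \d+ (an assumption on my part, not part of the regex above) so that a multi-digit final number such as 15 is captured whole rather than as just its first digit:

```python
import re

# Same shape as the grouped regex above, except the final group uses \d+
# (an assumption) so that '15' is not truncated to '1'.
PATTERN = re.compile(r'(\d+)\s+\D*(\d+)\s+(\d+)')

def extract_numbers(line):
    """Return the three numbers as ints, or None if there is no match."""
    mo = PATTERN.search(line)
    if mo is None:
        return None
    # Groups come back as strings, so convert each one.
    return [int(g) for g in mo.groups()]

print(extract_numbers('v9 v10 15'))
print(extract_numbers('1 2'))
```

Returning None for a non-matching line keeps the "did it match?" test and the extraction in one place; the caller only has to check for None.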
--
Steven D'Aprano