[Tutor] regexp: a bit lost

Fri Oct 1 07:20:35 CEST 2010

Alex Hall wrote:
> Hi, once again...
> I have a regexp that I am trying to use to make sure a line matches the format:
> [c*]n [c*]n n
> where c* is (optionally) 0 or more non-numeric characters and n is any
> numeric character. The spacing should not matter. These should pass:
> v1 v2   5
> 2 someword7 3
> 
> while these should not:
> word 2  3
> 1 2
> 
> Here is my test:
> s=re.search(r"[\d+\s+\d+\s+\d]", l)
> if s: #do stuff
> 
> However:
> 1. this seems to pass with *any* string, even when l is a single
> character. This causes many problems and cannot happen since I have to
[...]

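You want to match a whole line, so you should use re.match rather than
re.search. re.match only anchors the pattern at the start of the string,
so end the pattern with $ as well. The square brackets are another
problem: they turn your pattern into a character class, which matches
one single character (a digit, a whitespace character or '+'), so nearly
every line passes. For matching vs. searching, see the docs: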

http://docs.python.org/library/re.html#matching-vs-searching
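A minimal sketch of the re.match approach, based on my own reading of the format. It assumes the optional prefix contains no digits or whitespace, each n is a single digit, and the fields are separated by at least one whitespace character. The names LINE_RE and is_valid are made up for illustration. The pattern drops the square brackets, and the trailing $ makes the match cover the whole line rather than just its start.

```python
import re

# Assumed reading of "[c*]n [c*]n n": an optional prefix with no digits or
# whitespace, then a single digit, repeated twice, then a lone digit.
# re.match anchors at the start of the string; the trailing $ anchors the end.
LINE_RE = re.compile(r"\s*[^\d\s]*\d\s+[^\d\s]*\d\s+\d\s*$")

def is_valid(line):
    """Return True if the whole line fits the assumed format (hypothetical helper)."""
    return LINE_RE.match(line) is not None

for line in ["v1 v2   5", "2 someword7 3", "word 2  3", "1 2"]:
    print("%r: %s" % (line, is_valid(line)))
```

The first two lines print True and the last two print False, matching the examples in the original question.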

You can also use re.split in this case:

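import re

yes = """
v1 v2   5
2 someword7 3
""".splitlines()
# drop the blank lines left over from the triple-quoted string
yes = [line for line in yes if line.strip()]

# the capturing group makes re.split keep the matched pieces in its result
pattern = r"(\w*\d\s+?)" # there may be a better pattern than this
rx = re.compile(pattern)

for line in yes:
    # re.split returns empty strings before and between adjacent matches;
    # keep only the non-empty pieces
    print([part for part in rx.split(line) if part])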
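Splitting on the pattern above extracts pieces but does not check the format: 'word 2  3', for instance, also comes out as three pieces. If you need both, capturing groups in a full-line pattern can validate and extract in one step. This is a sketch under my own assumptions about the format (no digits or whitespace in the optional prefix, single-digit n); FIELDS_RE and split_fields are hypothetical names.

```python
import re

# One capturing group per field of "[c*]n [c*]n n" (assumed reading).
# [^\d\s]* is the optional non-numeric prefix; $ rejects anything extra.
FIELDS_RE = re.compile(r"\s*([^\d\s]*\d)\s+([^\d\s]*\d)\s+(\d)\s*$")

def split_fields(line):
    """Return the three fields as a tuple of strings, or None if the line does not fit."""
    m = FIELDS_RE.match(line)
    return m.groups() if m else None

for line in ["v1 v2   5", "2 someword7 3", "word 2  3", "1 2"]:
    print("%r: %s" % (line, split_fields(line)))
```

Returning None for lines that do not fit lets the caller use the result directly in an if test, much like the original "if s:" check.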