regex question

Joshua Macy amused at webamused.com
Thu Jan 20 22:48:17 EST 2000


Roy Smith wrote:
> 
> This isn't really a python question per-se, but it came up while using
> the re module, so there is a connection :-)
> 
> Let's say I've got a pattern "[0-9]", i.e. a 1-digit number.  I want to
> search a string for that pattern, and only return a match if it's found
> preceded by either a non-digit or the beginning of the string, and
> likewise followed by either a non-digit or the end of the string.  Is
> there a single compact regex I can write to cover that?
> 
> 

A literal translation of your 1-digit problem might look like the
following (using the \d shorthand for [0-9] and \D for [^0-9]):

import re
myRe = re.compile(r'(\D|^)(\d)(\D|$)')

testList = ["This is the number 1, isn't it?", "2 ought to be found",
            "We ought to find 3", "But shouldn't find 12",
            "12 should not be found"]
for test in testList:
    m = myRe.search(test)
    if m:
        print(m.group(2))   # group 2 is the digit itself
    else:
        print("Not found")

Which produces:
1
2
3
Not found
Not found
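
As an alternative sketch, assuming a version of the re module that
supports lookbehind as well as lookahead assertions (current ones do),
the neighbours can be checked without being consumed. The match is then
just the digit, and hits separated by a single character are not
skipped when you look for all matches. The names lone_digit and
consuming are only for illustration.

```python
import re

# (?<!\d) : not preceded by a digit (also true at the start of the string)
# (?!\d)  : not followed by a digit (also true at the end of the string)
lone_digit = re.compile(r'(?<!\d)\d(?!\d)')

# The pattern from above, which consumes the characters around the digit.
consuming = re.compile(r'(\D|^)(\d)(\D|$)')

text = "1 2 3 but not 45"
print(lone_digit.findall(text))                        # ['1', '2', '3']
print([m.group(2) for m in consuming.finditer(text)])  # ['1', '3']
```

The second line misses the 2 because the space after the 1 was already
consumed as part of the first match, so it is no longer available as
the leading non-digit for the next one.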

If you know you're looking for exactly 12 digits, you can use \d{12} in
place of \d above, e.g.
print("Test2")
testList2 = ["This is the number 123456789012, isn't it?",
             "123456789012 ought to be found",
             "We ought to find 123456789012",
             "But shouldn't find 12", "12 should not be found"]
myRe2 = re.compile(r'(^|\D)(\d{12})(\D|$)')
for test in testList2:
    m = myRe2.search(test)
    if m:
        print(m.group(2))   # group 2 is the 12-digit run
    else:
        print("Not found")

which prints (after the "Test2" line):
123456789012
123456789012
123456789012
Not found
Not found
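
If the number of digits varies from case to case, the same pattern can
be built from a count. This is only a sketch; exact_digits is a
hypothetical helper name, not something from the re module.

```python
import re

def exact_digits(n):
    """Compile a pattern that finds a run of exactly n digits.

    Hypothetical helper: group 2 of a match holds the digits.
    """
    return re.compile(r'(^|\D)(\d{%d})(\D|$)' % n)

four = exact_digits(4)

m = four.search("call extension 1234 today")
print(m.group(2) if m else "Not found")   # 1234

m = four.search("12345")                  # five digits, so no match
print(m.group(2) if m else "Not found")   # Not found
```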

It's helpful to know as much as possible about the actual text that
you're going to run the regex against, since there are often shortcuts
you can take: using \d and \D if you know that you'll never actually
see a-fA-F, for example, or looking for whitespace or word boundaries
instead of the negation of the character set.
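
Whether the word-boundary shortcut is safe depends on the text. As a
rough illustration (the sample strings are made up), \b treats letters
and the underscore as word characters, so it rejects a digit glued to
them, while a "no digit on either side" test accepts it:

```python
import re

by_boundary = re.compile(r'\b\d\b')              # digit standing as its own word
by_neighbours = re.compile(r'(?<!\d)\d(?!\d)')   # digit with no digit beside it

for text in ["item 7 shipped", "room a7", "v_7"]:
    print(text, by_boundary.findall(text), by_neighbours.findall(text))
```

Both patterns find the 7 in the first string; only the second pattern
finds it in "room a7" and "v_7".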

Joshua


