[Tutor] really basic py/regex

Steven D'Aprano steve at pearwood.info
Sat Mar 31 03:09:06 EDT 2018


On Fri, Mar 30, 2018 at 02:00:13PM -0400, bruce wrote:
> Hi.
> 
> Trying to quickly get the re.match(....)  to extract the groups from the string.
> 
> x="MATH 59900/40 [47490] - THE "
> 
> The regex has to return MATH, 59900, 40, and 47490

Does it have to be a single regex? The simplest way is to split the 
above into words, apply a regex to each word separately, and filter out 
anything you don't want with a blacklist:

import re
regex = re.compile(r'\w+')  # one or more word characters (letters, digits, underscore)

string = "MATH 59900/40 [47490] - THE "
blacklist = set(['THE'])  # in Python 2.7 or 3, you can write {'THE'}

words = string.split()
results = []
for word in words:
    results.extend(regex.findall(word))

results = [word for word in results if word not in blacklist]
print(results)


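Here's an alternative solution that filters inside the loop instead of 
in a separate pass afterwards:

# version 2 (reuses regex, string and blacklist from above)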
words = string.split()
results = []
for word in words:
    for w in regex.findall(word):
        if w not in blacklist:
            results.append(w)

print(results)
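
If it really does have to be a single regex with re.match, something 
along these lines might do. Treat it as a sketch only: the pattern 
assumes every line looks like your sample (a word, then digits/digits, 
then digits in square brackets), and the names "pattern", "mo" and 
"fields" are just ones I made up for illustration.

```python
import re

# Assumed layout: WORD, then digits "/" digits, then "[" digits "]".
# Anything after the closing bracket is ignored, because match() only
# anchors at the start of the string.
pattern = re.compile(r'(\w+)\s+(\d+)/(\d+)\s+\[(\d+)\]')

string = "MATH 59900/40 [47490] - THE "
mo = pattern.match(string)
# match() returns None when the line doesn't fit the assumed layout.
fields = list(mo.groups()) if mo else []
print(fields)
```

The trade-off is that this version is stricter: a line that doesn't 
fit the layout gives you an empty list rather than whatever words 
happen to be in it, and there is no need for a blacklist.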



-- 
Steve

