[Tutor] really basic py/regex
Steven D'Aprano
steve at pearwood.info
Sat Mar 31 03:09:06 EDT 2018
On Fri, Mar 30, 2018 at 02:00:13PM -0400, bruce wrote:
> Hi.
>
> Trying to quickly get the re.match(....) to extract the groups from the string.
>
> x="MATH 59900/40 [47490] - THE "
>
> The regex has to return MATH, 59900, 40, and 47490
Does it have to be a single regex? The simplest way is to split the
above into words, apply a regex to each word separately, and filter out
anything you don't want with a blacklist:
import re
regex = re.compile(r'\w+')  # one or more word characters (letters, digits, underscore)
string = "MATH 59900/40 [47490] - THE "
blacklist = set(['THE'])  # in Python 3, use {'THE'}
words = string.split()
results = []
for word in words:
    # findall returns every non-overlapping match in this word
    results.extend(regex.findall(word))
# drop anything that appears in the blacklist
results = [word for word in results if word not in blacklist]
print(results)  # ['MATH', '59900', '40', '47490']
Here's an alternative solution:
# version 2
words = string.split()
results = []
for word in words:
    for w in regex.findall(word):
        if w not in blacklist:
            results.append(w)
print(results)
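
If it really does have to be a single regex used with re.match, something along these lines might work. This is only a sketch: the pattern, the name COURSE_RE and the helper extract_fields are my own illustration, not anything from your code, and it assumes every input has the shape "SUBJECT NUMBER/NUMBER [NUMBER]" with the rest of the line ignored.

```python
import re

# Hypothetical pattern: a word, whitespace, digits "/" digits, whitespace,
# then digits inside square brackets. Anything after the "]" is ignored.
COURSE_RE = re.compile(r'(\w+)\s+(\d+)/(\d+)\s+\[(\d+)\]')

def extract_fields(text):
    """Return the four captured groups as a list, or [] if text does not match."""
    match = COURSE_RE.match(text)  # match() anchors at the start of the string
    return list(match.groups()) if match else []

print(extract_fields("MATH 59900/40 [47490] - THE "))
# ['MATH', '59900', '40', '47490']
```

The trade-off is that this is stricter than the split-and-filter versions: a line that deviates from the assumed layout gives an empty list rather than a partial result, and no blacklist is needed because the trailing text is never captured.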
--
Steve