[Tutor] Testing a string to see if it contains a substring (Steve and Mark)

dw bw_dw at fastmail.fm
Thu Jan 22 17:15:25 CET 2015

Thanks so much Steve and Mark!
You've given me a lot to chew on. :-D
I'll pursue!
More Python FUN!!

Based on your description, I think the best way to do this is:

# remove blank lines
line_array = [line for line in line_array if line != '\n']

Possibly this is even nicer:

# get rid of unnecessary leading and trailing whitespace on each line
# and then remove blanks
line_array = [line.strip() for line in line_array]
line_array = [line for line in line_array if line]

This is an alternative, but perhaps a little cryptic for those not 
familiar with functional programming styles:

line_array = filter(None, map(str.strip, line_array))

No regexes required!
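
Here is a small runnable sketch comparing the three approaches above. The
sample line_array is made up for illustration and is not from the original
thread. In Python 3, filter returns a lazy iterator, so it is wrapped in
list() here.

```python
# Hypothetical sample data; the real line_array would come from a file.
line_array = ["  22/01/2015 first entry \n", "\n", "   \n", "no date here\n"]

# 1. Drop only the lines that are exactly a bare newline.
no_bare_newlines = [line for line in line_array if line != '\n']

# 2. Strip each line, then drop the strings that ended up empty.
stripped = [line.strip() for line in line_array]
non_blank = [line for line in stripped if line]

# 3. Functional style; list() is needed in Python 3, where filter is lazy.
functional = list(filter(None, map(str.strip, line_array)))

print(no_bare_newlines)
print(non_blank)
print(functional)
```

Note that the first approach keeps the whitespace-only line "   \n", while
the other two discard it.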

However, it isn't clear from your example whether non-blank lines 
*always* include a date. Suppose you have to filter date lines from 
non-date lines?

Start with a regex and a tiny helper function, which we can embed 
directly in the call to filter using lambda:

import re

DATE = r'\d{2}/\d{2}/\d{4}'
line_array = filter(lambda line: re.search(DATE, line), line_array)

In Python version 3, you may need to wrap that in a call to list:

line_array = list(filter(lambda line: re.search(DATE, line),
                         line_array))

but that isn't needed in Python 2.

If that's a bit cryptic, here it is again as a list comp:

DATE = r'\d{2}/\d{2}/\d{4}'
line_array = [line for line in line_array if re.search(DATE, line)]

Let's get rid of the whitespace at the same time!

line_array = [line.strip() for line in line_array if 
              re.search(DATE, line)]

And if that's still too cryptic ("what's a list comp?") here it is again 
expanded out in full:

temp = []
for line in line_array:
    if re.search(DATE, line):
        # keep the line, minus leading and trailing whitespace
        temp.append(line.strip())
line_array = temp
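
To check that the list comprehension and the explicit loop give the same
result, here is a self-contained sketch. The sample lines are invented for
illustration, and the names by_comprehension and by_loop are not from the
original message.

```python
import re

DATE = r'\d{2}/\d{2}/\d{4}'

# Hypothetical sample data standing in for the real file contents.
line_array = [
    "  22/01/2015 paid rent \n",
    "\n",
    "reminder: call Bob\n",
    "03/02/2015 groceries\n",
]

# List comprehension: keep lines containing a date, stripped of whitespace.
by_comprehension = [line.strip() for line in line_array
                    if re.search(DATE, line)]

# The same thing written as an explicit loop.
by_loop = []
for line in line_array:
    if re.search(DATE, line):
        by_loop.append(line.strip())

print(by_comprehension)
print(by_comprehension == by_loop)
```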

How does this work? It works because the two main re functions, 
re.match and re.search, return None when the regex isn't found, and a 
MatchObject when it is found. None has the property that it is 
considered "false" in a boolean context, while MatchObjects are always 
considered "true".
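
The truthiness rule described above can be seen directly in a short
sketch. The two test strings are made up for illustration.

```python
import re

DATE = r'\d{2}/\d{2}/\d{4}'

# One string that contains a date, one that does not.
hit = re.search(DATE, "meeting on 22/01/2015 at noon")
miss = re.search(DATE, "no date on this line")

print(bool(hit))    # True: a match object is truthy
print(bool(miss))   # False: miss is None
print(hit.group())  # 22/01/2015
```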

We don't care *where* the date is found in the string, only whether or 
not it is found, so there is no need to check the starting position.
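
As a brief illustration of why re.search fits here, the sketch below uses
an invented sample line. re.search scans the whole string, whereas re.match
only succeeds if the pattern matches at the very start. If the position
were ever needed, the match object's start() method reports it.

```python
import re

DATE = r'\d{2}/\d{2}/\d{4}'
line = "Invoice dated 22/01/2015"  # hypothetical sample line

found = re.search(DATE, line)      # scans the whole string
at_start = re.match(DATE, line)    # only tries position 0

print(found.start())  # 14
print(at_start)       # None
```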


I'd use 
to test the first ten characters of the string.  I'll leave that and 
handling IndexError or ValueError to you :)
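
The name of the function recommended above did not survive in this
message, so the following is only one possible approach and not necessarily
the one that was meant. It assumes datetime.strptime from the standard
library and a day/month/year format. The helper name starts_with_date is
hypothetical.

```python
from datetime import datetime


def starts_with_date(line, fmt="%d/%m/%Y"):
    """Return True if the first ten characters of line parse as a date.

    The day/month/year order in fmt is an assumption; use "%m/%d/%Y"
    if the data is month-first.
    """
    try:
        # Slicing never raises IndexError; a short or malformed prefix
        # simply makes strptime raise ValueError.
        datetime.strptime(line[:10], fmt)
    except ValueError:
        return False
    return True


print(starts_with_date("22/01/2015 first entry"))
print(starts_with_date("no date here"))
print(starts_with_date("31/02/2015 impossible date"))
```

Unlike the regex, this rejects strings such as 31/02/2015 that look like
dates but aren't valid ones.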
