[Tutor] Testing a string to see if it contains a substring (Steve and Mark)

dw bw_dw at fastmail.fm
Thu Jan 22 17:15:25 CET 2015


Thanks so much Steve and Mark!
You've given me a lot to chew on. :-D
I'll pursue!
More Python FUN!!
================================================================



Based on your description, I think the best way to do this is:

# remove blank lines
line_array = [line for line in line_array if line != '\n']


Possibly this is even nicer:

# get rid of unnecessary leading and trailing whitespace on each line
# and then remove blanks
line_array = [line.strip() for line in line_array]
line_array = [line for line in line_array if line]


This is an alternative, but perhaps a little cryptic for those not 
familiar with functional programming styles:

line_array = filter(None, map(str.strip, line_array))

No regexes required!
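
As a quick check, here is a minimal sketch that runs the strip-then-filter approaches side by side. The sample lines are invented for illustration and are not taken from the original data. Note the list() wrapper, which is only needed in Python 3.

```python
# Hypothetical sample data, invented for illustration.
line_array = ['  01/22/2015 first entry \n', '\n', 'second entry\n', '   \n']

# Two-step list comprehension version: strip, then drop empty strings.
stripped = [line.strip() for line in line_array]
by_comprehension = [line for line in stripped if line]

# Functional version. filter(None, ...) keeps only truthy items,
# and an empty string is falsy. list() is needed in Python 3 because
# filter returns an iterator there.
by_filter = list(filter(None, map(str.strip, line_array)))

print(by_comprehension)
print(by_filter)
```

Both versions also drop the whitespace-only line '   \n', which the simple comparison against '\n' would keep.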

However, it isn't clear from your example whether non-blank lines 
*always* include a date. Suppose you have to filter date lines from 
non-date lines?

Start with a regex and a tiny helper function, which we can write as a 
lambda and embed directly in the call to filter:

import re

DATE = r'\d{2}/\d{2}/\d{4}'
line_array = filter(lambda line: re.search(DATE, line), line_array)

In Python 3, filter returns a lazy iterator rather than a list, so wrap 
it in a call to list if you need an actual list:

line_array = list(filter(lambda line: re.search(DATE, line),
                         line_array))

That isn't needed in Python 2, where filter already returns a list.

If that's a bit cryptic, here it is again as a list comp:

DATE = r'\d{2}/\d{2}/\d{4}'
line_array = [line for line in line_array if re.search(DATE, line)]


Let's get rid of the whitespace at the same time!

line_array = [line.strip() for line in line_array if 
              re.search(DATE, line)]


And if that's still too cryptic ("what's a list comp?") here it is again 
expanded out in full:


temp = []
for line in line_array:
    if re.search(DATE, line):
        temp.append(line.strip())
line_array = temp


How does this work? It works because the two main re functions, 
re.match and re.search, return None when the regex isn't found, and a 
MatchObject when it is found. None has the property that it is 
considered "false" in a boolean context, while MatchObjects are always 
considered "true".

We don't care *where* the date is found in the string, only whether or 
not it is found, so there is no need to check the starting position.
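
A minimal sketch of that behaviour, using two made-up sample lines (the strings are assumptions for illustration, not the original data):

```python
import re

DATE = r'\d{2}/\d{2}/\d{4}'

# Hypothetical sample lines, invented for illustration.
with_date = 'Invoice sent 01/22/2015 to the customer\n'
without_date = 'No date on this line\n'

found = re.search(DATE, with_date)
missing = re.search(DATE, without_date)

print(bool(found))      # a match object counts as true
print(missing is None)  # no match gives None, which counts as false
print(found.group())    # the text that matched
print(found.start())    # where it matched; not needed for filtering
```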



-- 
Steven

=============================
I'd use 
https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime 
to test the first ten characters of the string.  I'll leave that and 
handling IndexError or ValueError to you :)
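
One way that suggestion might look is sketched below. The helper name starts_with_date and the month/day/year format string are assumptions; the original thread does not say which order the date fields are in. Slicing with line[:10] never raises IndexError, even on short strings, so this sketch only catches ValueError.

```python
from datetime import datetime


def starts_with_date(line, fmt='%m/%d/%Y'):
    """Return True if the first ten characters of line parse as a date.

    Hypothetical helper; fmt assumes month/day/year order.
    """
    try:
        # Slicing a short string just returns a shorter string,
        # which strptime then rejects with ValueError.
        datetime.strptime(line[:10], fmt)
    except ValueError:
        return False
    return True


# Hypothetical sample lines, invented for illustration.
print(starts_with_date('01/22/2015 first entry'))
print(starts_with_date('no date here'))
print(starts_with_date('13/45/2015 looks like a date but is not'))
```

Unlike the regex, this rejects strings such as 13/45/2015 that have the right shape but are not real dates. It only checks the start of the line, though, so it will not find a date in the middle.
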
-- 
 Bw_dw at fastmail.net


