[Tutor] advice on regex matching for dates?

Serdar Tumgoren zstumgoren at gmail.com
Thu Dec 11 20:31:39 CET 2008


Hey everyone,

I was wondering whether there is a way to use the datetime module to handle
variations on a month name when performing a regex match.

In the script below, I created a regex that matches dates in the following
format: "August 31, 2007". If there's a match, I print the captured date and
the line it was extracted from.

While it works in this isolated case, it strikes me as not very flexible.
What happens when I inevitably get data with dates formatted in a different
way? Do I have to build some kind of library that contains the variations on
each month name (e.g. January, Jan., 01, 1, ...) and use that to parse each
line?
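Rather than maintaining such a library by hand, one option is to build the month alternation from the standard `calendar` module, which already exposes full and abbreviated month names. This is a minimal sketch, assuming English month names: `calendar.month_name` and `calendar.month_abbr` follow the current LC_TIME locale, which is English in the default C locale. `MONTH_RE`, `LOOKUP` and `month_number` are hypothetical names used only for illustration.

```python
import calendar
import re

# calendar.month_name / month_abbr are 13-item sequences whose index 0 is '';
# slicing from 1 gives 'January'..'December' and 'Jan'..'Dec'
# (in the default C locale; the names follow the current LC_TIME locale).
FULL = list(calendar.month_name)[1:]
ABBR = list(calendar.month_abbr)[1:]

# Accept "August 31, 2007", "Aug 31, 2007", "Aug. 31 2007", ...
# \b stops the pattern from matching a month name inside a longer word.
MONTH_RE = re.compile(
    r'\b(?P<month>' + '|'.join(map(re.escape, FULL + ABBR)) + r')\.?\s+'
    r'(?P<day>\d{1,2}),?\s+(?P<year>\d{4})'
)

# Map every spelling, case-insensitively, back to its month number 1-12.
LOOKUP = {}
for number, (full, abbr) in enumerate(zip(FULL, ABBR), start=1):
    LOOKUP[full.lower()] = number
    LOOKUP[abbr.lower()] = number


def month_number(name):
    """Return 1-12 for a full or abbreviated month name (trailing '.' ignored)."""
    return LOOKUP[name.rstrip('.').lower()]


m = MONTH_RE.search("Posted Aug. 31, 2007 by staff")
if m:
    print(m.group('month'), month_number(m.group('month')), m.group('day'), m.group('year'))
```

Spellings outside the `calendar` tables, such as "Sept.", would still have to be added by hand.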

Or is there some way to use datetime to check for date patterns when using
regex? Is there a "best practice" in this area that I'm unaware of?
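`datetime` cannot be plugged into a regex directly, but `datetime.datetime.strptime` can act as the validator once a candidate string has been extracted: try a list of known formats in turn and keep the first one that parses. This is a minimal sketch; the `FORMATS` list and the `parse_date` helper are hypothetical and would need extending to whatever layouts actually turn up in the data. `%B` and `%b` match full and abbreviated month names in the current locale (English by default).

```python
from datetime import datetime

# Hypothetical list of layouts to accept; extend it as new variants appear.
FORMATS = [
    "%B %d, %Y",   # August 31, 2007
    "%b. %d, %Y",  # Aug. 31, 2007
    "%b %d, %Y",   # Aug 31, 2007
    "%m/%d/%Y",    # 08/31/2007
    "%Y-%m-%d",    # 2007-08-31
]


def parse_date(text):
    """Return a datetime.date for the first format that parses, else None."""
    text = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # wrong layout or impossible date; try the next format
    return None


for s in ["August 31, 2007", "Aug. 31, 2007", "08/31/2007", "February 30, 2007"]:
    print(repr(s), "->", parse_date(s))
```

Because `strptime` builds a real date, it also rejects strings like "February 30, 2007" that look right but do not exist.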

Apologies if this question has been answered elsewhere. I wasn't sure how to
research this topic (beyond the standard datetime docs), but I'd be happy to
RTM if someone can point me to some resources.

Any suggestions are welcome (including optimizations of the code below).

Regards,
Serdar

#!/usr/bin/env python

import re
import sys

# Matches dates written like "August 31, 2007" and captures month, day, year.
pattern = re.compile(
    r'(?P<month>January|February|March|April|May|June|July|August|'
    r'September|October|November|December)\s(?P<day>\d{1,2}),\s(?P<year>\d{4})'
)

# A line starting with this text marks the end of the region to scan.
pattern2 = re.compile(r'Return to List')

counter = 0

with open(sys.argv[1], 'r') as sourcefile:
    for line in sourcefile:
        x = pattern.search(line)
        break_point = pattern2.match(line)

        if x:
            counter += 1
            # line already ends with a newline, so don't add another one
            sys.stdout.write("%s %d, %d <== %s" % (x.group('month'),
                                                   int(x.group('day')),
                                                   int(x.group('year')),
                                                   line))
        elif break_point:
            break

print(counter)
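One gap in the script: the regex happily accepts impossible dates such as "February 30, 2007". Passing the captured groups through `datetime.datetime.strptime` both validates the date and yields a real `datetime.date` that can be sorted or reformatted. This is a minimal sketch reusing the same pattern; `to_date` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Same pattern as the script above (English month names, "Month D, YYYY").
pattern = re.compile(
    r'(?P<month>January|February|March|April|May|June|July|August|'
    r'September|October|November|December)\s(?P<day>\d{1,2}),\s(?P<year>\d{4})'
)


def to_date(match):
    """Turn a regex match into a datetime.date, or None if the date is impossible."""
    text = "%s %s %s" % (match.group('month'), match.group('day'), match.group('year'))
    try:
        return datetime.strptime(text, "%B %d %Y").date()
    except ValueError:
        return None  # e.g. "February 30, 2007" matches the regex but is not a real date


for line in ["Filed August 31, 2007.", "Filed February 30, 2007."]:
    m = pattern.search(line)
    if m:
        d = to_date(m)
        print(d.isoformat() if d else "invalid", "<==", line)
```

The regex stays responsible for finding candidates in free text, while `strptime` decides whether each candidate is a real date.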