A better RE?

Paul McGuire ptmcg at austin.rr._bogus_.com
Fri Mar 10 15:04:15 CET 2006


"Magnus Lycka" <lycka at carmen.se> wrote in message
news:duq0cj$7ih$1 at wake.carmen.se...
> I want an re that matches strings like "21MAR06 31APR06 1236",
> where the last part is day numbers (1-7), i.e it can contain
> the numbers 1-7, in order, only one of each, and at least one
> digit. I want it as three groups. I was thinking of
>
> r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"
>
> but that will match even if the third group is empty,
> right? Does anyone have good and not overly complex RE for
> this?
>
> P.S. I know the "now you have two problems reply..."

For the pyparsing-inclined, here are two versions, along with several
examples of how to extract the fields from the returned ParseResults object.
The second version is more rigorous in enforcing the days-of-week rules on
the third field.

Note that the month field is already limited to valid month abbreviations,
and the same technique used to validate the days-of-week field could be used
to ensure that the date fields are valid dates (no 31st of FEB, etc.), that
the second date is after the first, etc.
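
As a rough sketch of what that date check might look like, here is the
validation on its own, using only the standard library. The names to_date and
check_dates are made up for illustration, and treating a two-digit year as
20YY is an assumption, not something the original format specifies.

```python
import datetime

MONTHS = "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split()

def to_date(field):
    """Convert a 'DDMMMYY' field to a datetime.date.

    Raises ValueError for an unknown month or an impossible day.
    """
    day, mon, year = int(field[:2]), field[2:5], int(field[5:7])
    # Assumption: two-digit years are taken to mean 20YY.
    return datetime.date(2000 + year, MONTHS.index(mon) + 1, day)

def check_dates(start_field, end_field):
    """Return (start, end) as dates, or raise ValueError if end precedes start."""
    start, end = to_date(start_field), to_date(end_field)
    if end < start:
        raise ValueError("end date precedes start date")
    return start, end
```

A parse action could call these and turn any ValueError into a
ParseException, in the same way the days-of-week test does. Note that the
sample value 31APR06 would itself be rejected, since April has 30 days.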

-- Paul
Download pyparsing at http://pyparsing.sourceforge.net.


data  = "21MAR06 31APR06 1236"
data2 = "21MAR06 31APR06 1362"

from pyparsing import Combine, ParseException, Word, lineEnd, nums, oneOf

# define format of an entry
month = oneOf("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC")
date = Combine( Word(nums,exact=2) + month + Word(nums,exact=2) )
daysOfWeek = Word("1234567")
entry = date.setResultsName("startDate") + \
        date.setResultsName("endDate") + \
        daysOfWeek.setResultsName("weekDays") + \
        lineEnd

# extract entry data
e = entry.parseString(data)

# various ways to access the results
print(e.startDate, e.endDate, e.weekDays)
print("%(startDate)s : %(endDate)s : %(weekDays)s" % e)
print(e.asList())
print(e)
print()

# get more rigorous in testing for valid days of week field
def rigorousDayOfWeekTest(s,l,toks):
    # remove duplicates from toks[0], sort, then compare to original
    tmp = "".join(sorted(set(toks[0])))
    if tmp != toks[0]:
        raise ParseException(s,l,"Invalid days of week field")

# setResultsName returns a copy, so entry must be rebuilt after
# attaching the parse action for the new check to take effect
daysOfWeek.setParseAction(rigorousDayOfWeekTest)
entry = date.setResultsName("startDate") + \
        date.setResultsName("endDate") + \
        daysOfWeek.setResultsName("weekDays") + \
        lineEnd

print(entry.parseString(data))
print(entry.parseString(data2))         # <-- raises ParseException
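
For comparison, the original question can probably also be handled with a
plain re by putting a lookahead in front of the optional digits. This is only
a sketch: the [A-Z]{3} month is an assumption (a bare [A-Z] matches a single
letter, so it would not match "MAR"), and match_entry is a made-up helper name.

```python
import re

# (?=[1-7]) requires at least one day digit before the optional run,
# and $ rejects anything left over, such as repeated or out-of-order digits.
pattern = re.compile(
    r"(\d\d[A-Z]{3}\d\d) (\d\d[A-Z]{3}\d\d) ((?=[1-7])1?2?3?4?5?6?7?)$"
)

def match_entry(text):
    """Return the three groups as a tuple, or None if text does not match."""
    m = pattern.match(text)
    return m.groups() if m else None
```

Unlike the pyparsing version, this does not restrict the month to real
abbreviations; any three capital letters are accepted.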




