parse date/time from a log entry with only strftime (and no regexen)

Simon Mullis simon at mullis.co.uk
Thu Nov 11 19:24:56 EST 2010


This was a long time ago, but in case anyone googling ever has the
same question, this is what I did (last year). The user just needs to
supply a strftime-formatted string, such as "%A, %e %b %H:%M", and
this class builds the regex to use on the log entries. The result is
stored in the created_re attribute.

import re

class RegexBuilder(object):
    """This class is used to create the regex from the strftime string.
       So, we pass it a strftime string and it returns a regex with capture
       groups."""

    lookup_table = {  '%a' : r"(\w{3})",    # locale's abbrev day name
                      '%A' : r"(\w{6,8})",  # locale's full day name
                      '%b' : r"(\w{3})",    # abbrev month name
                      '%B' : r"(\w{4,9})",  # full month name
                      '%d' : r"(3[0-1]|[1-2]\d|0[1-9]|[1-9])", # day of month
                      '%e' : r"(3[0-1]|[1-2]\d|[1-9])", # day of month, no leader
                      '%H' : r"(2[0-3]|[0-1]\d|\d)",   # Hour (24h clock)
                      '%I' : r"(1[0-2]|0[1-9]|[1-9])", # Hour (12h clock)
                      '%j' : (r"(36[0-6]|3[0-5]\d|[1-2]\d\d|0[1-9]\d|00[1-9]"
                              r"|[1-9]\d|0[1-9]|[1-9])"), # Day of year
                      '%m' : r"(1[0-2]|0[1-9]|[1-9])", # Month as decimal
                      '%M' : r"([0-5]\d|\d)",  # Minute
                      '%S' : r"(6[0-1]|[0-5]\d|\d)", # Second
                      '%U' : r"(5[0-3]|[0-4]\d|\d)", # Week of year (week starts Sun)
                      '%w' : r"([0-6])",             # Weekday (Sun = 0)
                      '%W' : r"(5[0-3]|[0-4]\d|\d)", # Week of year (week starts Mon)
                      '%y' : r"(\d{2})", # Year (no century)
                      '%Y' : r"(\d{4})", # Year with 4 digits
                      '%p' : r"(AM|PM)", # AM/PM marker
                      '%P' : r"(am|pm)", # am/pm marker (GNU extension)
                      '%f' : r"(\d+)", # TODO: microseconds. Only in Py 2.6+
                      }

    # Format of the keys in the table above
    strftime_re = r'%\w'

    def __init__(self, date_format):
        r = re.compile(RegexBuilder.strftime_re)
        self.created_re = r.sub(self._lookup, date_format)

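    def _lookup(self, match):
        """Return the capture group for one matched strftime directive.
           Raises KeyError for directives missing from lookup_table."""
        return RegexBuilder.lookup_table[match.group()]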
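
For anyone wanting to try the idea quickly, here is a minimal,
self-contained sketch of how the generated regex might be used on a
log line. It is not the code I ran at the time: the trimmed LOOKUP
table and the names build_regex and split_timestamp are made up for
this example, and it assumes the timestamp sits at the start of the
line and that month names are in the C/English locale.

```python
import re
from datetime import datetime

# Trimmed-down table for illustration: only the directives used below.
LOOKUP = {
    '%d': r"(3[0-1]|[1-2]\d|0[1-9]|[1-9])",  # day of month
    '%b': r"(\w{3})",                         # abbreviated month name
    '%Y': r"(\d{4})",                         # four-digit year
    '%H': r"(2[0-3]|[0-1]\d|\d)",             # hour (24h clock)
    '%M': r"([0-5]\d|\d)",                    # minute
    '%S': r"(6[0-1]|[0-5]\d|\d)",             # second
}


def build_regex(date_format):
    """Replace each strftime directive with its capture group."""
    return re.sub(r'%\w', lambda m: LOOKUP[m.group()], date_format)


def split_timestamp(line, date_format):
    """Return (datetime, rest_of_line), or None if the line does not
    start with a timestamp in the given format."""
    m = re.match(build_regex(date_format), line)
    if m is None:
        return None
    # group(0) is the full matched timestamp text; hand it to strptime.
    stamp = datetime.strptime(m.group(0), date_format)
    return stamp, line[m.end():].strip()


fmt = "%d %b %Y %H:%M:%S"
line = "03 Feb 2009 14:05:09 this is the remainder of the log line"
print(split_timestamp(line, fmt))
```

Because the regex only consumes the timestamp, strptime never sees
the rest of the line, so the "unconverted data remains" error does
not come up. Literal regex metacharacters in the format string are
not escaped in this sketch.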


> 2009/2/3 andrew cooke <andrew at acooke.org>
>>
>> > > ValueError: unconverted data remains:  this is the remainder of the
>> > > log
>> > > line
>> > > that I do not care about
>>
>> you could catch the ValueError and split at the ':' in the .args
>> attribute to find the extra data.  you could then find the extra data
>> in the original string, use the index to remove it, and re-parse the
>> time.
>>
>> ugly, but should work.
>> andrew
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>
>
>
> --
> Simon Mullis
> _________________
> simon at mullis.co.uk
>
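
For comparison, the approach andrew describes in the quoted message
could be sketched roughly like this. It assumes CPython words the
error as "unconverted data remains: <leftover>", which is an
implementation detail rather than a documented interface, and the
name parse_leading_time is invented for the example.

```python
from datetime import datetime


def parse_leading_time(line, date_format):
    """Return (datetime, leftover) for a line that starts with a
    timestamp, using the ValueError text to find the leftover."""
    try:
        return datetime.strptime(line, date_format), ""
    except ValueError as err:
        # Assumption: the message looks like
        # "unconverted data remains: <leftover>".
        head, _, leftover = err.args[0].partition(": ")
        if head != "unconverted data remains" or not leftover:
            raise
        # The leftover is the tail of the line: cut it off and re-parse.
        stamp_text = line[:len(line) - len(leftover)]
        return datetime.strptime(stamp_text, date_format), leftover


stamp, rest = parse_leading_time(
    "2009-02-03 14:05:09 this is the remainder of the log line",
    "%Y-%m-%d %H:%M:%S")
print(stamp, repr(rest))
```

As andrew says, ugly, but it avoids building a regex at all. Any
other ValueError (for example a line with no timestamp) is re-raised
unchanged.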


