parse date/time from a log entry with only strftime (and no regexen)
Simon Mullis
simon at mullis.co.uk
Thu Nov 11 19:24:56 EST 2010
This was a long time ago... but just in case anyone googling ever has
the same question, this is what I did (last year). The user just needs
to supply a strftime-formatted string, such as "%A, %e %b %H:%M", and
this class figures out the regex to use on the log entries:
import re


class RegexBuilder(object):
    """This class is used to create the regex from the strftime string.

    So, we pass it a strftime string and it builds a regex with capture
    groups, stored in self.created_re."""

    lookup_table = {
        '%a': r"(\w{3})",                    # locale's abbrev day name
        '%A': r"(\w{6,9})",                  # full day name ("Wednesday" is 9)
        '%b': r"(\w{3})",                    # abbrev month name
        '%B': r"(\w{3,9})",                  # full month name ("May" is 3)
        '%d': r"(3[0-1]|[1-2]\d|0[1-9]|[1-9])",
                                             # day of month
        # Two-digit alternative first, so "15" is not matched as just "1"
        '%e': r"([1-3][0-9]|[1-9])",         # day of month, no leader
        '%H': r"(2[0-3]|[0-1]\d|\d)",        # Hour (24h clock)
        '%I': r"(1[0-2]|0[1-9]|[1-9])",      # Hour (12h clock)
        # Adjacent literals instead of a backslash continuation: inside a
        # raw string the backslash and newline would end up in the regex.
        '%j': (r"(36[0-6]|3[0-5]\d|[1-2]\d\d|0[1-9]\d|00[1-9]"
               r"|[1-9]\d|0[1-9]|[1-9])"),   # Day of year
        '%m': r"(1[0-2]|0[1-9]|[1-9])",      # Month as decimal
        '%M': r"([0-5]\d|\d)",               # Minute
        '%S': r"(6[0-1]|[0-5]\d|\d)",        # Second
        '%U': r"(5[0-3]|[0-4]\d|\d)",        # Week of year (Sun = 0)
        '%w': r"([0-6])",                    # Weekday (Sun = 0)
        '%W': r"(5[0-3]|[0-4]\d|\d)",        # Week of year (Mon = 0)
        '%y': r"(\d{2})",                    # Year (no century)
        '%Y': r"(\d{4})",                    # Year with 4 digits
        '%p': r"(AM|PM)",                    # AM/PM, upper case
        '%P': r"(am|pm)",                    # am/pm, lower case
        '%f': r"(\d+)",         # TODO: microseconds. Only in Py 2.6+
    }

    # Format of the keys in the table above
    strftime_re = r'%\w'

    def __init__(self, date_format):
        r = re.compile(RegexBuilder.strftime_re)
        # Replace each %x directive with its regex; other text is kept as-is
        self.created_re = r.sub(self._lookup, date_format)

    def _lookup(self, match):
        """Regex lookup; raises KeyError for a directive not in the table."""
        return RegexBuilder.lookup_table[match.group()]

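
As a rough illustration of how a regex built this way might be used on a
log line, here is a minimal sketch. It does not reuse the class above: the
names DIRECTIVE_PATTERNS, build_timestamp_regex and split_log_line are
hypothetical, the table is trimmed to the directives the example needs, and
it assumes the timestamp sits at the very start of the line. Unlike the
class, it also escapes the literal text between directives.

```python
import re
from datetime import datetime

# Hypothetical, trimmed-down table: only the directives this sketch needs.
DIRECTIVE_PATTERNS = {
    "%Y": r"(\d{4})",
    "%m": r"(1[0-2]|0[1-9]|[1-9])",
    "%d": r"(3[01]|[12]\d|0[1-9]|[1-9])",
    "%H": r"(2[0-3]|[01]\d|\d)",
    "%M": r"([0-5]\d|\d)",
    "%S": r"(6[01]|[0-5]\d|\d)",
}


def build_timestamp_regex(date_format):
    """Translate a strftime format into a regex, escaping literal text."""
    pieces = []
    # The capturing group makes re.split keep the %x directives in the list.
    for part in re.split(r"(%\w)", date_format):
        if part in DIRECTIVE_PATTERNS:
            pieces.append(DIRECTIVE_PATTERNS[part])
        elif re.fullmatch(r"%\w", part):
            raise ValueError("unsupported directive: %s" % part)
        else:
            pieces.append(re.escape(part))
    return "".join(pieces)


def split_log_line(line, date_format):
    """Return (datetime, rest_of_line) for a line starting with a timestamp."""
    match = re.match(build_timestamp_regex(date_format), line)
    if match is None:
        raise ValueError("no timestamp matching %r in %r" % (date_format, line))
    # Only the matched prefix goes to strptime, so the remainder of the
    # line cannot trigger "unconverted data remains".
    stamp = datetime.strptime(match.group(0), date_format)
    return stamp, line[match.end():].strip()
```

Usage would look something like
split_log_line("2010-11-11 19:24:56 something happened", "%Y-%m-%d %H:%M:%S"),
which hands only the leading timestamp to strptime and returns the rest of
the line separately.
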
> 2009/2/3 andrew cooke <andrew at acooke.org>
>>
>> > > ValueError: unconverted data remains: this is the remainder of the
>> > > log line that I do not care about
>>
>> you could catch the ValueError and split at the ':' in the .args
>> attribute to find the extra data. you could then find the extra data
>> in the original string, use the index to remove it, and re-parse the
>> time.
>>
>> ugly, but should work.
>> andrew
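
For comparison, the approach suggested in the quoted reply (catch the
ValueError, pull the leftover text out of .args, cut it off the original
string and re-parse) might look roughly like the sketch below. The function
name parse_leading_timestamp is hypothetical, and the code assumes the
CPython wording "unconverted data remains: <text>" for the exception
message, which is an implementation detail rather than a documented
interface.

```python
from datetime import datetime

# Assumed CPython message prefix; not part of any documented API.
UNCONVERTED_PREFIX = "unconverted data remains: "


def parse_leading_timestamp(line, date_format):
    """Parse a timestamp at the start of line; return (datetime, remainder)."""
    try:
        return datetime.strptime(line, date_format), ""
    except ValueError as exc:
        message = exc.args[0] if exc.args else ""
        if not message.startswith(UNCONVERTED_PREFIX):
            raise  # a genuine mismatch, not just trailing data
        # Everything after the prefix is the part strptime could not consume.
        extra = message[len(UNCONVERTED_PREFIX):]
        # Find that leftover text in the original line and cut it off.
        index = line.rfind(extra)
        stamp = datetime.strptime(line[:index], date_format)
        return stamp, extra.strip()
```

This keeps the "no regexen" spirit of the original question, at the cost of
depending on the exact text of an error message and parsing each such line
twice.
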