Parsing apache log files
Jim Richardson
warlock at eskimo.com
Fri Feb 20 04:54:37 EST 2004
On Thu, 19 Feb 2004 22:32:24 -0800,
Josiah Carlson <jcarlson at nospam.uci.edu> wrote:
>> In the meantime, is there some obvious method, or module that I have
>> missed ?
>
> I use a regular expression:
> import re
> rexp = re.compile(r'(\d+\.\d+\.\d+\.\d+) - - \[([^\[\]:]+):'
>                   r'(\d+:\d+:\d+) ([-+]\d\d\d\d)\] ("[^"]*") '
>                   r'(\d+) (-|\d+) ("[^"]*") (".*")\s*\Z')
>
> a = rexp.match(line)
> if a is not None:
>     a.group(1) #IP address
>     a.group(2) #day/month/year
>     a.group(3) #time of day
>     a.group(4) #timezone, with its sign
>     a.group(5) #request
>     a.group(6) #code 200 for success, 404 for not found, etc.
>     a.group(7) #bytes transferred
>     a.group(8) #referrer
>     a.group(9) #browser
> else:
>     pass #this line did not match.
>
> That should work for most any line you get, but you may want to run it
> over a few megs of your logs just to check and see if that else
> statement is ever caught for a non-empty line.
>
> - Josiah
Thanks, although reading that re makes my brain hurt! :) I don't think
it handles the case where the dashes are something else, though. Each
dash is a placeholder for data that wasn't there on this request
(byte length, referrer, something), but I'll look into it. Thanks for
the example.
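
For what it's worth, here is a rough sketch of a pattern that tolerates
those placeholder fields. It assumes the logs are in Apache's "combined"
format, and it treats the referrer and browser fields as optional in case
some lines are in the shorter "common" format. The names LOG_RE and
parse_line, and the group names, are made up for illustration; they are
not from Josiah's example.

```python
import re

# Assumption: Apache "combined" log format. Group names are illustrative.
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '        # each may be "-"
    r'\[(?P<date>[^\]:]+):(?P<time>\d+:\d+:\d+) (?P<tz>[-+]\d{4})\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>-|\d+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>.*)")?\s*\Z'   # absent in "common"
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match.

    A "-" placeholder (and a missing optional field) becomes None.
    """
    m = LOG_RE.match(line)
    if m is None:
        return None
    return {key: (None if value == '-' else value)
            for key, value in m.groupdict().items()}
```

Named groups keep the pattern a little easier on the eyes than numbered
ones, and matching the second and third fields with \S+ means a real
identd or username value no longer makes the line fail. It is still worth
running this over a chunk of real logs and counting how often parse_line
returns None.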
--
Jim Richardson http://www.eskimo.com/~warlock
Ok, the guy who made the netfilter Makefile was probably on some really
interesting and probably highly illegal drugs when he wrote it.
-- Linus Torvalds