Parsing apache log files
Paul McGuire
ptmcg at users.sourceforge.net
Fri Feb 20 04:18:01 EST 2004
"Jim Richardson" <warlock at eskimo.com> wrote in message
news:oqigg1-3nl.ln1 at grendel.myth...
>
> I am pulling apart some big apache logs (800-1000MB) for some analysis,
> and stuffing it into a MySQL database. Most of it goes ok, despite my
> meager coding abilities. But every so often I run across "borken" bits
> of data, like user agent strings that include "'/\ and such, although
> they are escaped by apache in writing the log, they break up my somewhat
> clunky splits.
>
The pyparsing examples directory includes an HTTP server log parser. Run
against your data, it hit one minor problem: the bytes-sent field in the first
line was just a dash instead of an integer. After adjusting the grammar to
accept that, I ran it against your test lines and got this output:
fields.numBytesSent = -
fields.timestamp = ['16/Feb/2004:04:09:49', '-0800']
fields.clientSfw = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
fields.referrer = http://www.foobarp.org/theme_detail.php?type=vs&cat=0&mid=27512
fields.cmd = ['GET', '/ads/redirectads/336x280redirect.htm', 'HTTP/1.1']
fields.ipAddr = 111.111.111.11
fields.statusCode = 304
fields.numBytesSent = 541
fields.timestamp = ['16/Feb/2004:10:35:12', '-0800']
fields.clientSfw = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.20 [ru
fields.referrer = http://11.11.111.11/adframe.php?n=ad1f311a&what=zone:56
fields.cmd = ['GET', '/ads/redirectads/468x60redirect.htm', 'HTTP/1.1']
fields.ipAddr = 11.111.11.111
fields.statusCode = 200
Download pyparsing at http://pyparsing.sourceforge.net.
Here's the change you'll have to make to the example:
Change:
integer.setResultsName("statusCode") +
integer.setResultsName("numBytesSent") +
to:
(integer | "-").setResultsName("statusCode") +
(integer | "-").setResultsName("numBytesSent") +
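To see the effect of that change in isolation, here is a minimal sketch, assuming pyparsing is installed. It is not the shipped example: the grammar is cut down to the two fields above, `parse_tail` is a hypothetical helper, and the inputs are made-up log-line tails.

```python
from pyparsing import Word, nums

# An unsigned integer field, used for the status code and the byte count.
integer = Word(nums)

# Assumption: a cut-down grammar holding only the two fields touched by the
# change above; the real example parses the whole log line.
status_and_bytes = (
    (integer | "-").setResultsName("statusCode")
    + (integer | "-").setResultsName("numBytesSent")
)


def parse_tail(text):
    """Hypothetical helper: return (statusCode, numBytesSent) as strings."""
    fields = status_and_bytes.parseString(text)
    return fields.statusCode, fields.numBytesSent


if __name__ == "__main__":
    print(parse_tail("304 -"))    # dash where Apache logged no byte count
    print(parse_tail("200 541"))  # ordinary numeric byte count
```

Both fields come back as strings, so a loader feeding MySQL would still need to map "-" to NULL or 0 before inserting.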
-- Paul