Parsing apache log files

Paul McGuire ptmcg at
Fri Feb 20 10:18:01 CET 2004

"Jim Richardson" <warlock at> wrote in message
news:oqigg1-3nl.ln1 at grendel.myth...
> I am pulling apart some big apache logs (800-1000MB) for some analysis,
> and stuffing it into a MySQL database. Most of it goes ok, despite my
> meager coding abilities. But every so often I run across "borken" bits
> of data, like user agent strings that include "'/\ and such, although
> they are escaped by apache in writing the log, they break up my somewhat
> clunky splits.

The pyparsing examples directory includes an HTTP server log parser.  Run
against your data, it hit one minor error: the bytesSent field in the first
line is just a dash instead of an integer.  After changing the grammar to
accept a dash there, I ran it against your test lines and got this output:

fields.numBytesSent = -
fields.timestamp = ['16/Feb/2004:04:09:49', '-0800']
fields.clientSfw = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
fields.referrer =
fields.cmd = ['GET', '/ads/redirectads/336x280redirect.htm', 'HTTP/1.1']
fields.ipAddr =
fields.statusCode = 304

fields.numBytesSent = 541
fields.timestamp = ['16/Feb/2004:10:35:12', '-0800']
fields.clientSfw = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.20  [ru
fields.referrer =
fields.cmd = ['GET', '/ads/redirectads/468x60redirect.htm', 'HTTP/1.1']
fields.ipAddr =
fields.statusCode = 200

Download pyparsing at

Here's the change you'll have to make to the example.  Replace these two
lines:

                       integer.setResultsName("statusCode") +
                       integer.setResultsName("numBytesSent")  +

with these, which accept either an integer or a dash:

                       (integer | "-").setResultsName("statusCode") +
                       (integer | "-").setResultsName("numBytesSent")  +
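
If you want to see the same "integer or dash" rule without pyparsing, here is a rough standard-library sketch using re. It is not the pyparsing example and not a full log parser: the pattern, the parse_line helper, and the sample line are made up for illustration (the address 192.0.2.1 is a placeholder), and the field names are simply borrowed from the output above. It assumes the usual combined log layout, and it also tolerates backslash-escaped quotes inside the quoted fields, which is what was breaking the plain splits.

```python
import re

# Body of a double-quoted field: any run of characters that are not a quote
# or backslash, or a backslash followed by any one character (so \" and \\
# do not end the field early).
QUOTED_BODY = r'(?:[^"\\]|\\.)*'

# Assumed layout: host ident user [timestamp] "request" status bytes
# "referrer" "user agent". Status and bytes may each be a number or "-".
LOG_LINE = re.compile(
    r'(?P<ipAddr>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    + '"(?P<cmd>' + QUOTED_BODY + ')" '
    + r'(?P<statusCode>\d+|-) '
    + r'(?P<numBytesSent>\d+|-) '
    + '"(?P<referrer>' + QUOTED_BODY + ')" '
    + '"(?P<clientSfw>' + QUOTED_BODY + ')"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


# Made-up sample line: a dash for bytes sent and an escaped quote in the
# user agent string.
sample = (
    r'192.0.2.1 - - [16/Feb/2004:04:09:49 -0800] '
    r'"GET /ads/redirectads/336x280redirect.htm HTTP/1.1" 304 - '
    r'"-" "Mozilla/4.0 (compatible; \"odd\" agent)"'
)

fields = parse_line(sample)
for name, value in fields.items():
    print("fields.%s = %s" % (name, value))
```

The values come back as strings with the backslashes still in place, so a "-" in numBytesSent would need to be mapped to NULL or 0 before it goes into MySQL.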

-- Paul
