regex plea for help
Skip Montanaro
skip at pobox.com
Fri Jun 27 16:00:30 EDT 2003
>> I'm trying to process through an apache log file and bust up the
>> individual sections into a list for further processing. There is a
>> regex I got from a php example that matches an entire line, but
>> obviously, that only returns a single element list.
You perhaps want something like this:
#!/usr/bin/env python

import re
import sys

# One named group per field of an Apache access_log line.
logpat = re.compile(r"(?P<host>[^ ]+) "
                    r"(?P<dash>[^ ]+) "
                    r"(?P<user>[^ ]+) "
                    r"\[(?P<timestamp>[^]]+)\] "
                    r'"(?P<method>[^ ]+) '
                    r"(?P<path>[^ ]+) "
                    r'(?P<version>[^"]+)" '
                    r"(?P<response>[0-9]+) "
                    r"(?P<size>[0-9]+)$")

for line in sys.stdin:
    mat = logpat.match(line.strip())
    # Lines that don't fit the pattern are silently skipped.
    if mat is not None:
        print(mat.groups())
which when run against my laptop's access_log emits lines like this:
('127.0.0.1', '-', 'skip', '06/Jun/2003:11:41:44 -0500', 'GET', '/nagios/cgi-bin/status.cgi?hostgroup=all', 'HTTP/1.1', '200', '11778')
('127.0.0.1', '-', '-', '06/Jun/2003:11:41:44 -0500', 'GET', '/nagios/stylesheets/status.css', 'HTTP/1.1', '200', '7952')
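Since the pattern uses named groups, the fields can also be pulled out by name rather than by position. The following is only a minimal sketch: the sample line is reconstructed from the first output tuple above (it is not copied from a real log), and the variable names `sample` and `fields` are just for illustration.

```python
import re

logpat = re.compile(r"(?P<host>[^ ]+) "
                    r"(?P<dash>[^ ]+) "
                    r"(?P<user>[^ ]+) "
                    r"\[(?P<timestamp>[^]]+)\] "
                    r'"(?P<method>[^ ]+) '
                    r"(?P<path>[^ ]+) "
                    r'(?P<version>[^"]+)" '
                    r"(?P<response>[0-9]+) "
                    r"(?P<size>[0-9]+)$")

# Sample line reconstructed from the first output tuple (an assumption,
# not a verbatim log entry).
sample = ('127.0.0.1 - skip [06/Jun/2003:11:41:44 -0500] '
          '"GET /nagios/cgi-bin/status.cgi?hostgroup=all HTTP/1.1" 200 11778')

mat = logpat.match(sample)
# groupdict() maps each group name to the text it matched.
fields = mat.groupdict()
print(fields["host"], fields["path"], fields["response"])
```

With a dictionary in hand, later processing can refer to fields["timestamp"] or fields["size"] without having to count tuple positions.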
I can never remember what the second field is. It's always been a dash in
any logfiles I've ever seen.
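For what it's worth, Apache's documentation for the Common Log Format lists that second field as the remote logname (%l) supplied by identd, which is normally just a dash. The same format also logs the size as a dash when no bytes were sent, and the [0-9]+ in the size group would not match such a line. Below is a hedged variant that tolerates this. The group name `ident`, the helper `parse_line`, the choice to treat a dash as zero, and the sample line are all assumptions made for illustration.

```python
import re

# Variant of the pattern with two assumed changes: the second group is
# renamed "ident" (hypothetical name), and size may be digits or "-".
tolerant_pat = re.compile(r"(?P<host>[^ ]+) "
                          r"(?P<ident>[^ ]+) "
                          r"(?P<user>[^ ]+) "
                          r"\[(?P<timestamp>[^]]+)\] "
                          r'"(?P<method>[^ ]+) '
                          r"(?P<path>[^ ]+) "
                          r'(?P<version>[^"]+)" '
                          r"(?P<response>[0-9]+) "
                          r"(?P<size>[0-9]+|-)$")


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    mat = tolerant_pat.match(line.strip())
    if mat is None:
        return None
    fields = mat.groupdict()
    # Treat a "-" size as zero bytes sent (a design choice, not a rule).
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Made-up sample line with a dash in the size position.
sample = ('127.0.0.1 - - [06/Jun/2003:11:41:44 -0500] '
          '"GET /index.html HTTP/1.1" 304 -')
print(parse_line(sample))
```

Returning None for non-matching lines keeps the same skip-on-mismatch behaviour as the loop in the script.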
Skip