Convert AWK regex to Python
Peter Otten
__peter__ at web.de
Mon May 16 07:36:01 EDT 2011
J wrote:
> Hello Peter, Angelico,
>
> Ok lets see, My aim is to filter out several fields from a log file and
> write them to a new log file. The current log file, as I mentioned
> previously, has thousands of lines like this:- 2011-05-16 09:46:22,361
> [Thread-4847133] PDU D <G_CC_SMS_SERVICE_51408_656.O_
> CC_SMS_SERVICE_51408_656-ServerThread-
> VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX
> - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004
> Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) >
>
> All the lines in the log file are similar and they all have the same
> length (same amount of fields). Most of the fields are separated by
> spaces except for couple of them which I am processing with AWK (removing
> "<G_" from the string for example). So in essence what I want to do is
> evaluate each line in the log file and break them down into fields which I
> can call individually and write them to a new log file (for example
> selecting only fields 1, 2 and 3).
>
> I hope this is clearer now
Not much :(
It doesn't really matter whether there are 100, 1000, or a million lines in
the file; the important information is the structure of the file. You may be
able to get away with a quick and dirty script consisting of just a few
regular expressions, e.g.:
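
import re

filename = ...

def get_service(line):
    # first word after an opening parenthesis, e.g. "submit_resp"
    return re.compile(r"[(](\w+)").search(line).group(1)

def get_command(line):
    # word following "<G_", e.g. "CC_SMS_SERVICE_51408_656"
    return re.compile(r"<G_(\w+)").search(line).group(1)

def get_status(line):
    # digits following "Status:"
    return re.compile(r"Status:\s+(\d+)").search(line).group(1)

with open(filename) as infile:
    for line in infile:
        # a single formatted argument prints the same on Python 2.7 and 3
        print("{} {} {}".format(
            get_service(line), get_command(line), get_status(line)))
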
but there is no guarantee that your file contains no data that breaks the
implied assumptions (on a line that fails to match, search() returns None
and the script dies with an AttributeError). Also, judging from the shell
hackery, your ultimate goal seems to be a kind of frequency table, which
could be built along these lines:

freq = {}
with open(filename) as infile:
    for line in infile:
        service = get_service(line)
        command = get_command(line)
        status = get_status(line)
        key = command, service, status
        # count how often each combination occurs
        freq[key] = freq.get(key, 0) + 1

for key, occurrences in sorted(freq.items()):
    print("Service: {}, Command: {}, Status: {}, Occurrences: {}".format(
        *key + (occurrences,)))
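
If the file may contain lines that break the assumptions, a more defensive
variant could skip them instead of crashing. The following is only a sketch,
written for Python 3: the three patterns are the ones from the functions
above, but the names G_NAME_RE, PAREN_WORD_RE, parse_line and count_keys are
made up for illustration, the sample line is the quoted log line rejoined by
hand (the exact spacing at the wrap points is a guess), and the second sample
line is invented to show the skip path.

```python
import re
from collections import Counter

# Same three patterns as in the functions above, compiled once.
G_NAME_RE = re.compile(r"<G_(\w+)")        # word following "<G_"
PAREN_WORD_RE = re.compile(r"[(](\w+)")    # first word after a "("
STATUS_RE = re.compile(r"Status:\s+(\d+)") # digits following "Status:"

def parse_line(line):
    """Return the three captured fields as a tuple, or None if any is missing."""
    matches = [rx.search(line) for rx in (G_NAME_RE, PAREN_WORD_RE, STATUS_RE)]
    if not all(matches):
        return None
    return tuple(m.group(1) for m in matches)

def count_keys(lines):
    """Count each field combination; also report how many lines were skipped."""
    freq = Counter()
    skipped = 0
    for line in lines:
        key = parse_line(line)
        if key is None:
            skipped += 1
        else:
            freq[key] += 1
    return freq, skipped

# The log line from the quoted message, rejoined by hand (assumption).
GOOD_LINE = (
    "2011-05-16 09:46:22,361 [Thread-4847133] PDU D "
    "<G_CC_SMS_SERVICE_51408_656.O_CC_SMS_SERVICE_51408_656-ServerThread-"
    "VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX "
    "- 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004 "
    "Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) >"
)
# Invented line that matches none of the patterns.
BAD_LINE = "this line has no recognisable fields"

if __name__ == "__main__":
    freq, skipped = count_keys([GOOD_LINE, GOOD_LINE, BAD_LINE])
    for key, occurrences in sorted(freq.items()):
        print("{} {} {}: {}".format(*key + (occurrences,)))
    print("skipped lines: {}".format(skipped))
```

To run it against a real file, pass the open file object to count_keys()
in place of the sample list. Whether silently skipping bad lines is the right
policy depends on the data; logging them may be preferable.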