Convert AWK regex to Python

Mon May 16 10:01:22 EDT 2011

Thanks for the suggestions, Peter. I will give them a try.

Peter Otten wrote:
> J wrote:
>
> > Hello Peter, Angelico,
> >
> > Ok, let's see. My aim is to filter out several fields from a log file and
> > write them to a new log file.  The current log file, as I mentioned
> > previously, has thousands of lines like this:
> >
> > 2011-05-16 09:46:22,361 [Thread-4847133] PDU D <G_CC_SMS_SERVICE_51408_656.O_
> > CC_SMS_SERVICE_51408_656-ServerThread-VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX
> > - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004
> > Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) ) >
> >
> > All the lines in the log file are similar and they all have the same
> > length (the same number of fields).  Most of the fields are separated by
> > spaces, except for a couple of them which I am processing with AWK (removing
> > "<G_" from the string, for example).  So in essence what I want to do is
> > evaluate each line in the log file and break it down into fields which I
> > can access individually and write to a new log file (for example,
> > selecting only fields 1, 2 and 3).
> >
> > I hope this is clearer now.
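For the field-by-field filtering described above, a rough Python counterpart of awk's default behaviour might look like the sketch below. It assumes, as awk does by default, that fields are separated by runs of whitespace, and it numbers fields from 1 like awk's `$1`, `$2`, ...; the helper names `select_fields`, `strip_g_prefix` and `filter_log` are hypothetical, not part of any library.

```python
import re


def select_fields(line, wanted=(1, 2, 3)):
    """Return the awk-style fields $n listed in `wanted` (1-based).

    str.split() with no argument splits on runs of whitespace and ignores
    leading/trailing blanks, which mirrors awk's default field splitting.
    Field numbers beyond the end of the line are skipped (awk would give "").
    """
    fields = line.split()
    return [fields[n - 1] for n in wanted if 1 <= n <= len(fields)]


def strip_g_prefix(field):
    """Remove a leading "<G_", similar to awk's sub(/^<G_/, "", $n)."""
    return re.sub(r"^<G_", "", field)


def filter_log(in_path, out_path, wanted=(1, 2, 3)):
    """Write the selected fields of every line of in_path to out_path."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            fields = [strip_g_prefix(f) for f in select_fields(line, wanted)]
            # Join with a single space, like awk's default output separator.
            outfile.write(" ".join(fields) + "\n")
```

For example, `filter_log("input.log", "output.log", (1, 2, 3))` would write the date, time and thread fields of each line; the two file names are placeholders. On the sample line the `<G_...` token is field 6, and after `strip_g_prefix` it still carries the text after the dot, so trimming that as well would need one more `re.sub`.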
>
> Not much :(
>
> It doesn't really matter whether there are 100, 1000, or a million lines in
> the file; the important information is the structure of the file. You may be
> able to get away with a quick and dirty script consisting of just a few
> regular expressions, e.g.:
>
> import re
>
> filename = ...
>
> # Compile each pattern once instead of on every call.
> SERVICE_RE = re.compile(r"[(](\w+)")        # first word after "("
> COMMAND_RE = re.compile(r"<G_(\w+)")        # word following "<G_"
> STATUS_RE = re.compile(r"Status:\s+(\d+)")  # number after "Status:"
>
> def get_service(line):
>     return SERVICE_RE.search(line).group(1)
>
> def get_command(line):
>     return COMMAND_RE.search(line).group(1)
>
> def get_status(line):
>     return STATUS_RE.search(line).group(1)
>
> with open(filename) as infile:
>     for line in infile:
>         print(get_service(line), get_command(line), get_status(line))
>
> but there is no guarantee that your file contains no data that breaks the
> implied assumptions (for example, a line without "Status:" makes search()
> return None, and the .group(1) call then raises AttributeError). Also,
> judging from the shell hackery, your ultimate goal seems to be a kind of
> frequency table, which could be built along these lines:
>
> freq = {}
> with open(filename) as infile:
>     for line in infile:
>         service = get_service(line)
>         command = get_command(line)
>         status = get_status(line)
>         key = command, service, status
>         freq[key] = freq.get(key, 0) + 1
>
> for key, occurrences in sorted(freq.items()):
>     print("Service: {}, Command: {}, Status: {}, Occurrences: {}".format(
>         *key, occurrences))
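To deal with the caveat about lines that break the assumptions, here is a sketch (not Peter's code) that applies one combined regular expression per line, counts lines that don't match instead of crashing, and uses `collections.Counter` in place of the `freq.get(key, 0) + 1` idiom. It assumes the three pieces appear in the same order as in the sample line (`<G_...`, then the first `(word`, then `Status: N`); the names `LINE_RE`, `count_keys` and `report_lines` are made up for illustration. The group names follow the column headings of the table above: the word after `<G_` goes under "Service" and the first word after `(` under "Command".

```python
import re
from collections import Counter

# One pattern for the whole line. Assumption: "<G_...", the first "(word"
# and "Status: N" occur in this order, as they do in the sample log line.
LINE_RE = re.compile(
    r"<G_(?P<service>\w+)"          # e.g. CC_SMS_SERVICE_51408_656
    r".*?[(](?P<command>\w+)"       # e.g. submit_resp
    r".*?Status:\s+(?P<status>\d+)"
)


def count_keys(lines):
    """Tally (service, command, status) triples over an iterable of lines.

    Lines that don't match are counted in `skipped` rather than raising
    AttributeError, which is what .search(line).group(1) does on a miss.
    """
    freq = Counter()
    skipped = 0
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            skipped += 1
            continue
        # group() with several names returns a tuple, usable as a key.
        freq[match.group("service", "command", "status")] += 1
    return freq, skipped


def report_lines(freq):
    """Format one row per key, sorted the same way as the loop above."""
    return [
        "Service: {}, Command: {}, Status: {}, Occurrences: {}".format(
            service, command, status, n)
        for (service, command, status), n in sorted(freq.items())
    ]
```

With the `filename` placeholder from above, you would open the file, pass the file object to `count_keys`, print each entry of `report_lines(freq)`, and then check `skipped` to see how many lines the pattern failed to recognise.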