Parse ASCII log ; sort and keep most recent entries

Peter Hansen peter at engcorp.com
Wed Jun 16 19:41:58 EDT 2004


Nova's Taylor wrote:

> I am a newbie to Python and am hoping that someone can get me started
> on a log parser that I am trying to write.
> 
> I want to read in and sort the file so the new list contains only
> the most recent PID (PIDs get reused often). In my example,
> the new list would be:
> 
> 1234 williamstim 01AUG03 7:44:31
> 2348 williamstim 17AUG03 9:13:55
> 23 jonesjimbo 14OCT03 23:01:23
> 748 jonesjimbo 14OCT03 23:59:59
> 
> So I need to sort by PID and date + time, then keep the most recent.

I think you are specifying the implementation of the solution
a bit, rather than just the requirements.  Do you really need
the resulting list to be sorted by PID and date/time, or was
that just part of how you thought you'd write it?

If you don't care about the sorting part, but just want the
output to be a list of unique PIDs, you could just do the
following instead, taking advantage of how Python dictionaries
have unique keys.  Note that this assumes that the contents
of the file were originally in order by date (i.e. more recent
items come later).

1. Create empty dict: "d = {}"
2. Read data line by line: "for line in infile"
3. Split so the PID is separate: "pid = line.split()[0]"
4. Store entire line in dictionary using PID as key: "d[pid] = line"
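
Put together, the four steps above might look roughly like the
sketch below.  The function name latest_by_pid and the sample
lines are made up for illustration (the first sample line is an
invented older entry for PID 1234), and the sketch relies on the
same assumption as before: later lines are more recent.

```python
def latest_by_pid(lines):
    """Map each PID to the last line seen with that PID."""
    d = {}                        # step 1: empty dict
    for line in lines:            # step 2: read line by line
        fields = line.split()
        if not fields:            # skip blank lines
            continue
        pid = fields[0]           # step 3: PID is the first field
        d[pid] = line             # step 4: later lines overwrite earlier ones
    return d


# Hypothetical sample data; with a real file you would pass the
# open file object instead of a list.
sample = [
    "1234 jonesjimbo 30JUL03 5:00:00\n",
    "1234 williamstim 01AUG03 7:44:31\n",
    "23 jonesjimbo 14OCT03 23:01:23\n",
]
result = latest_by_pid(sample)
```

Here result holds two entries, and the one for "1234" is the
williamstim line, because it came later in the input.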

When you're done, the dict will contain only the most recent
line with a given PID, though in "arbitrary" (effectively
random) order.  If you don't care about the order of the final
result, just open a file and write out the reduced data with
one line:

     newfile.write(''.join(d.values()))
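
If the file is not already in date order, the dict approach still
works as long as you compare timestamps before overwriting.  The
following is only a sketch: the helper names are hypothetical, it
assumes every non-blank line has exactly four whitespace-separated
fields (PID, user, date like 01AUG03, time like 7:44:31), and it
assumes English month abbreviations in the current locale.

```python
from datetime import datetime


def parse_stamp(line):
    """Parse the date and time fields of a log line into a datetime."""
    # Assumed layout: PID, user, date, time
    _pid, _user, date, time = line.split()
    return datetime.strptime(date + " " + time, "%d%b%y %H:%M:%S")


def latest_by_pid_unordered(lines):
    """Keep, for each PID, the line with the most recent timestamp."""
    d = {}
    for line in lines:
        if not line.strip():      # skip blank lines
            continue
        pid = line.split()[0]
        # Overwrite only when this line is at least as recent as the stored one
        if pid not in d or parse_stamp(line) >= parse_stamp(d[pid]):
            d[pid] = line
    return d


def sorted_by_time(d):
    """Return the kept lines ordered by timestamp, oldest first."""
    return sorted(d.values(), key=parse_stamp)
```

Writing the result is then the same one-liner as above, with
sorted_by_time(d) in place of d.values() if you want the output
in date order.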

-Peter


