How to read such a file and summarize the data?

Steve Holden steve at holdenweb.com
Wed Nov 17 18:10:23 EST 2010


On 11/17/2010 4:45 PM, huisky wrote:
> Say I have the following log file, which records code usage.
> I want to read this file and summarize how much total CPU time
> was consumed by each user.
> Is Python able to do this, and is it easy to achieve? Can anybody give
> me some hints? I'd appreciate it very much!
> 
> 
> Example log file.
> **************************************************************************************
I'm assuming the following data (without the "> " quote markers) is in file "data.txt":

> LSTC license server version 224 started at Sun Dec  6 18:56:48 2009
> using configuration file /usr/local/lstc/server_data
> xyz 15424 at trofast3.marin.ntnu.no LS-DYNA_971 NCPU=1 started Sun Dec  6
> 18:57:40
> 15424 at trofast3.marin.ntnu.no completed Sun Dec  6 19:42:55
> xyz 15500 at trofast3.marin.ntnu.no LS-DYNA_971 NCPU=2 started Sun Dec  6
> 20:17:02
> 15500 at trofast3.marin.ntnu.no completed Sun Dec  6 20:26:03
> xyz 18291 at trofast2.marin.ntnu.no LS-DYNA_971 NCPU=1 started Sun Dec  6
> 21:01:17
> 18291 at trofast2.marin.ntnu.no completed Sun Dec  6 21:01:28
> tanhoi 552 at iimt-tanhoi-w.ivt.ntnu.no LS-DYNA_971 NCPU=1 started Mon
> Dec  7 09:31:00
> 552 at iimt-tanhoi-w.ivt.ntnu.no presumed dead Mon Dec  7 10:36:48
> sabril 18863 at trofast2.marin.ntnu.no LS-DYNA_971 NCPU=2 started Mon
> Dec  7 13:14:47
> 18863 at trofast2.marin.ntnu.no completed Mon Dec  7 13:24:07
> sabril 18937 at trofast2.marin.ntnu.no LS-DYNA_971 NCPU=2 started Mon
> Dec  7 14:21:34
> sabril 18969 at trofast2.marin.ntnu.no LS-DYNA_971 NCPU=2 started Mon
> Dec  7 14:28:42
> 18969 at trofast2.marin.ntnu.no killed Mon Dec  7 14:31:48
> 18937 at trofast2.marin.ntnu.no killed Mon Dec  7 14:32:06

The line wrapping is wrong, but that shouldn't affect the logic: the user
name and the NCPU= field always land on the first physical line of each
record.

$ cat data.py
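from collections import defaultdict

# Total the NCPU values of the jobs each user started.
c = defaultdict(int)
with open("data.txt") as f:
    for line in f:
        ls = line.split()
        # The position of the NCPU=<n> field depends on whether the job id
        # is written "15424@host" or "15424 at host", so search for it.
        for field in ls[1:]:
            if field.startswith("NCPU="):
                c[ls[0]] += int(field[5:])
                break
for key, value in c.items():
    print(key, ":", value)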


$ python data.py
xyz : 4
tanhoi : 1
sabril : 6
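
Note that the script above only totals the NCPU values of the jobs each
user started; it does not measure time. If elapsed time is what matters,
one possible approach is to pair each "started" record with the later
"completed", "killed" or "presumed dead" record for the same job and
multiply the elapsed seconds by NCPU. What follows is only a rough sketch
resting on assumptions the log itself does not confirm: every record sits
on a single line (the wrapping undone), the job id appears either as
15424@host or as "15424 at host", the year is 2009 as in the header line
(the records carry no year), and "CPU time" means wall-clock time
multiplied by NCPU. The names START, END, parse_time and cpu_seconds are
made up for this illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed record shapes, one record per line.
START = re.compile(r"(\S+) (\d+)(?:@| at )(\S+) \S+ NCPU=(\d+) started (.+)")
END = re.compile(r"(\d+)(?:@| at )(\S+) (?:completed|killed|presumed dead) (.+)")


def parse_time(text, year=2009):
    # The records omit the year; 2009 is an assumption taken from the header.
    return datetime.strptime("%s %d" % (text, year), "%a %b %d %H:%M:%S %Y")


def cpu_seconds(lines):
    """Return {user: elapsed seconds * NCPU} summed over finished jobs."""
    running = {}                 # (pid, host) -> (user, ncpu, start time)
    totals = defaultdict(float)
    for line in lines:
        line = " ".join(line.split())   # collapse runs of whitespace
        m = START.fullmatch(line)
        if m:
            user, pid, host, ncpu, when = m.groups()
            running[pid, host] = (user, int(ncpu), parse_time(when))
            continue
        m = END.fullmatch(line)
        if m and (m.group(1), m.group(2)) in running:
            user, ncpu, start = running.pop((m.group(1), m.group(2)))
            elapsed = parse_time(m.group(3)) - start
            totals[user] += elapsed.total_seconds() * ncpu
    return dict(totals)
```

With the sample data unwrapped, cpu_seconds(open("data.txt")) should give
3808 seconds for xyz, 3948 for tanhoi and 2756 for sabril. Jobs that never
get an end record are ignored, and a job running across New Year would
need extra handling.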

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
PyCon 2011 Atlanta March 9-17       http://us.pycon.org/
See Python Video!       http://python.mirocommunity.org/
Holden Web LLC                 http://www.holdenweb.com/



