tips requested for a log-processing script

Hendrik van Rooyen mail at microcorp.co.za
Mon Nov 6 07:08:25 CET 2006


"Jaap" <jaap at nospaml.com> wrote:


> Pythoneers,
> As a relatively new user of Python I would like to ask your advice on
> the following script I want to create.
>
> I have a logfile which contains records. All records have the same
> layout, and are stored in a CSV-format. Each record is (non-uniquely)
> identified by a date and an itemID. Each itemID can occur 0 or more times
> per month. The item contains a figure/amount which I need to sum per
> month and per itemID. I have already managed to separate the individual
> parts of each logfile-record by using the csv-module from Python 2.5.
> very simple indeed.
>
> Apart from this I have a configuration file, which contains the list of
> itemIDs I need to focus on per month. Not all itemIDs are relevant for
> each month, but for example only every second or third month. All
> records in the logfile with other itemIDs can be ignored. I have yet to
> define the format of this configuration file, but am thinking about a 0
> or 1 for each month, and then the itemID, like:
> "1 0 0 1 0 0 1 0 0 1 0 0 123456" for an itemID 123456 which only needs
> consideration in the first month of each quarter.
>
> My question to this forum is: which data structure would you propose?
> The logfile is not very big (about 200k max, average 200k) so I assume I
> can store in internal memory/list?
>
> How would you propose I tackle the filtering of relevant/non-relevant
> items from logfile? Would you propose I use a filter(func, list) for
> this task or is another thing better?
>
> In the end I want to mail the outcome of my process, but this seems
> straightforward from the documentation I have found, although I must
> connect to an external SMTP-server.
>
> Any tips, views, advice is highly appreciated!
>
>
> Jaap
>
> PS: when I load the logfile in a spreadsheet I can create a pivot table
> which does about the same ;-] but that is not what I want; the
> processing must be automated in the end with a periodic script which
> e-mails the summary of the keyfigure every month.


I would do something like this: (obviously untested)

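item_dict = {}                      # maps item -> list of (date, amount)
for line in open(logfile, 'r'):     # logfile holds the path to the log
    # (code to get hold of item, date, amount)
    if item not in item_dict:
        item_dict[item] = [(date, amount)]
    else:
        # append takes a single argument, so pass the pair as one tuple
        item_dict[item].append((date, amount))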

This will give you, for each unique item, a direct reference to every
(date, amount) record in which it has been used.
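
A slightly fuller sketch of the same loop, since you are already using the
csv module. I am assuming here that each record is laid out as
date,itemID,amount with ISO-style dates; that layout, the sample data and the
name build_item_dict are all made up for illustration, so adjust them to your
real logfile. collections.defaultdict saves the "if item not in" test.

```python
import csv
import io
from collections import defaultdict


def build_item_dict(lines):
    """Group (date, amount) pairs by itemID.

    Assumes each record is 'date,itemID,amount' (a hypothetical layout).
    """
    item_dict = defaultdict(list)
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        date, item, amount = row[0], row[1], float(row[2])
        item_dict[item].append((date, amount))
    return item_dict


# Stand-in for the real logfile; any iterable of text lines will do,
# including an open file object.
sample = io.StringIO(
    "2006-01-03,123456,10.50\n"
    "2006-01-17,123456,4.50\n"
    "2006-02-01,777777,2.00\n"
)
item_dict = build_item_dict(sample)
```

With a file of 200k or so, holding everything in one dict like this should be
no problem at all.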

I would then work through the config file, and extract the items of interest for
the run date...
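
Roughly like this, assuming the config format you suggested (twelve 0/1 flags
followed by the itemID) and dates stored as "YYYY-MM-DD" strings. The function
names, the date format and the sample data are my assumptions, not anything
from your files, so treat it as a sketch only.

```python
def parse_config(lines):
    """Map itemID -> list of 12 booleans, one per month (January first)."""
    wanted = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 13:
            continue  # skip blank or malformed lines
        wanted[fields[12]] = [flag == "1" for flag in fields[:12]]
    return wanted


def monthly_totals(item_dict, wanted, year, month):
    """Sum amounts per relevant itemID for the given year and month."""
    prefix = "%04d-%02d" % (year, month)
    totals = {}
    for item, flags in wanted.items():
        if not flags[month - 1]:
            continue  # item is not of interest this month
        totals[item] = sum(amount
                           for date, amount in item_dict.get(item, [])
                           if date.startswith(prefix))
    return totals


# Hypothetical config lines and log data, for illustration only.
config = ["1 0 0 1 0 0 1 0 0 1 0 0 123456",
          "0 1 0 0 1 0 0 1 0 0 1 0 777777"]
item_dict = {
    "123456": [("2006-01-03", 10.5), ("2006-01-17", 4.5), ("2006-02-02", 1.0)],
    "777777": [("2006-01-09", 2.0)],
    "999999": [("2006-01-05", 8.0)],
}
totals = monthly_totals(item_dict, parse_config(config), 2006, 1)
```

Because the lookup starts from the config file, items that are not listed
there are never touched, so you do not need a separate filter() pass over the
log.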

HTH - Hendrik





