tips requested for a log-processing script

Jaap jaap at nospaml.com
Sun Nov 5 06:00:07 EST 2006


Pythoneers,
As a relatively new user of Python I would like to ask your advice on 
the following script I want to create.

I have a logfile which contains records. All records have the same 
layout and are stored in CSV format. Each record is (non-uniquely) 
identified by a date and an itemID. Each itemID can occur 0 or more times 
per month. Each record contains a figure/amount which I need to sum per 
month and per itemID. I have already managed to separate the individual 
fields of each logfile record by using the csv module from Python 2.5. 
Very simple indeed.
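
To make the summing step concrete, here is a minimal sketch of what I 
have in mind. The column order (date, itemID, amount), the date format 
YYYY-MM-DD and the sample data are all made up for illustration, and the 
function name is hypothetical. It is written in Python 3 syntax.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample data; assumed column order: date, itemID, amount.
SAMPLE_LOG = """\
2006-10-03,123456,10.50
2006-10-17,123456,4.50
2006-10-20,999999,1.00
2006-11-02,123456,7.25
"""

def sum_per_month_and_item(lines):
    """Return {(year, month, itemID): total amount} over all records."""
    totals = defaultdict(Decimal)  # a missing key starts at Decimal('0')
    for date, item_id, amount in csv.reader(lines):
        year, month, _day = date.split("-")
        totals[(int(year), int(month), item_id)] += Decimal(amount)
    return dict(totals)

totals = sum_per_month_and_item(io.StringIO(SAMPLE_LOG))
for key in sorted(totals):
    print(key, totals[key])
```

A dictionary keyed on (year, month, itemID) means the records themselves 
never have to be kept around, only the running totals.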

Apart from this I have a configuration file, which contains the list of 
itemIDs I need to focus on per month. Not all itemIDs are relevant for 
each month; some matter only every second or third month, for example. 
All records in the logfile with other itemIDs can be ignored. I have yet 
to define the format of this configuration file, but am thinking about a 
0 or 1 for each month, followed by the itemID, like:
"1 0 0 1 0 0 1 0 0 1 0 0 123456" for an itemID 123456 which only needs 
consideration in the first month of each quarter.
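
A rough sketch of how such a line could be read, assuming exactly twelve 
0/1 flags (January first) followed by the itemID, all separated by 
whitespace. The function names are hypothetical and the format itself is 
still only a proposal.

```python
def parse_config_line(line):
    """Parse '1 0 0 1 0 0 1 0 0 1 0 0 123456' into (itemID, set of months)."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError("expected 12 month flags and an itemID: %r" % line)
    flags, item_id = fields[:12], fields[12]
    # Month numbers are 1-based: the first flag stands for January.
    months = {i + 1 for i, flag in enumerate(flags) if flag == "1"}
    return item_id, months

def load_config(lines):
    """Build {itemID: set of relevant months}, skipping blank lines."""
    relevant = {}
    for line in lines:
        if line.strip():
            item_id, months = parse_config_line(line)
            relevant[item_id] = months
    return relevant

config = load_config(["1 0 0 1 0 0 1 0 0 1 0 0 123456"])
```

With this, "is itemID X relevant in month M" becomes a lookup in a 
dictionary followed by a set membership test.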

My question to this forum is: which data structure would you propose? 
The logfile is not very big (about 200k max, average 200k) so I assume I 
can hold it entirely in memory, for example in a list?

How would you propose I tackle the filtering of relevant/non-relevant 
items from the logfile? Would you propose I use filter(func, list) for 
this task, or is there a better approach?
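
To show what I mean, here are the two variants I can think of side by 
side. The record layout (month, itemID, amount) and the names 
relevant_months and is_relevant are only assumptions for this example.

```python
# Hypothetical records, already parsed from the log: (month, itemID, amount).
records = [
    (1, "123456", 10.0),
    (2, "123456", 5.0),
    (4, "123456", 2.5),
    (1, "999999", 1.0),
]

# Hypothetical configuration: itemID -> months in which it is relevant.
relevant_months = {"123456": {1, 4, 7, 10}}

def is_relevant(record):
    """True if the record's itemID is configured for the record's month."""
    month, item_id, _amount = record
    return month in relevant_months.get(item_id, ())

# Variant 1: the built-in filter().
with_filter = list(filter(is_relevant, records))

# Variant 2: a list comprehension with the same predicate.
with_comprehension = [r for r in records if is_relevant(r)]
```

Both give the same list, so the choice seems to be mostly a matter of 
readability.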

In the end I want to mail the outcome of my process, but this seems 
straightforward from the documentation I have found, although I must 
connect to an external SMTP server.
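
From the documentation, the mailing part would look roughly like the 
sketch below, using smtplib and the email package. The addresses, the 
host name smtp.example.com and both function names are placeholders I 
made up, and I am assuming the server needs no authentication.

```python
import smtplib
from email.mime.text import MIMEText

def build_summary_mail(body, sender, recipient, subject):
    """Wrap the plain-text summary in a message with basic headers."""
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    return msg

def send_mail(msg, host, port=25):
    """Send the message through an external SMTP server (no login assumed)."""
    server = smtplib.SMTP(host, port)
    try:
        server.sendmail(msg["From"], [msg["To"]], msg.as_string())
    finally:
        server.quit()

msg = build_summary_mail(
    "123456: 15.00\n",        # made-up summary text
    "reports@example.com",    # placeholder sender
    "me@example.com",         # placeholder recipient
    "Monthly summary",
)
# send_mail(msg, "smtp.example.com")  # placeholder host; not executed here
```

Building the message separately from sending it lets me check the text 
before anything goes out over the network.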

Any tips, views, or advice are highly appreciated!


Jaap

PS: when I load the logfile in a spreadsheet I can create a pivot table 
which does about the same ;-] but that is not what I want; the 
processing must be automated in the end with a periodic script which 
e-mails the summary of the keyfigure every month.


