[Tutor] parsing sendmail logs

Martin Walsh mwalsh at groktech.org
Tue Jul 15 15:26:16 CEST 2008

Monika Jisswel wrote:
> to say the truth I never thought about "additional overhead of getting
> the input/output data transferred" because the subprocess itself will
> contain the (bash) pipe to redirect output to the next utility used, not
> the python subprocess.PIPE pipe, so it will be like one subprocess with
> each utility piping stdout to the next as if run from the shell, what

I agree with Alan. Personally, I find trying to replace shell scripts
with python code just plain awk-ward ... ahem, please forgive the pun.
:) No doubt the subprocess module is quite handy. But in this case it
would be hard for a chain of subprocess.Popen calls to beat a shell
script for simplicity. I realize this is subjective, of course.
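
For what it's worth, here is roughly what one stage of such a chain looks
like: the equivalent of `sort | uniq -c` as connected Popen calls. The
sample bytes below just stand in for real maillog data:

```python
import subprocess

# Sample bytes standing in for grep output on a real maillog
sample = b'host2\nhost1\nhost2\n'

p1 = subprocess.Popen(['sort'], stdin=subprocess.PIPE,
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen(['uniq', '-c'], stdin=p1.stdout,
                      stdout=subprocess.PIPE)
p1.stdout.close()     # let p1 get SIGPIPE if p2 exits early
p1.stdin.write(sample)
p1.stdin.close()
output = p2.communicate()[0]
print(output.decode())
```

Compare that to `sort | uniq -c` in a shell script, and you can see why
the shell wins on brevity here.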

> python comes in for ? well, it's always sweet to work with python as it
> will allow you to make whatever logic you have in your head into real
> life with ease and at the end of the subprocess you can always parse the
> stdout using python this time & load results to some database.

If you follow the unix philosophy(tm) it might make more sense to pipe
the result (of a shell pipeline) to a python script that does only the
database load.
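
For example, the script on the receiving end of something like
`grep ... | sort | uniq -c | python load.py` might do nothing but parse
stdin and insert rows. The table layout and the 'count host' line format
below are just assumptions for illustration:

```python
import sqlite3

def load(lines, conn):
    """Insert 'count host' lines (e.g. from sort | uniq -c) into a table."""
    conn.execute('CREATE TABLE IF NOT EXISTS hits (count INTEGER, host TEXT)')
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            conn.execute('INSERT INTO hits VALUES (?, ?)',
                         (int(fields[0]), fields[1]))
    conn.commit()

# In the real script you would call, e.g.:
#   import sys
#   load(sys.stdin, sqlite3.connect('maillog.db'))
demo = sqlite3.connect(':memory:')
load(['2 host2\n', '1 host1\n'], demo)
print(demo.execute('SELECT count, host FROM hits').fetchall())
```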

> I have to say that I have seen awk, grep & sort, wc, work on files of
> hundreds of Mbytes in a matter of 1 or 2 seconds ... why would I replace
> such fast tools ?

I can think of a few reasons, not the least of which is the OP's -- as
"a programming exercise".

> Alan do you think python can beat awk in speed when it comes to
> replacing text ? I always wanted to know !

Well, maybe. But IMHO, the question should really be whether python is
'fast enough', especially when you consider how the OP is using awk in
the first place. But the only way to know is to try it out.
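
For instance, you could time a simple substitution over generated sample
data. The pattern and data here are made up; timing a real maillog would
of course be a fairer test:

```python
import re
import time

# Generated stand-in for a few hundred thousand log lines
lines = ['host%d status=sent' % i for i in range(100000)]

start = time.time()
pattern = re.compile('status=sent')
replaced = [pattern.sub('status=ok', line) for line in lines]
elapsed = time.time() - start
print(elapsed)
```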

>     Any pragmatic advice on building or working with a framework to get
>     to the point where i can do analysis on my logs would be cool.

As an exercise, I think it would be a reasonable approach to write
python derivatives of the shell commands being used, perhaps tailored to
the data set, to get a feel for working with text data in python. Then
ask questions here if you get stuck, or need optimization advice. I
think you'll find you can accomplish this with just a few lines of
python code for each (sort -u, grep, awk '{print $n}', etc), given your
use of the commands in the examples provided. Write each as a function,
and you'll end up with code you can reuse for other log analysis
projects. Bonus!
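
To illustrate, minimal python stand-ins for those commands might look
like this. The names and exact behavior are my own sketch, not anything
from the thread; note that field() follows awk's 1-based field numbering:

```python
import re

def sort_u(lines):
    """Like sort -u: unique lines, sorted."""
    return sorted(set(lines))

def grep(pattern, lines):
    """Like grep: keep lines matching a regex."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

def field(n, lines):
    """Like awk '{print $n}': the nth whitespace-separated field."""
    return [line.split()[n - 1] for line in lines
            if len(line.split()) >= n]

lines = ['b x', 'a y', 'b x']
print(sort_u(lines))        # ['a y', 'b x']
print(grep('y', lines))     # ['a y']
print(field(2, lines))      # ['x', 'y', 'x']
```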
