Integrating awk in Python

Python Nutter pythonnutter at
Sat Jan 17 02:01:57 CET 2009

If you, or anyone else reading this thread, are interested in using
Python in a more advanced way, you can use generators to build
processing chains that push Python's performance to the edge and even
give old AWK a run for its money for certain types of processing.

wwwlog     = open("access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes      = (int(x) for x in bytecolumn if x != '-')
print "Total", sum(bytes)

Python execution time: 25.96 seconds

% awk '{ total += $NF } END { print total }' big-access-log

AWK execution time: 37.33 seconds
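
For anyone who wants to try the byte-counting chain without a real log at hand, here is a minimal, self-contained sketch in Python 3 syntax. The sample log lines, the temporary file, and the helper name total_bytes are made up for illustration; only the two generator expressions come from the snippet above. Each stage pulls one line at a time from the stage before it, so the whole file is never held in memory.

```python
import os
import tempfile


def total_bytes(path):
    """Sum the last whitespace-separated column, skipping '-' entries."""
    with open(path) as wwwlog:
        # Stage 1: pull the last column out of each line, lazily.
        bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
        # Stage 2: drop the '-' placeholders and convert to int.
        sizes = (int(x) for x in bytecolumn if x != '-')
        # sum() drives the chain; it must run while the file is still open.
        return sum(sizes)


# Hypothetical sample data, not taken from any real log.
sample = (
    '1.2.3.4 - - [17/Jan/2009:02:01:57 +0100] "GET /index.html HTTP/1.0" 200 512\n'
    '1.2.3.5 - - [17/Jan/2009:02:01:58 +0100] "GET /missing HTTP/1.0" 404 -\n'
    '1.2.3.4 - - [17/Jan/2009:02:01:59 +0100] "GET /robots.txt HTTP/1.0" 200 88\n'
)

with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write(sample)
    sample_path = f.name

total = total_bytes(sample_path)
os.remove(sample_path)
print("Total", total)
```

Note that a blank line in the log would make rsplit(...)[1] raise an IndexError, so a real log may need one more filter stage in front.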

With generators you can plug in filters at any stage:

lines = lines_from_dir("big-access-log",".")
lines = (line for line in lines if 'robots.txt' in line)
log   = apache_log(lines)
addrs = set(r['host'] for r in log)

The second line (the 'robots.txt' filter) drastically reduced the
execution time on a 1.3GB log file.
Without it, execution took a shameful 53 minutes.
With it, execution took 93 seconds.
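
The helpers lines_from_dir and apache_log are not defined in this post. What follows is only a rough sketch, in Python 3 syntax, of how such helpers might be written. The regular expression, the column names, the Common Log Format layout, and the sample data are all my assumptions, not the original implementation.

```python
import fnmatch
import os
import re
import tempfile

# Assumption: lines follow the Common Log Format.
LOG_PATTERN = re.compile(
    r'(\S+) (\S+) (\S+) \[(.*?)\] "(\S+) (\S+) (\S+)" (\S+) (\S+)'
)
# Hypothetical column names, chosen so that r['host'] works as in the post.
COLUMNS = ('host', 'ident', 'user', 'datetime',
           'method', 'request', 'proto', 'status', 'bytes')


def lines_from_dir(filepat, dirname):
    """Yield each line of every file under dirname whose name matches filepat."""
    for root, _dirs, files in os.walk(dirname):
        for name in sorted(fnmatch.filter(files, filepat)):
            with open(os.path.join(root, name)) as f:
                for line in f:
                    yield line


def apache_log(lines):
    """Lazily turn log lines into dicts, skipping lines that do not match."""
    matches = (LOG_PATTERN.match(line) for line in lines)
    return (dict(zip(COLUMNS, m.groups())) for m in matches if m)


# Hypothetical sample data written to a throwaway directory.
sample = (
    '1.2.3.4 - - [17/Jan/2009:02:01:57 +0100] "GET /robots.txt HTTP/1.0" 200 88\n'
    '1.2.3.4 - - [17/Jan/2009:02:01:58 +0100] "GET /index.html HTTP/1.0" 200 512\n'
    '5.6.7.8 - - [17/Jan/2009:02:01:59 +0100] "GET /robots.txt HTTP/1.0" 200 88\n'
)

with tempfile.TemporaryDirectory() as logdir:
    with open(os.path.join(logdir, "big-access-log"), "w") as f:
        f.write(sample)

    lines = lines_from_dir("big-access-log", logdir)
    lines = (line for line in lines if 'robots.txt' in line)  # cheap filter
    log = apache_log(lines)                                   # costly parse
    addrs = set(r['host'] for r in log)

print(sorted(addrs))
```

With helpers like these, the likely reason the filter helps so much is that the substring test is far cheaper than the regular expression match, and it discards most lines before they ever reach the parsing stage.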

David Beazley presented a great talk, with an accompanying PDF, at
PyCon 2008. It would be great if these generator tricks and patterns
got more attention from the community.

Link if interested:

2009/1/16 Alfons Nonell-Canals <alfons.nonell at>:
> Hello,
> I'm developing a software package using python. I've programmed all
> necessary tools but I have to use other stuff from other people. Most of
> these external scripts are developed using awk.
> At the beginning I thought to "translate" them and program them in python
> but I prefer to avoid it because it means a lot of work and I should do it
> after each new version of this external stuff. I would like to integrate
> them into my python code.
> I know I can call them using the system environment but it is slower than if
> I call them inside the package. I know it is possible with C, do you have
> experience integrating awk into python, calling these awk scripts from
> python?
> Thanks in advance!
> Regards,
> Alfons.
> --
> ------------
> Alfons Nonell-Canals, PhD
> Chemogenomics Lab
> Research Group on Biomedical Informatics (GRIB) - IMIM/UPF
> Barcelona Biomedical Research Park (PRBB)
> C/ Doctor Aiguader, 88 - 08003 Barcelona
> alfons.nonell at -
> Tel. +34933160528
> --
