Help with script with performance problems

Dennis Roberts googlegroups at
Sun Nov 23 08:35:56 CET 2003

I have a script to parse a dns querylog and generate some statistics. 
For a 750MB file a perl script using the same methods (splits) can
parse the file in 3 minutes.  My python script takes 25 minutes.  It
is enough of a difference that unless I can figure out what I did
wrong or a better way of doing it I might not be able to use python
(since most of what I do is parsing various logs).  The main reason to
try python is I had to look at some early scripts I wrote in perl and
had no idea what the hell I was thinking or what the script even did! 
After some googling and reading Eric Raymond's essay on python I jumped
in :)  Here is my script.  I am looking for constructive comments -
please don't bash my newbie code.

#!/usr/bin/python -u

import string
import sys

clients = {}
queries = {}
count = 0

print "Each dot is 100000 lines..."

f = sys.stdin

while 1:

    line = f.readline()

    if count % 100000 == 0:
        sys.stdout.write(".")

    if line:
        splitline = string.split(line)

        try:
            (month, day, time, stype, source, qtype, query, ctype,
             record) = splitline
        except ValueError:
            print "problem splitting line", count
            print line
            count = count + 1
            continue

        try:
            words = string.split(source, '#')
            source = words[0]
        except IndexError:
            print "problem splitting source", count
            print line

        if clients.has_key(source):
            clients[source] = clients[source] + 1
        else:
            clients[source] = 1

        if queries.has_key(query):
            queries[query] = queries[query] + 1
        else:
            queries[query] = 1
    else:
        break

    count = count + 1


print count, "lines processed"

for numclient, count in clients.items():
    if count > 100000:
        print "%s,%s" % (numclient, count)

for numquery, count in queries.items():
    if count > 100000:
        print "%s,%s" % (numquery, count)
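For comparison, here is a sketch of a tighter counting loop (field positions
4 and 6 for source and query are assumptions read off the tuple unpack
above, not something I've verified against every querylog line).  It
iterates the file object directly instead of calling readline() once per
line, uses the str methods instead of the string module, skips malformed
lines with a length check instead of try/except, and replaces the
has_key-then-store pattern with dict.get, which saves a dictionary lookup
per line:

```python
def count_fields(lines):
    # Tally source clients and query names in one pass over an iterable
    # of querylog lines (e.g. a file object).
    clients = {}
    queries = {}
    count = 0
    for line in lines:                 # iterate the file object directly
        fields = line.split()          # str method, not string.split()
        if len(fields) != 9:           # skip malformed lines cheaply
            continue
        source = fields[4].split('#')[0]   # drop the #port suffix
        query = fields[6]
        # dict.get(key, 0) handles the first-occurrence case without a
        # separate has_key test.
        clients[source] = clients.get(source, 0) + 1
        queries[query] = queries.get(query, 0) + 1
        count = count + 1
    return count, clients, queries
```

Binding the file's iterator into the for loop also moves the per-line work
into C code, which is usually where most of the Perl/Python gap in scripts
like this comes from.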
