Generator slower than iterator?

Gary Herron gherron at
Tue Dec 16 17:30:00 CET 2008

Lie Ryan wrote:
> On Tue, 16 Dec 2008 12:07:14 -0300, Federico Moreira wrote:
>> Hi all,
>> I'm parsing a 4.1GB Apache log to get stats about how many times an IP
>> requests something from the server.
>> The first design of the algorithm was
>> for line in fileinput.input(sys.argv[1:]):
>>     ip = line.split()[0]
>>     if match_counter.has_key(ip):
>>         match_counter[ip] += 1
>>     else:
>>         match_counter[ip] = 1
>> And it took 3 min 58 sec to give me the stats.
>> Then I tried a generator solution like
>> def generateit():
>>     for line in fileinput.input(sys.argv[1:]):
>>         yield line.split()[0]
>> for ip in generateit():
>>     ...the same if statement
>> Instead of being faster it took 4 min 20 sec.
>> Should I leave fileinput behind?
>> Am I using generators with the wrong approach?
> What's fileinput? A file-like object (unlikely)? Also, what's 
> fileinput.input? I guess the reason you don't see much difference 
> (and it's in fact slower) lies in what fileinput.input does.

Fileinput is a standard module distributed with Python:

From the manual:

11.2 fileinput -- Iterate over lines from multiple input streams

This module implements a helper class and functions to quickly write a
loop over standard input or a list of files.

The typical use is:

import fileinput
for line in fileinput.input():
    process(line)

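For what it's worth, here is a sketch of the same counting loop reading the
files directly with open() and using collections.defaultdict instead of the
has_key() test (count_ips is just a name I made up, and I haven't benchmarked
this against a 4.1GB log):

```python
import sys
from collections import defaultdict

def count_ips(paths):
    """Count occurrences of the first whitespace-separated field
    (the client IP in a common Apache log) across the given files."""
    match_counter = defaultdict(int)   # missing keys start at 0
    for name in paths:
        with open(name) as f:
            for line in f:
                parts = line.split()
                if parts:                       # skip blank lines
                    match_counter[parts[0]] += 1
    return match_counter

if __name__ == "__main__":
    for ip, n in count_ips(sys.argv[1:]).items():
        print(ip, n)
```

The defaultdict saves one dictionary lookup per line compared with the
has_key()/else version, which can add up over millions of lines.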
> Generators excel at processing huge data because they don't have to 
> create huge intermediate lists, which eat up memory. Given infinite 
> memory, a generator solution is almost always slower than a straight 
> solution using lists. However, in real life we don't have infinite 
> memory; hogging memory with a huge intermediate list would make the 
> system start swapping, and swapping is very slow and a big hit to 
> performance. That is how a generator can be faster than a list.
> --
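To illustrate the per-item overhead being described, here is a small timeit
sketch comparing a plain loop with the same loop routed through a trivial
generator (the exact numbers will vary by machine and Python version; only
the relative difference matters):

```python
import timeit

def gen(items):
    # Trivial pass-through generator: adds one frame
    # suspend/resume per item.
    for x in items:
        yield x

data = list(range(100000))

def direct():
    total = 0
    for x in data:
        total += x
    return total

def via_gen():
    total = 0
    for x in gen(data):
        total += x
    return total

print("direct :", timeit.timeit(direct, number=50))
print("via gen:", timeit.timeit(via_gen, number=50))
```

Both loops compute the same sum; the generator version just pays a small
fixed cost per yielded item, which is consistent with the 4:20 vs 3:58
timings above when no memory is saved.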

More information about the Python-list mailing list