Implementing file reading in C/Python

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Fri Jan 9 22:14:52 CET 2009


On Fri, 09 Jan 2009 15:34:17 +0000, MRAB wrote:

> Marc 'BlackJack' Rintsch wrote:
>
>> def iter_max_values(blocks, block_count):
>>     for i, block in enumerate(blocks):
>>         histogram = defaultdict(int)
>>         for byte in block:
>>             histogram[byte] += 1
>>         
>>         yield max((count, value)
>>                   for value, count in histogram.iteritems())[1]
>>         
> [snip]
> Would it be faster if histogram was a list initialised to [0] * 256?

Don't know.  Then for every byte in the 2 GiB we have to call `ord()`.  
Maybe the speedup from the list compensates for this, maybe not.
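
For comparison, here is a rough, untimed sketch of the list-based variant.
The function name `most_common_byte` is made up for illustration.  It
assumes the block is a byte string and iterates over a `bytearray` copy of
it, which yields integers directly, so the explicit `ord()` call goes away
(at the price of one copy per block).  Whether this beats the `defaultdict`
version would have to be measured.

```python
def most_common_byte(block):
    """Return the most frequent byte value (0-255) in one block."""
    # One counter slot per possible byte value.
    histogram = [0] * 256
    # Iterating over a bytearray yields integers, so no ord() is needed.
    for byte in bytearray(block):
        histogram[byte] += 1
    # Ties go to the highest byte value, because max() compares the
    # (count, value) tuples element by element.
    return max((count, value) for value, count in enumerate(histogram))[1]
```

For example, `most_common_byte(b"abracadabra")` gives the byte value of
"a", which occurs five times.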

I think the main problem here is that we have to do something with 
*every* byte of that really large file *at Python level*.  In C those are 
just primitive numbers.  Python has all the object overhead.
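
One way to keep the per-byte work out of the interpreter is to let a
built-in method scan the block.  The following is only a sketch under
assumptions: it targets Python 3 (where `bytes([value])` builds a one-byte
string), the name `most_common_byte_scan` is made up, and it has not been
timed.  It trades one Python-level pass over every byte for 256 passes done
inside `bytes.count()`, so the Python loop body runs 256 times per block
instead of once per byte.

```python
def most_common_byte_scan(block):
    """Return the most frequent byte value (0-255) in a bytes block."""
    # Each count() call scans the whole block inside the interpreter's
    # C implementation; only this comprehension runs at Python level.
    counts = [(block.count(bytes([value])), value) for value in range(256)]
    # Same tie-breaking as max() over (count, value) pairs elsewhere:
    # the highest byte value wins among equal counts.
    return max(counts)[1]
```

Whether 256 fast scans beat one slow pass depends on the block size and the
interpreter, so this is something to benchmark rather than assume.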

Ciao,
	Marc 'BlackJack' Rintsch


