Implementing file reading in C/Python
Grant Edwards grante at visi.com
Fri Jan 9 23:23:28 CET 2009
On 2009-01-09, Marc 'BlackJack' Rintsch <bj_666 at gmx.net> wrote:
> On Fri, 09 Jan 2009 15:34:17 +0000, MRAB wrote:
>> Marc 'BlackJack' Rintsch wrote:
>>> def iter_max_values(blocks, block_count):
>>>     for i, block in enumerate(blocks):
>>>         histogram = defaultdict(int)
>>>         for byte in block:
>>>             histogram[byte] += 1
>>>         yield max((count, value)
>>>                   for value, count in histogram.iteritems())
>> Would it be faster if histogram was a list initialised to [0] * 256?
> Don't know. Then for every byte in the 2 GiB we have to call `ord()`.
> Maybe the speedup from the list compensates for this, maybe not.
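A minimal sketch of the list-based variant, assuming Python 3 (where iterating over a bytes object yields ints directly, so the `ord()` call the quoted reply worries about disappears; in Python 2 each byte would be a one-character string):

```python
# Sketch: list-based histogram instead of defaultdict.
# Assumes Python 3, where iterating over bytes yields ints,
# so the byte value can index the list without ord().
def iter_max_values(blocks):
    for block in blocks:
        histogram = [0] * 256
        for byte in block:          # byte is already an int in Python 3
            histogram[byte] += 1
        # Pick the byte value with the highest count in this block.
        count, value = max((c, v) for v, c in enumerate(histogram))
        yield value

blocks = [b"aaab", b"zzxy"]
print(list(iter_max_values(blocks)))  # → [97, 122]
```

Whether this beats the defaultdict version depends on how many distinct byte values each block actually contains; the list always pays for scanning all 256 slots in the final `max()`.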
> I think that having to do something with *every* byte of that really
> large file *at the Python level* is the main problem here. In C those
> are just primitive numbers; Python has all the object overhead.
Using buffers or arrays of bytes instead of strings/lists would
probably reduce the overhead quite a bit.
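One way to read that suggestion (an assumption about the intent, not code from the thread) is to push the per-byte loop into C entirely: `bytes.count()` scans a block at C speed, so Python-level work drops to 256 counting calls per block instead of one operation per byte:

```python
# Sketch: let bytes.count() do the inner loop in C.
# Python-level overhead is 256 count() calls per block,
# not one dict/list operation per byte.
def max_value(block):
    counts = [block.count(i) for i in range(256)]   # each scan runs in C
    return max(range(256), key=counts.__getitem__)  # most frequent byte value

print(max_value(b"abracadabra"))  # → 97 ('a' occurs five times)
```

The trade-off is that each block is scanned 256 times, so this only wins when the per-byte interpreter overhead dominates the extra memory traffic.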
Grant Edwards                   grante at visi.com
Yow! I've got an IDEA!! Why don't I STARE at you so HARD, you forget
your SOCIAL SECURITY NUMBER!!