Implementing file reading in C/Python
rhamph at gmail.com
Sat Jan 10 08:44:31 CET 2009
On Jan 9, 2:14 pm, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
> On Fri, 09 Jan 2009 15:34:17 +0000, MRAB wrote:
> > Marc 'BlackJack' Rintsch wrote:
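> >> from collections import defaultdict
> >>
> >> def iter_max_values(blocks, block_count):
> >>     for i, block in enumerate(blocks):
> >>         # Count how often each byte value occurs in this block.
> >>         histogram = defaultdict(int)
> >>         for byte in block:
> >>             histogram[byte] += 1
> >>         # Most frequent byte as a (count, byte) pair.
> >>         yield max((count, byte)
> >>                   for byte, count in histogram.iteritems())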
> > [snip]
> > Would it be faster if histogram was a list initialised to [0] * 256?
> Don't know. Then for every byte in the 2 GiB we have to call `ord()`.
> Maybe the speedup from the list compensates this, maybe not.
> I think the main problem here is that we have to do something with
> *every* byte of that really large file *at Python level*. In C those
> are just primitive numbers; Python has all the object overhead.
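For comparison, a list-based version of the histogram loop might look like the sketch below. The name `iter_max_values_list` is hypothetical, and it assumes Python 3, where iterating over a `bytes` object already yields ints 0..255, so the `ord()` call discussed above is not needed. Unlike the `defaultdict` version it includes zero counts, so an empty block yields `(0, 255)` instead of raising.

```python
def iter_max_values_list(blocks):
    """Yield (count, byte) for the most frequent byte value in each block.

    Hypothetical variant of iter_max_values using a 256-slot list instead
    of a defaultdict. Assumes Python 3: iterating over bytes gives ints.
    """
    for block in blocks:
        histogram = [0] * 256
        for byte in block:
            histogram[byte] += 1
        # Pair each count with its byte value; ties go to the larger byte,
        # just as with max() over (count, byte) tuples in the dict version.
        yield max((count, byte) for byte, count in enumerate(histogram))


print(list(iter_max_values_list([b"aab", b"\x00\x00\x01"])))  # [(2, 97), (2, 0)]
```

Whether this beats the `defaultdict` version has to be measured; it only removes the hashing, not the per-byte Python loop.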
struct's B format might help here. Also, struct.unpack_from could
probably be combined with mmap to avoid copying the input. Not to
mention that the 0..256 ints are all cached and won't be allocated/
deallocated.
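A minimal sketch of the mmap plus `struct.unpack_from` idea, assuming Python 3 and a non-empty file (mapping an empty file raises `ValueError`). The function name `iter_max_values_mmap` and the `block_size` parameter are hypothetical. Each block is decoded directly from the mapping with a `"<n>B"` format, so no intermediate `bytes` slice is made, although `unpack_from` still builds a tuple per block.

```python
import mmap
import struct


def iter_max_values_mmap(path, block_size):
    """Yield (count, byte) for the most frequent byte in each block of a file.

    Hypothetical sketch: the file is memory-mapped read-only and each block
    is decoded with struct.unpack_from using n unsigned-char ("B") fields.
    The last block may be shorter than block_size.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            size = len(mm)
            for offset in range(0, size, block_size):
                n = min(block_size, size - offset)
                # Tuple of n ints in 0..255 read straight from the mapping.
                # CPython caches these small ints, so no int objects are
                # created per byte (the tuple itself is still allocated).
                values = struct.unpack_from("%dB" % n, mm, offset)
                histogram = [0] * 256
                for value in values:
                    histogram[value] += 1
                yield max((count, byte) for byte, count in enumerate(histogram))
```

The per-byte counting loop is still Python code, so this mainly saves the copying and `ord()` calls, not the interpreter overhead itself.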