Implementing file reading in C/Python

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Fri Jan 9 04:15:20 EST 2009


On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:

> I've first tried Python. Please don't beat me, it's slow as hell and
> probably a horrible solution:
> 
> #!/usr/bin/python
> import sys
> import os
> 
> f = open(sys.argv[1], "r")

Mode should be 'rb', because the file holds binary data.
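A small sketch of what binary mode guarantees. It is written for Python 3, which postdates this thread: with 'rb', `read()` returns a `bytes` object exactly as stored, with no newline translation or decoding. The temporary file and its byte values are made up for illustration.

```python
import os
import tempfile

# Hypothetical test data: includes newline/carriage-return and non-ASCII values.
raw = bytes([0, 10, 13, 200, 255])
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

# Binary mode hands back the bytes unchanged.
with open(path, "rb") as f:
    data = f.read()

os.remove(path)
print(data == raw)  # True
```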

> filesize = os.stat(sys.argv[1])[6]

`os.path.getsize()` is a little bit more readable.
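A quick sketch comparing the three ways to get the size, assuming a throwaway 4 KiB temporary file (Python 3). Index 6 of the `os.stat()` result is `st_size`, so all three give the same number.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096)  # hypothetical 4 KiB test file
    path = tmp.name

size_by_index = os.stat(path)[6]       # positional: index 6 is st_size
size_by_attr = os.stat(path).st_size   # same value via the named attribute
size_by_helper = os.path.getsize(path) # same value, most readable

os.remove(path)
print(size_by_index, size_by_attr, size_by_helper)  # 4096 4096 4096
```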

> width = 1024
> height = 1024
> pixels = width * height
> blocksize = filesize / width / height
> 
> print("Filesize       : %d" % (filesize))
> print("Image size     : %dx%d" % (width, height))
> print("Bytes per Pixel: %d" % (blocksize))

Why the parentheses around the "argument" of ``print``?  In Python < 3,
``print`` is a statement, not a function.

> picture = { }
> havepixels = 0
> while True:
> 	data = f.read(blocksize)
> 	if len(data) <= 0: break

    if not data:
        break

is enough.
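A minimal sketch of the block-reading loop with that test, written for Python 3 and wrapped in a hypothetical generator `iter_blocks()`. `read()` returns an empty bytes object at end of file, which is false in a boolean context. An in-memory `io.BytesIO` stands in for the image file.

```python
import io

def iter_blocks(f, blocksize):
    """Yield successive blocks from a binary file object until EOF."""
    while True:
        data = f.read(blocksize)
        if not data:  # empty bytes means end of file
            break
        yield data

# Hypothetical stand-in for the real input file.
f = io.BytesIO(b"abcdefgh")
blocks = list(iter_blocks(f, 3))
print(blocks)  # [b'abc', b'def', b'gh']
```

The last block may be shorter than `blocksize` when the file size is not a multiple of it.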

> 	datamap = { }
> 	for i in range(len(data)):
> 		datamap[ord(data[i])] = datamap.get(data[i], 0) + 1

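Here you are creating a list full of integers just to use them as indices
into `data` (twice) instead of iterating directly over the elements of
`data`.  And you are calling `ord()` for *every* byte in the file
although you only need it for one value in each block.  If it's possible
to write the raw PGM format, this conversion wouldn't be necessary at all.

Note also that the counts are stored under ``ord(data[i])`` (an int) but
looked up under ``data[i]`` (a one-character string), so ``get()`` always
returns 0 and every count ends up as 1.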
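A sketch of counting by iterating directly over the block, assuming Python 3, where iterating over a `bytes` object already yields ints (0-255), so neither an index list nor `ord()` is needed. The function name `byte_counts()` is hypothetical. In Python 2 the equivalent would iterate over the characters and call `ord()` once, on the winner only.

```python
def byte_counts(data):
    """Count how often each byte value occurs in a block."""
    counts = {}
    for value in data:  # each value is already an int in Python 3
        # Same key type for the lookup and the store, unlike the quoted code.
        counts[value] = counts.get(value, 0) + 1
    return counts

counts = byte_counts(b"\x07\x07\x01")
print(counts)  # {7: 2, 1: 1}
```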

For the `datamap` a `collections.defaultdict()` might be faster.
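The same count with `collections.defaultdict`, as a sketch (Python 3, hypothetical function name). `defaultdict(int)` creates a missing key with `int()`, i.e. 0, so the increment needs no `get()` call. Whether it is actually faster depends on the interpreter; `timeit` would settle it.

```python
from collections import defaultdict

def byte_counts(data):
    """Count byte values in a block using a defaultdict."""
    counts = defaultdict(int)  # missing keys start at 0
    for value in data:
        counts[value] += 1
    return counts

counts = byte_counts(b"\x07\x07\x01")
print(dict(counts))  # {7: 2, 1: 1}
```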

> 	maxchr = None
> 	maxcnt = None
> 	for (char, count) in datamap.items():
> 		if (maxcnt is None) or (count > maxcnt):
> 			maxcnt = count
> 			maxchr = char

Untested:

    maxchr = max((i, c) for c, i in datamap.iteritems())[1]
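A tested version of that one-liner on a made-up count mapping, written for Python 3 (where `items()` replaces `iteritems()`), next to an equivalent using `max()` with a `key` function.

```python
counts = {7: 2, 1: 1, 42: 5}  # hypothetical byte value -> count mapping

# Compare (count, value) pairs and keep the value of the largest pair.
maxchr = max((count, value) for value, count in counts.items())[1]

# Same result: the dict key whose count is largest.
maxchr_key = max(counts, key=counts.get)

print(maxchr, maxchr_key)  # 42 42
```

The two differ only on ties: the tuple version picks the largest byte value among equally frequent ones, while the `key` version picks the first one the dict yields.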

> 	most = maxchr

Why?

> 	posx = havepixels % width
> 	posy = havepixels / width

    posy, posx = divmod(havepixels, width)

Don't know if this is faster though.
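A small check of the `divmod()` order, assuming Python 3 (hence `//` for the floor division that Python 2's `/` does on ints). `divmod()` returns `(quotient, remainder)`, so the row comes first. The pixel index is made up.

```python
width = 1024
havepixels = 3 * width + 5  # hypothetical index: row 3, column 5

# Quotient is the row (y), remainder is the column (x).
posy, posx = divmod(havepixels, width)

assert (posx, posy) == (havepixels % width, havepixels // width)
print(posx, posy)  # 5 3
```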

> 	havepixels += 1
> 	if (havepixels % 1024) == 0:
> 		print("Progress %s: %.1f%%" % (sys.argv[1], 100.0 * havepixels / pixels))
> 
> 	picture[(posx, posy)] = most

Why are you using a dictionary as a "2d array"?  In the C code you simply
write the values sequentially, so why not just use a flat list and
append here?
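A sketch of the flat-list approach combined with writing a raw (P5) PGM file, assuming Python 3 and 8-bit gray values. The helper names, the tiny 2x2 input and the output path are all hypothetical. In a flat row-major list, the pixel at (x, y) is `pixels[y * width + x]`, and because maxval is 255 each pixel is stored as a single byte after the text header.

```python
import io
import os
import tempfile
from collections import defaultdict

def most_common_byte(block):
    """Return the byte value (0-255) that occurs most often in block."""
    counts = defaultdict(int)
    for value in block:
        counts[value] += 1
    return max(counts, key=counts.get)

def write_pgm(path, width, height, pixels):
    """Write row-major 8-bit gray values as a raw (P5) PGM file."""
    if len(pixels) != width * height:
        raise ValueError("expected %d pixels, got %d" % (width * height, len(pixels)))
    header = ("P5\n%d %d\n255\n" % (width, height)).encode("ascii")
    with open(path, "wb") as f:
        f.write(header)
        f.write(bytes(pixels))  # maxval 255, so one byte per pixel

# Hypothetical tiny input: a 2x2 image with 3 bytes per pixel.
width, height, blocksize = 2, 2, 3
source = io.BytesIO(b"\x01\x01\x02" b"\x05\x05\x05" b"\x09\x00\x09" b"\xff\x10\xff")

pixels = []  # flat list, filled sequentially like the C version
while True:
    block = source.read(blocksize)
    if not block:
        break
    pixels.append(most_common_byte(block))

out_path = os.path.join(tempfile.mkdtemp(), "out.pgm")
write_pgm(out_path, width, height, pixels)
print(pixels)  # [1, 5, 9, 255]
```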

Ciao,
	Marc 'BlackJack' Rintsch
