Implementing file reading in C/Python
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Fri Jan 9 04:15:20 EST 2009
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
> I've first tried Python. Please don't beat me, it's slow as hell and
> probably a horrible solution:
>
> #!/usr/bin/python
> import sys
> import os
>
> f = open(sys.argv[1], "r")
Mode should be 'rb'.
> filesize = os.stat(sys.argv[1])[6]
`os.path.getsize()` is a little bit more readable.
> width = 1024
> height = 1024
> pixels = width * height
> blocksize = filesize / width / height
>
> print("Filesize : %d" % (filesize))
> print("Image size : %dx%d" % (width, height))
> print("Bytes per Pixel: %d" % (blocksize))
Why the parentheses around the "argument" of ``print``? In Python < 3,
``print`` is a statement, not a function.
> picture = { }
> havepixels = 0
> while True:
>     data = f.read(blocksize)
>     if len(data) <= 0: break
    if not data:
        break
is enough.
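A minimal sketch of that read loop, written as a generator. The name `read_blocks` is hypothetical and the syntax is Python 3; an `io.BytesIO` object stands in for a file opened with mode 'rb'.

```python
import io

def read_blocks(stream, blocksize):
    """Yield successive blocks of up to `blocksize` bytes until EOF."""
    while True:
        data = stream.read(blocksize)
        if not data:  # read() returns an empty bytes object at end of file
            break
        yield data

# In-memory stand-in for a binary file; the last block may be short.
blocks = list(read_blocks(io.BytesIO(b"abcdefg"), 3))
```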
>     datamap = { }
>     for i in range(len(data)):
>         datamap[ord(data[i])] = datamap.get(data[i], 0) + 1
Here you are creating a list full of integers to use them as index into
`data` (twice) instead of iterating directly over the elements in
`data`. And you are calling `ord()` for *every* byte in the file
although you just need it for one value in each block. If it's possible
to write the raw PGM format this conversion wouldn't be necessary at all.
For the `datamap` a `collections.defaultdict()` might be faster.
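A rough sketch of counting by iterating directly over the block with a `collections.defaultdict`. The function name `count_bytes` is hypothetical. This is written for Python 3, where iterating over a `bytes` object already yields integers, so no `ord()` call is needed; in Python 2 the loop would yield one-character strings instead.

```python
from collections import defaultdict

def count_bytes(data):
    """Count how often each byte value occurs in one block."""
    counts = defaultdict(int)  # missing keys start at 0
    for byte in data:          # no index list, no repeated data[i] lookups
        counts[byte] += 1
    return counts

counts = count_bytes(b"aabac")
```

Whether this is actually faster than `dict.get()` would have to be measured on real data.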
>     maxchr = None
>     maxcnt = None
>     for (char, count) in datamap.items():
>         if (maxcnt is None) or (count > maxcnt):
>             maxcnt = count
>             maxchr = char
Untested:
maxchr = max((i, c) for c, i in datamap.iteritems())[1]
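An alternative sketch that avoids building (count, key) tuples by passing a `key` function to `max()`. The helper name `most_common_key` is hypothetical, and the code uses Python 3's `items()` rather than `iteritems()`. Note that ties are resolved differently from the tuple version: here the first maximal entry in iteration order wins.

```python
def most_common_key(datamap):
    """Return the key with the highest count in a {key: count} mapping."""
    # max() compares the counts only; [0] picks the key of the winning pair.
    return max(datamap.items(), key=lambda item: item[1])[0]

maxchr = most_common_key({"a": 3, "b": 5, "c": 1})
```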
>     most = maxchr
Why?
>     posx = havepixels % width
>     posy = havepixels / width
    posy, posx = divmod(havepixels, width)
Don't know if this is faster though.
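A small sketch of the index arithmetic, with a hypothetical helper `pixel_position`. `divmod(a, b)` returns the pair (quotient, remainder), so the row (`posy`) comes first and the column (`posx`) second.

```python
def pixel_position(havepixels, width):
    """Map a running pixel index to (posx, posy) in a row-major image."""
    posy, posx = divmod(havepixels, width)  # (quotient, remainder)
    return posx, posy

position = pixel_position(9, 4)  # index 9 in an image 4 pixels wide
```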
>     havepixels += 1
>     if (havepixels % 1024) == 0:
>         print("Progresss %s: %.1f%%" % (sys.argv[1], 100.0 * havepixels /
>             pixels))
>
>     picture[(posx, posy)] = most
Why are you using a dictionary as "2d array"? In the C code you simply
write the values sequentially, why can't you just use a flat list and
append here?
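A sketch of the flat-list idea, assuming the values arrive in row-major order as they do in the loop above. The sample values and the helper `pixel_at` are made up for illustration.

```python
width = 3
picture = []
for most in [10, 20, 30, 40, 50, 60]:  # hypothetical per-pixel values
    picture.append(most)               # sequential write, no (x, y) key

def pixel_at(picture, width, posx, posy):
    """Look up a pixel in a flat row-major list."""
    return picture[posy * width + posx]

value = pixel_at(picture, width, 1, 1)
```

If a coordinate lookup is still needed later, the position can be computed from the index on demand instead of being stored as a dictionary key for every pixel.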
Ciao,
Marc 'BlackJack' Rintsch