Implementing file reading in C/Python
Johannes Bauer
dfnsonfsduifb at gmx.de
Fri Jan 9 08:38:20 EST 2009
Marc 'BlackJack' Rintsch wrote:
>> f = open(sys.argv[1], "r")
>
> Mode should be 'rb'.
Check.
>> filesize = os.stat(sys.argv[1])[6]
>
> `os.path.getsize()` is a little bit more readable.
Check.
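Put together, the two suggestions above might look roughly like the following sketch. The helper names `read_blocks` and `file_size` are made up for illustration and are not part of the original script.

```python
import os


def read_blocks(path, blocksize):
    """Yield successive blocks of at most `blocksize` bytes from `path`."""
    # 'rb' avoids newline translation (and, on Python 3, text decoding).
    with open(path, "rb") as f:
        while True:
            data = f.read(blocksize)
            if not data:  # an empty result means end of file
                break
            yield data


def file_size(path):
    """Return the size of `path` in bytes."""
    # Same value as os.stat(path)[6] (st_size), but easier to read.
    return os.path.getsize(path)
```

The last block can be shorter than `blocksize` if the file size is not an exact multiple of it.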
>> print("Filesize : %d" % (filesize))
>> print("Image size : %dx%d" % (width, height))
>> print("Bytes per Pixel: %d" % (blocksize))
>
> Why parentheses around ``print``\s "argument"? In Python <3 ``print`` is
> a statement and not a function.
I write all new code to work under Python 3.0. Actually I develop on
Python 3.0, but the code is currently deployed on 2.6.
>> picture = { }
>> havepixels = 0
>> while True:
>>     data = f.read(blocksize)
>>     if len(data) <= 0: break
>
>     if not data:
>         break
>
> is enough.
>
>> datamap = { }
>> for i in range(len(data)):
>>     datamap[ord(data[i])] = datamap.get(data[i], 0) + 1
>
> Here you are creating a list full of integers to use them as index into
> `data` (twice) instead of iterating directly over the elements in
> `data`. And you are calling `ord()` for *every* byte in the file
> although you just need it for one value in each block. If it's possible
> to write the raw PGM format this conversion wouldn't be necessary at all.
OK, those two are just stupid, you're right. I changed it to:
datamap = { }
for i in data:
    datamap[i] = datamap.get(i, 0) + 1
array = sorted([(b, a) for (a, b) in datamap.items()], reverse=True)
most = ord(array[0][1])
pic.write("%d\n" % (most))
> For the `datamap` a `collections.defaultdict()` might be faster.
Tried that, not much of a change.
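For reference, a minimal sketch of what the `collections.defaultdict` variant could look like. `count_bytes` is a hypothetical helper name. Note that on Python 3 iterating over a bytes object yields integers directly, so no `ord()` call is needed there, while on Python 2 it yields one-character strings.

```python
from collections import defaultdict


def count_bytes(data):
    """Map each element of `data` to the number of times it occurs."""
    counts = defaultdict(int)  # missing keys start at 0
    for value in data:
        counts[value] += 1
    return counts
```

This does the same work as the `datamap.get(i, 0) + 1` loop, which fits the observation that it makes little difference to the run time.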
>> maxchr = None
>> maxcnt = None
>> for (char, count) in datamap.items():
>>     if (maxcnt is None) or (count > maxcnt):
>>         maxcnt = count
>>         maxchr = char
>
> Untested:
>
> maxchr = max((i, c) for c, i in datamap.iteritems())[1]
This is nice, I use it - the sort thing was a workaround anyways.
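A small sketch of that `max()` idiom in a form that runs on both Python 2.6 and 3.0. It uses `items()` because `iteritems()` exists only on Python 2. The function names are illustrative only, and the second variant (with `key=`) is an alternative I am adding here, not something from the thread.

```python
def most_frequent(datamap):
    """Return the key with the highest count in `datamap`."""
    # Tuples compare element by element: the highest count wins,
    # and ties are broken in favour of the larger key.
    return max((count, value) for value, count in datamap.items())[1]


def most_frequent_by_key(datamap):
    """Same idea without building tuples; ties go to the first key seen."""
    return max(datamap, key=datamap.get)
```

The two variants agree whenever there is a unique maximum; they can differ only in how ties are resolved.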
>> most = maxchr
>
> Why?
I don't really know anymore :-\
>> posx = havepixels % width
>> posy = havepixels / width
>
> posy, posx = divmod(havepixels, width)
That's a nice one.
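One detail worth keeping in mind: `divmod(a, b)` returns the quotient first and the remainder second, so the row comes before the column. A minimal sketch, with `pixel_position` as a hypothetical helper name and a row-major pixel layout assumed:

```python
def pixel_position(index, width):
    """Return (posx, posy) for the pixel at linear `index` in a row-major image."""
    # divmod returns (quotient, remainder): the quotient is the row (posy),
    # the remainder is the column (posx).
    posy, posx = divmod(index, width)
    return posx, posy
```

For example, with `width = 3` the pixel at index 7 sits in column 1 of row 2.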
> Why are you using a dictionary as "2d array"? In the C code you simply
> write the values sequentially, why can't you just use a flat list and
> append here?
Yup, I changed the Python code to behave the same way the C code did.
However, overall it's not much of an improvement: it takes about 15
minutes to execute (still a factor of 23 slower than the C version).
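To illustrate the flat-list approach, here is a rough sketch that writes a row-major list of gray values as a plain (ASCII, "P2") PGM image, one value per line as in the `pic.write("%d\n" % (most))` call above. The function name, the `maxval` default and the header handling are my assumptions; the actual script may write its header differently.

```python
def write_plain_pgm(out, values, width, maxval=255):
    """Write a flat, row-major list of gray values as a plain (P2) PGM image."""
    height = len(values) // width  # incomplete trailing rows are dropped
    # Plain PGM header: magic number, width and height, maximum gray value.
    out.write("P2\n%d %d\n%d\n" % (width, height, maxval))
    for value in values[:width * height]:
        out.write("%d\n" % value)
```

Here `out` is any text-mode file object, so the values can simply be appended to a list while reading and written out in one pass at the end.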
Thanks for all your pointers!
Kind regards,
Johannes
--
"My countersuit against you will then be for deliberate dishonesty,
defamation of God, the Bible and me, and deliberate blasphemy."
-- Prophet and visionary Hans Joss aka HJP in de.sci.physik
<48d8bf1d$0$7510$5402220f at news.sunrise.ch>