Implementing file reading in C/Python

Johannes Bauer dfnsonfsduifb at gmx.de
Fri Jan 9 08:38:20 EST 2009


Marc 'BlackJack' Rintsch wrote:

>> f = open(sys.argv[1], "r")
> 
> Mode should be 'rb'.

Check.

>> filesize = os.stat(sys.argv[1])[6]
> 
> `os.path.getsize()` is a little bit more readable.

Check.
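
For reference, a minimal sketch of both suggestions together. The helper name `read_header_info` and the temporary demo file are assumptions made for illustration, they are not part of the original script.

```python
import os
import tempfile

def read_header_info(path):
    """Return (size_in_bytes, first_four_bytes) for the file at `path`."""
    # Same number as os.stat(path)[6], i.e. st_size, but easier to read.
    filesize = os.path.getsize(path)
    # 'rb' avoids newline translation; on Python 3 it also yields bytes, not str.
    with open(path, "rb") as f:
        head = f.read(4)
    return filesize, head

# Demo on a throwaway file containing a CR/LF pair in the middle.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01\r\n\xff")
    demo_path = tmp.name

size, head = read_header_info(demo_path)
print(size, head)
os.remove(demo_path)
```

In text mode the `\r\n` pair could be altered on some platforms, which is why binary mode matters for image data.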

>> print("Filesize       : %d" % (filesize))
>> print("Image size     : %dx%d" % (width, height))
>> print("Bytes per Pixel: %d" % (blocksize))
> 
> Why parentheses around ``print``\s "argument"?  In Python <3 ``print`` is 
> a statement and not a function.

I write all new code to work under Python 3.0. I actually develop on
Python 3.0, but the code is currently deployed on 2.6.

>> picture = { }
>> havepixels = 0
>> while True:
>> 	data = f.read(blocksize)
>> 	if len(data) <= 0: break
> 
>     if not data:
>         break
> 
> is enough.
> 
>> 	datamap = { }
>> 	for i in range(len(data)):
>> 		datamap[ord(data[i])] = datamap.get(data[i], 0) + 1
> 
> Here you are creating a list full of integers to use them as index into 
> `data` (twice) instead of iterating directly over the elements in 
> `data`.  And you are calling `ord()` for *every* byte in the file 
> although you just need it for one value in each block.  If it's possible 
> to write the raw PGM format this conversion wouldn't be necessary at all.

OK, those two are just stupid, you're right. I changed it to:

    datamap = { }
    for i in data:
        datamap[i] = datamap.get(i, 0) + 1

    array = sorted([(b, a) for (a, b) in datamap.items()], reverse=True)
    most = ord(array[0][1])
    pic.write("%d\n" % (most))
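
A self-contained sketch of the block loop with these changes, under a few assumptions: the function name `most_common_per_block` and the in-memory demo stream are made up for illustration, and `bytearray(data)` is used so the loop sees integers on both 2.6 and 3.0. On Python 3, iterating over bytes read in 'rb' mode already yields integers, so the `ord()` call would raise a TypeError there.

```python
import io

def most_common_per_block(stream, blocksize):
    """Yield the most frequent byte value (0-255) of each block in `stream`."""
    while True:
        data = stream.read(blocksize)
        if not data:                  # empty result means end of file
            break
        datamap = {}
        for b in bytearray(data):     # yields ints on Python 2.6+ and 3.x
            datamap[b] = datamap.get(b, 0) + 1
        # Sort (count, value) pairs, highest count first.
        # Ties on count go to the larger byte value.
        array = sorted(((n, b) for b, n in datamap.items()), reverse=True)
        yield array[0][1]

# Three blocks of up to 3 bytes: "aab", "ccc", "xy"
demo = io.BytesIO(b"aab" + b"ccc" + b"xy")
result = list(most_common_per_block(demo, 3))
print(result)
```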


> For the `datamap` a `collections.defaultdict()` might be faster.

Tried that, not much of a change.
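
For comparison, roughly what the `defaultdict` variant looks like. This is only a sketch; `count_bytes` is a hypothetical helper name, and no timing claim is implied.

```python
from collections import defaultdict

def count_bytes(data):
    """Count how often each byte value occurs in `data`."""
    counts = defaultdict(int)      # missing keys start at 0
    for b in bytearray(data):      # ints on Python 2.6+ and 3.x
        counts[b] += 1
    return counts

counts = count_bytes(b"hello")
print(sorted(counts.items()))
```

The only difference from the `dict.get()` version is that the default value is supplied by the dictionary itself instead of at every lookup.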

>> 	maxchr = None
>> 	maxcnt = None
>> 	for (char, count) in datamap.items():
>> 		if (maxcnt is None) or (count > maxcnt):
>> 			maxcnt = count
>> 			maxchr = char
> 
> Untested:
> 
>     maxchr = max((i, c) for c, i in datamap.iteritems())[1]

This is nice, I'll use it - the sort thing was a workaround anyway.
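
A small sketch of that idea with made-up counts. It uses `items()` instead of `iteritems()` so it runs on both Python 2.6 and 3.0; the `key=` form shown next to it is an alternative, not something from the thread.

```python
# Hypothetical histogram: byte value -> number of occurrences in one block
datamap = {0: 12, 128: 40, 255: 7}

# Build (count, value) tuples and take the largest; [1] picks the value.
maxchr = max((count, value) for value, count in datamap.items())[1]

# Alternative: let max() compare the keys by their counts directly.
same = max(datamap, key=datamap.get)

print(maxchr, same)
```

The two forms can differ when counts tie: the tuple form falls back to the larger byte value, while the `key=` form returns the first key it encounters with the highest count.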

>> 	most = maxchr
> 
> Why?

I don't really know anymore :-\

>> 	posx = havepixels % width
>> 	posy = havepixels / width
> 
>     posy, posx = divmod(havepixels, width)

That's a nice one.
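
A quick sketch of how `divmod()` maps a running pixel index to coordinates. The width of 4 is an arbitrary example value. Note that `divmod()` returns (quotient, remainder), so the row comes first.

```python
width = 4          # example image width in pixels
coords = []
for havepixels in range(6):
    # quotient = row (posy), remainder = column (posx)
    posy, posx = divmod(havepixels, width)
    coords.append((posx, posy))

print(coords)
```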

> Why are you using a dictionary as "2d array"?  In the C code you simply 
> write the values sequentially, why can't you just use a flat list and 
> append here?

Yup, I changed the Python code to behave the same way the C code did.
However, overall it's not much of an improvement: it takes about 15
minutes to execute (still a factor of 23 slower).
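
For illustration, a minimal sketch of the flat-list approach. It assumes the output is plain (ASCII, "P2") PGM with one value per line, which matches the `"%d\n"` writes above; the function name `plain_pgm` and the pixel values are invented for the example.

```python
def plain_pgm(pixels, width, height, maxval=255):
    """Return the text of a plain PGM (P2) image built from a flat pixel list."""
    assert len(pixels) == width * height
    lines = ["P2", "%d %d" % (width, height), "%d" % maxval]
    lines.extend("%d" % p for p in pixels)   # one gray value per line
    return "\n".join(lines) + "\n"

# Values are appended in file order; no (x, y) bookkeeping is needed.
pixels = []
for value in (0, 64, 128, 255, 32, 16):
    pixels.append(value)

text = plain_pgm(pixels, 3, 2)
print(text)
```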

Thanks for all your pointers!

Kind regards,
Johannes

-- 
"My countersuit against you will then be for deliberate mendacity,
slander of God, the Bible and me, and deliberate blasphemy."
         -- Prophet and visionary Hans Joss aka HJP in de.sci.physik
                         <48d8bf1d$0$7510$5402220f at news.sunrise.ch>


