Conversion of 24bit binary to int

Alex Martelli aleax at aleax.it
Wed Nov 12 08:03:28 EST 2003


Idar wrote:

> Thanks for the example!
> 
> The format is binary with no formatting characters to indicate start/end of
> each block (fixed size).
> A file is about 6MB (and about 300 of them again...), so
> 
> Ch1: 1536B (512*3B) - the 3B are big endian (int)
> ..
> Ch6: 1536B (512*3B)
> And then it is repeated till the end (say Y sets of Ch1 (the same for
> Ch2,3,4,5,6)):
> Ch1,Y: 1536B (512*3B)
> ..
> Ch6,Y: 1536B (512*3B)
> 
> And ideally I would like to convert it to this format:
> Ch1: Y*512*4B (normal int with little endian)
> Ch2
> Ch3
> Ch4
> Ch5
> Ch6
> And that is the end :)

So, you don't really need to convert binary to int or anything, just
shuffle bytes around, right?  Your file starts with (e.g.), using a
letter for each arbitrary binary byte:

A B C D E F G H I ...

and you want to output the bytes

C B A 0 F E D 0 I H G 0 ...

I.e., swap 3 bytes, insert a 0 byte for padding, and proceed (for all
Ch1, which is spread out in the original file -- then for all Ch2, and
so on).  Each file fits comfortably in memory (about 6MB for input,
becoming about 8MB for output due to the padding).  You can use two
instances of array.array('B'), with .read for input and .write for
output (just remember .read _appends_ to the array, so make a new empty
one for each file you're processing -- the _output_ array you can reuse).
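
In current Python the array methods for this are spelled .fromfile and .tofile (the .read and .write aliases were removed in Python 3). A minimal sketch of the loading and saving step, assuming Python 3; the helper names load_bytes and save_bytes and the throwaway file names are made up for illustration:

```python
import array
import os
import tempfile

def load_bytes(path):
    """Read a whole file into a fresh array of unsigned bytes."""
    ia = array.array('B')              # new empty array: fromfile appends
    with open(path, 'rb') as f:
        ia.fromfile(f, os.path.getsize(path))
    return ia

def save_bytes(path, oa):
    """Write an array of unsigned bytes to a file."""
    with open(path, 'wb') as f:
        oa.tofile(f)

# Round trip on a small throwaway file (hypothetical data).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'in.bin')
with open(src, 'wb') as f:
    f.write(b'ABCDEFGHI')

ia = load_bytes(src)
# Zero-filled output array: 4 output bytes for every 3 input bytes.
oa = array.array('B', bytes(len(ia) // 3 * 4))
```

Creating the output array zero-filled up front means the padding bytes never need to be written explicitly.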

It's LOTS of indexing and single-byte moving, so I doubt the Python
native performance will be great.  Still, once you've implemented and
checked it out you can use psyco or pyrex to optimize it, if needed.
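
One alternative to psyco or pyrex, not something suggested in the post itself, is to let strided slice assignment move the bytes in bulk instead of indexing them one at a time. The following is only a sketch, assuming Python 3 and a hypothetical helper name swap_and_pad; it applies the "C B A 0" shuffle described above to every 3-byte group in an array:

```python
import array

def swap_and_pad(ia):
    """Return a new array in which each 3-byte big-endian group of ia
    becomes 4 bytes: the group reversed, followed by one zero byte."""
    n = len(ia) // 3                       # number of complete 3-byte samples
    oa = array.array('B', bytes(4 * n))    # zero-filled, so padding is in place
    oa[0::4] = ia[2:3 * n:3]               # least significant byte goes first
    oa[1::4] = ia[1:3 * n:3]               # middle byte stays in the middle
    oa[2::4] = ia[0:3 * n:3]               # most significant byte goes third
    return oa

demo = swap_and_pad(array.array('B', b'ABCDEFGHI'))
print(demo.tobytes())   # b'CBA\x00FED\x00IHG\x00'
```

This does the swapping and padding only; regrouping the blocks by channel is a separate step.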

The primitive you need is "copy, with swapping and padding, a block
of 1536 input bytes [starting from index SI] to a block of 2048 output
bytes [starting from index SO]".  The 0 bytes in the output you'll
leave untouched, after first preparing the output array with
OA = array.array('B', Y*2048*6*'\0') of course.
That's just (using a predefined range for speed, no need to remake
it every time):

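r512 = xrange(512)   # range(512) on Python 3

def doblock(SI, SO, IA, OA, r512=r512):
    ii = SI
    io = SO
    for i in r512:
        # Reverse one 3-byte sample; OA[io+3] keeps its 0 as padding.
        # (The slice IA[ii+2:ii-1:-1] is empty when ii == 0, since a
        # stop of -1 means "the last item", so copy byte by byte.)
        OA[io] = IA[ii+2]
        OA[io+1] = IA[ii+1]
        OA[io+2] = IA[ii]
        ii += 3
        io += 4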

so basically it only remains to compute SI and SO appropriately
and loop, calling this primitive (or some sped-up version
thereof) 6*Y times for all the blocks in the various channels.
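
The remaining index computation can be sketched as follows. This assumes Python 3 and the layout described in the quoted message: the input holds Y sets of six 1536-byte blocks (Ch1..Ch6, then repeated), and the output holds all of Ch1, then all of Ch2, and so on. The function name convert and the constants are illustrative, and doblock is repeated here in byte-by-byte form so the example is self-contained:

```python
import array

BLOCK_IN = 512 * 3    # bytes per channel block in the input
BLOCK_OUT = 512 * 4   # bytes per channel block in the output
NCH = 6               # channels per set

def doblock(SI, SO, IA, OA):
    """Byte-swap 512 samples from IA[SI:] into OA[SO:], skipping pad bytes."""
    ii, io = SI, SO
    for _ in range(512):
        OA[io] = IA[ii + 2]
        OA[io + 1] = IA[ii + 1]
        OA[io + 2] = IA[ii]
        ii += 3
        io += 4

def convert(IA):
    """Regroup interleaved channel blocks into one contiguous run per channel."""
    Y = len(IA) // (NCH * BLOCK_IN)             # number of complete sets
    OA = array.array('B', bytes(Y * BLOCK_OUT * NCH))
    for y in range(Y):
        for ch in range(NCH):
            SI = (y * NCH + ch) * BLOCK_IN      # input is ordered set by set
            SO = (ch * Y + y) * BLOCK_OUT       # output is ordered channel by channel
            doblock(SI, SO, IA, OA)
    return OA
```

The two offset formulas are the whole trick: the input index advances through channels within a set, while the output index advances through sets within a channel.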


Alex




