Conversion of 24bit binary to int

Wed Nov 12 23:26:38 EST 2003

Idar wrote:

> Thanks for the example!
> 
> The format is binary with no formatting characters to indicate start/end of 
> each block (fixed size).
> A file is about 6MB (and about 300 of them again...), so
> 
> Ch1: 1536B (512*3B) - the 3B are big endian (int)
> ..
> Ch6: 1536B (512*3B)
> And then it is repeated till the end (say Y sets of Ch1 (the same for 
> Ch2,3,4,5,6)):
> Ch1,Y: 1536B (512*3B)
> ..
> Ch6,Y: 1536B (512*3B)
> 
> And ideally I would like to convert it to this format:
> Ch1: Y*512*4B (normal int with little endian)
> Ch2
> Ch3
> Ch4
> Ch5
> Ch6
> And that is the end :)
> Idar
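
If I read that layout correctly, the byte position of any sample
follows from its set, channel and sample index. A small sketch of
that arithmetic (the constant and function names here are mine,
purely for illustration, and all indices are 0-based):

```python
NUM_CHANNELS = 6      # Ch1..Ch6 in the description above
BLOCK_SAMPLES = 512   # samples per channel block
SRC_BYTES = 3         # 24-bit big-endian samples in the source
DST_BYTES = 4         # 32-bit little-endian ints in the destination

def src_offset(set_index, channel, sample):
    """Byte offset of a sample in the interleaved source file."""
    block = set_index * NUM_CHANNELS + channel
    return (block * BLOCK_SAMPLES + sample) * SRC_BYTES

def dst_offset(set_index, channel, sample, num_sets):
    """Byte offset of the same sample in the de-interleaved output."""
    index = (channel * num_sets + set_index) * BLOCK_SAMPLES + sample
    return index * DST_BYTES

def num_sets_in(file_size):
    """Number of complete sets (Y) in a source file of this size."""
    set_bytes = NUM_CHANNELS * BLOCK_SAMPLES * SRC_BYTES
    sets, leftover = divmod(file_size, set_bytes)
    if leftover:
        raise ValueError("file size is not a whole number of sets")
    return sets
```

So the first sample of Ch2 in the first set sits at source byte
1536, and each full set of six channels takes 9216 source bytes.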

OK, now that I have a beer and a specification, here is some code
which (I think) should do what (I think) you are asking for.
On my Athlon 2200+ (marketing number) computer, with the source
file cached by the OS, it operates at around 10 source megabytes/second.

(That should be about 3 minutes plus actual file I/O operations
for the 300 6MB files you describe.)

Verifying that it actually produces the data you expect is up to you :)

Regards,
Pat
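
P.S. The byte shuffle in mungeblock relies on extended slice
assignment between two arrays. Here is a minimal sketch of the idea
on two hand-made samples, so the effect is easy to see. The sample
values are made up for illustration, and the snippet assumes
Python 3 (bytes objects and array.tobytes):

```python
import array

# Two made-up 24-bit big-endian samples: 0x010203 and 0x0A0B0C.
src = array.array('B', bytes([0x01, 0x02, 0x03, 0x0A, 0x0B, 0x0C]))

# Destination: 4 bytes per sample, pre-filled with zeros.
dst = array.array('B', bytes(4 * (len(src) // 3)))

# Source byte i of every sample lands in destination byte 2-i,
# which reverses the byte order; byte 3 is never written and stays 0.
for i in range(3):
    dst[2 - i::4] = src[i::3]

result = dst.tobytes()

# Read the result back as little-endian 32-bit values to check it.
values = [int.from_bytes(result[n:n + 4], 'little')
          for n in range(0, len(result), 4)]
```

Here values comes back as [0x010203, 0x0A0B0C], i.e. the same
numbers, now stored least significant byte first with a zero pad byte.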

import array

def mungeio(srcfile,dstfile, numchannels=6, blocksize=512):
    """
        This function converts 24-bit big-endian samples into
        32-bit little-endian integers (byte for byte, the same
        shuffle as 24 bit RGB into 32 bit BGR0), and simultaneously
        de-interleaves the channels.  The high byte of each output
        value is always zero, so samples are treated as unsigned.
        The parameters are:

            srcfile     -- a file object opened with 'rb'
                           (or similar object)
            dstfile     -- a file object opened with 'wb'
                           (or similar object)
            numchannels -- the number of interleaved channels
            blocksize   -- the number of samples per channel in
                           each interleaved block (interleave factor)

        This function reads all the data from srcfile and writes
        it to dstfile.  It is up to the caller to close both files.

        The function asserts that the amount of data to be read
        from the source file is an integral multiple of
        blocksize*numchannels*3.

        This function assumes that multiple copies of the data
        will easily fit into RAM, as the target file size is
        6MB for the source files and 8MB for the destination
        files.  If this is not a good assumption, it should
        be rearchitected to output to one file per channel,
        and then stitch the output files together at the end.
    """

    srcblocksize = blocksize * 3
    dstblocksize = blocksize * 4

    def mungeblock(src,dstarray=array.array('B',dstblocksize*[0])):
        """
            This function accepts a bytes object representing a
            single source block, and returns a bytes object
            representing a single destination block.

            The default dstarray is created once and reused as a
            scratch buffer; every fourth byte is never written,
            so it stays zero.
        """
        srcarray = array.array('B',src)
        # Reverse the byte order of each 3-byte sample
        # (big endian in, little endian out).
        for i in range(3):
            dstarray[2-i::4] = srcarray[i::3]
        return dstarray.tobytes()

    channellist = [[] for i in range(numchannels)]

    while True:
        for channel in channellist:
            data = srcfile.read(srcblocksize)
            if len(data) != srcblocksize:
                break
            channel.append(mungeblock(data))
        else:
            continue # (with while statement)
        break # Propagate break from 'for' out of 'while'

    # Check that input file length is valid (no leftovers),
    # and then write the result.

    assert channel is channellist[0] and not len(data)
    dstfile.write(b''.join(sum(channellist,[])))

def mungefile(srcname,dstname):
    """
        Actual I/O done in a separate function so it can
        be more easily unit-tested.
    """
    # The with statement closes both files even if mungeio raises.
    with open(srcname,'rb') as srcfile, open(dstname,'wb') as dstfile:
        mungeio(srcfile,dstfile)
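
For the verification step, one option is to compare the output
against a slow but obvious conversion that handles one sample at a
time. The following is only a sketch, assuming the layout described
in the quoted message and Python 3; reference_convert and its signed
flag are hypothetical helpers of mine, not part of the code above:

```python
import struct

def reference_convert(data, numchannels=6, blocksize=512, signed=False):
    """
        Slow, straightforward version of the conversion, for
        cross-checking on small inputs.  Takes the whole source
        file as bytes and returns the destination bytes.
    """
    setsize = numchannels * blocksize * 3
    if len(data) % setsize:
        raise ValueError("input is not a whole number of sets")
    fmt = '<i' if signed else '<I'
    channels = [[] for _ in range(numchannels)]
    for pos in range(0, len(data), 3):
        # Which channel does this 3-byte sample belong to?
        channel = (pos // (blocksize * 3)) % numchannels
        value = int.from_bytes(data[pos:pos + 3], 'big', signed=signed)
        channels[channel].append(struct.pack(fmt, value))
    return b''.join(b''.join(ch) for ch in channels)

# Tiny example: 2 channels, 1 sample per block, 2 sets.
sample = bytes([0, 0, 1,   0, 0, 2,            # set 1: Ch1, Ch2
                0, 0, 3,   0xFF, 0xFF, 0xFF])  # set 2: Ch1, Ch2
out = reference_convert(sample, numchannels=2, blocksize=1)
out_signed = reference_convert(sample, numchannels=2, blocksize=1,
                               signed=True)
```

With signed=False this should agree with what mungeio writes, since
mungeio leaves the high byte zero. If the samples are really signed,
signed=True shows what a sign-extended result would look like (the
0xFFFFFF sample becomes -1), and mungeblock would need changing to match.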