Reading a Bitstream

Bengt Richter bokr at oz.net
Wed Nov 19 14:02:47 EST 2003


On Wed, 19 Nov 2003 01:47:26 -0800, Dietrich Epp <dietrich at zdome.net> wrote:

>
>On Nov 18, 2003, at 6:10 PM, Patrick Maupin wrote:
>
>> Dietrich Epp wrote:
>>
>>> Are there any good modules for reading a bitstream?  Specifically, I
>>> have a string and I want to be able to get the next N bits as an
>>> integer.  Right now I'm using struct.unpack and bit operations, it's a
>>> bit kludgy but it gets the right results.
>>
>> As Miki wrote, the array module will probably give you what
>> you want more easily than struct.unpack.  If you need more
>> help, just post a few more details and I will post a code
>> snippet.  (As to the rest of Miki's post, I'm not sure that
>> I really want to know what an "Upnacker" is :)
>
>Maybe I should clarify: I need to read bit fields.  They are not
>aligned to bytes, nor do they have fixed offsets.  In fact, in one part
>of the file there is a list of objects which starts with a 9 bit object 
>type followed by fields whose length and number depend on that object 
>type, ranging from a dummy 1-bit field to a tuple of four fields of 
>length 9, 5, 8, and 8 bits.
>
>I looked at the array module and can't find what I'm looking for.  
>Here's a bit of typical usage.
>
>def readStuff(bytes):
>   bits = BitStream(bytes[2:])
>   isSimple = bits.Get(1)
>   objType = chr(bits.Get(8))
>   objType += chr(bits.Get(8))
>   objType += chr(bits.Get(8))
>   objType += chr(bits.Get(8))
>   count = bits.Get(3)
>   bits.Ignore(5)
>   if not isSimple:
>     objId = bits.Get(32)
>   bytes = bytes[2+bits.PartialBytesRead():]
>   return bytes, objType
>
>This is basically the gamut of what I want to do.  I have a string, and 
>create a bit stream object.  I read fields from the bit stream, some 
>may not be present, then return an object and the string that comes 
>after it.  The objects are aligned to bytes in this case even though 
>their fields aren't.
>
>I can't figure out how to get array to do this.  Array does not look at 
>all suited to reading a bit stream.  struct.unpack *does* work right 
>now, with a lot of help, but I was wondering if there was an easier way.
>  
>
>
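The readStuff example above assumes a BitStream class with Get, Ignore and PartialBytesRead methods; those names come from the example, not from any existing library. A minimal Python 3 sketch of that interface, assuming most-significant-bit-first (big-endian) bit order and that PartialBytesRead counts a partly consumed byte as read, could look like this. Each Get decodes only the bytes that overlap the requested field, so the cost of a read does not grow with the total input size.

```python
class BitStream:
    """Hypothetical MSB-first bit reader using the method names from the
    readStuff example (not an existing library class)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit offset into data

    def _check(self, n: int) -> int:
        # Validate a request for n more bits and return the new end offset.
        if n < 0:
            raise ValueError("bit count must be non-negative")
        end = self.pos + n
        if end > 8 * len(self.data):
            raise EOFError("need %d bits, only %d left"
                           % (n, 8 * len(self.data) - self.pos))
        return end

    def Get(self, n: int) -> int:
        """Return the next n bits as an unsigned int, most significant bit first."""
        end = self._check(n)
        if n == 0:
            return 0
        first, last = self.pos // 8, (end + 7) // 8
        # Decode only the bytes that overlap the field, not the whole buffer.
        chunk = int.from_bytes(self.data[first:last], "big")
        chunk >>= 8 * last - end        # drop bits that follow the field
        self.pos = end
        return chunk & ((1 << n) - 1)   # drop bits that precede the field

    def Ignore(self, n: int) -> None:
        """Skip n bits without decoding them."""
        self.pos = self._check(n)

    def PartialBytesRead(self) -> int:
        """Bytes touched so far, counting a partly read byte as read."""
        return (self.pos + 7) // 8


def readStuff(data: bytes):
    # Python 3 transcription of the readStuff example, using the sketch above.
    bits = BitStream(data[2:])
    isSimple = bits.Get(1)
    objType = bytes(bits.Get(8) for _ in range(4))
    count = bits.Get(3)
    bits.Ignore(5)
    if not isSimple:
        objId = bits.Get(32)
    return data[2 + bits.PartialBytesRead():], objType
```

Keeping an absolute bit position (rather than a shifting buffer) also makes it easy to seek or to re-align to the next byte boundary after an object.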
Maybe this module (sbits.py, below) will do something for you?
Note that I wrote it in response to your post; it was not previously tested
(in fact, not tested beyond what you see ;-), and it will be slow if you have
huge amounts of data to process.

You pass a string to the constructor, adding False as the second argument for
big-endian (the default is little-endian). Then you call the read method to read
bit fields, optionally interpreting each field's most significant bit as a sign bit.
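The signed option is plain two's complement: if an nb-bit field has its top bit (worth 2**(nb-1)) set, its value is the unsigned value minus 2**nb, which is what `ret - signbit - signbit` in read computes. A standalone Python 3 sketch of just that step (the name to_signed is my own, not part of sbits):

```python
def to_signed(value: int, nbits: int) -> int:
    """Reinterpret an nbits-wide unsigned field as a two's complement integer."""
    if nbits <= 0:
        raise ValueError("nbits must be positive")
    signbit = 1 << (nbits - 1)
    # Same arithmetic as SBits.read: if the sign bit is set, subtract 2**nbits.
    return value - 2 * signbit if value & signbit else value
```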

E.g., reading 4-bit chunks or single bits, little-endian and big-endian:

 >>> import sbits
 >>> sb = sbits.SBits('01234567')
 >>> for i in xrange(8*2): print sb.read(4),
 ...
 0 3 1 3 2 3 3 3 4 3 5 3 6 3 7 3
 >>> sb = sbits.SBits('01234567',False)
 >>> for i in xrange(8*2): print sb.read(4),
 ...
 3 0 3 1 3 2 3 3 3 4 3 5 3 6 3 7
 >>> sb = sbits.SBits('\x05')
 >>> for i in xrange(8): print sb.read(1),
 ...
 1 0 1 0 0 0 0 0
 >>> sb = sbits.SBits('\x05',False)
 >>> for i in xrange(8): print sb.read(1),
 ...
 0 0 0 0 0 1 0 1

 >>> sb = sbits.SBits('01234567')
 >>> hex(sb.read(64))
 '0x3736353433323130L'
 >>> sb = sbits.SBits('01234567',False)
 >>> hex(sb.read(64))
 '0x3031323334353637L'
 >>> sb = sbits.SBits('01234567')
 >>> hex(sb.read(32))
 '0x33323130'
 >>> hex(sb.read(32))
 '0x37363534'
 >>> sb = sbits.SBits('01234567',False)
 >>> hex(sb.read(32))
 '0x30313233'
 >>> hex(sb.read(32))
 '0x34353637'

Sorry for the lack of doc strings ;-/
Please let me know if/when you find a bug.


====< sbits.py >=========================================
import itertools
class SBits(object):
    def __init__(self, s='', little_endian=True):
        self.le = little_endian
        self.buf = 0L       # bits fetched but not yet returned
        self.bufbits=0      # number of valid bits in self.buf
        self.getbyte = itertools.imap(ord, s).next
    def read(self, nb=0, signed=False):
        try:
            # Pull in whole bytes until at least nb bits are buffered.
            while self.bufbits<nb:
                if self.le:
                    self.buf |= (long(self.getbyte())<<self.bufbits) # new byte goes above buffered bits
                else:
                    self.buf = (self.buf<<8) | self.getbyte()        # new byte goes below buffered bits
                self.bufbits+=8
        except StopIteration:   # no more getbyte data
            raise EOFError, 'Failed to read %s bits from available %s.'%(nb, self.bufbits)
        self.bufbits -= nb
        if self.le:
            ret = self.buf & ((1L<<nb)-1)   # little-endian: take the low nb bits
            self.buf >>= nb
        else:
            ret = self.buf>>self.bufbits    # big-endian: take the high nb bits
            self.buf &= ((1L<<self.bufbits)-1)
        if signed and nb:   # a 0-bit field has no sign bit (1L<<-1 would raise)
            signbit = 1L<<(nb-1)
            if signbit & ret:
                ret = ret - signbit -signbit    # two's complement: subtract 2**nb
        if -2**31 <= ret < 2**31: return int(ret)   # demote to int when it fits
        return ret

def test():
    sb = SBits('\x03'*(sum(xrange(37))+7))
    bits =  [sb.read(wid, wid&1>0) for wid in xrange(37)]
    hexis = map(hex,bits)
    shouldbe = [
    '0x0', '0xffffffff', '0x1', '0x0', '0xc', '0x0', '0x6', '0x18',
    '0x30', '0x30', '0x18', '0xfffffe06', '0xc0', '0xc0c', '0x2060', '0x181',
    '0x303', '0xffff0303', '0x18181', '0x6060', '0xc0c0c', '0xc0c0', '0x60606', '0x181818',
    '0x303030', '0x303030', '0x181818', '0xfe060606', '0xc0c0c0', '0xc0c0c0c', '0x20606060', '0x1818181',
    '0x3030303', '-0xFCFCFCFDL', '0x181818181L', '0x60606060', '0xC0C0C0C0CL']
    for i,h in enumerate(hexis): print '%12s%s'%(h,'\n'[:i%4==3]),
    print '\n-----\nThat was%s what was expected.\n-----'%((' not','')[hexis==shouldbe],)
    
    sb = SBits('\xc0'*(sum(xrange(37))+7), False)
    bits =  [sb.read(wid, wid&1>0) for wid in xrange(37)]
    hexis = map(hex,bits)
    shouldbe = [
    '0x0', '0xffffffff', '0x2', '0x0', '0x3', '0x0', '0x18', '0xc',
    '0xc', '0x18', '0x60', '0x303', '0x30', '0x606', '0x181', '0xffffc0c0',
    '0xc0c0', '0xffff8181', '0x20606', '0x3030', '0x30303', '0x6060', '0x181818', '0xc0c0c',
    '0xc0c0c', '0x181818', '0x606060', '0x3030303', '0x303030', '0x6060606', '0x1818181', '0xc0c0c0c0',
    '0xC0C0C0C0L', '0x81818181', '0x206060606L', '0x30303030', '0x303030303L']
    for i,h in enumerate(hexis): print '%12s%s'%(h,'\n'[:i%4==3]),
    print '\n-----\nThat was%s what was expected.\n-----'%((' not','')[hexis==shouldbe],)
if __name__ == '__main__':
    test()
=========================================================
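sbits.py is Python 2 code (0L literals, itertools.imap, `raise E, msg`, print statements), so it will not run under Python 3. A minimal Python 3 translation sketch of the same buffering approach, assuming the input is a bytes object (iterating over bytes already yields ints, so no ord is needed, and Python 3 has a single unbounded int type, so the int/long demotion disappears):

```python
class SBits:
    """Python 3 sketch of SBits: reads bit fields from a bytes object."""

    def __init__(self, s: bytes = b"", little_endian: bool = True):
        self.le = little_endian
        self.buf = 0            # bits fetched but not yet returned
        self.bufbits = 0        # number of valid bits in self.buf
        self._bytes = iter(s)   # yields one int per byte

    def read(self, nb: int = 0, signed: bool = False) -> int:
        # Pull in whole bytes until at least nb bits are buffered.
        while self.bufbits < nb:
            byte = next(self._bytes, None)
            if byte is None:
                raise EOFError("Failed to read %d bits from available %d."
                               % (nb, self.bufbits))
            if self.le:
                self.buf |= byte << self.bufbits    # new byte goes above buffered bits
            else:
                self.buf = (self.buf << 8) | byte   # new byte goes below buffered bits
            self.bufbits += 8
        self.bufbits -= nb
        if self.le:
            ret = self.buf & ((1 << nb) - 1)        # take the low nb bits
            self.buf >>= nb
        else:
            ret = self.buf >> self.bufbits          # take the high nb bits
            self.buf &= (1 << self.bufbits) - 1
        if signed and nb > 0:
            signbit = 1 << (nb - 1)
            if ret & signbit:
                ret -= 2 * signbit                  # two's complement
        return ret
```

Under Python 3, negative results print as e.g. -1 rather than '0xffffffff', so the hex strings in test() above (which reflect Python 2.3's hex() of negative ints) would need to be rewritten for this version.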

Regards,
Bengt Richter




More information about the Python-list mailing list