Reading a Bitstream
Bengt Richter
bokr at oz.net
Wed Nov 19 14:02:47 EST 2003
On Wed, 19 Nov 2003 01:47:26 -0800, Dietrich Epp <dietrich at zdome.net> wrote:
>
>On Nov 18, 2003, at 6:10 PM, Patrick Maupin wrote:
>
>> Dietrich Epp wrote:
>>
>>> Are there any good modules for reading a bitstream? Specifically, I
>>> have a string and I want to be able to get the next N bits as an
>>> integer. Right now I'm using struct.unpack and bit operations, it's a
>>> bit kludgy but it gets the right results.
>>
>> As Miki wrote, the array module will probably give you what
>> you want more easily than struct.unpack. If you need more
>> help, just post a few more details and I will post a code
>> snippet. (As to the rest of Miki's post, I'm not sure that
>> I really want to know what an "Upnacker" is :)
>
>Maybe I should clarify: I need to read bit fields. Neither are they
>aligned to bytes or do they have fixed offsets. In fact, in one part
>of the file there is a list of objects which starts with a 9 bit object
>type followed by fields whose length and number depend on that object
>type, ranging from a dummy 1-bit field to a tuple of four fields of
>length 9, 5, 8, and 8 bits.
>
>I looked at the array module and can't find what I'm looking for.
>Here's a bit of typical usage.
>
>def readStuff(bytes):
>    bits = BitStream(bytes[2:])
>    isSimple = bits.Get(1)
>    objType = chr(bits.Get(8))
>    objType += chr(bits.Get(8))
>    objType += chr(bits.Get(8))
>    objType += chr(bits.Get(8))
>    count = bits.Get(3)
>    bits.Ignore(5)
>    if not isSimple:
>        objId = bits.Get(32)
>    bytes = bytes[2+bits.PartialBytesRead():]
>    return bytes, objType
>
>This is basically the gamut of what I want to do. I have a string, and
>create a bit stream object. I read fields from the bit stream, some
>may not be present, then return an object and the string that comes
>after it. The objects are aligned to bytes in this case even though
>their fields aren't.
>
>I can't figure out how to get array to do this. Array does not look at
>all suited to reading a bit stream. struct.unpack *does* work right
>now, with a lot of help, I was wondering if there was an easier way.
>
>
>
Maybe this will do something for you?
Note that I wrote this in response to your post, and it has not been tested
beyond what you see here ;-) It will also be slow if you have huge amounts of
data to process.
You pass a string to the constructor, with False as the second argument for
big-endian (little-endian is the default), and then you use the read method to
read bit fields, optionally having the most significant bit of each field
interpreted as a sign bit.
E.g., reading 4-bit chunks or single bits, little-endian and big-endian:
>>> import sbits
>>> sb = sbits.SBits('01234567')
>>> for i in xrange(8*2): print sb.read(4),
...
0 3 1 3 2 3 3 3 4 3 5 3 6 3 7 3
>>> sb = sbits.SBits('01234567',False)
>>> for i in xrange(8*2): print sb.read(4),
...
3 0 3 1 3 2 3 3 3 4 3 5 3 6 3 7
>>> sb = sbits.SBits('\x05')
>>> for i in xrange(8): print sb.read(1),
...
1 0 1 0 0 0 0 0
>>> sb = sbits.SBits('\x05',False)
>>> for i in xrange(8): print sb.read(1),
...
0 0 0 0 0 1 0 1
>>> sb = sbits.SBits('01234567')
>>> hex(sb.read(64))
'0x3736353433323130L'
>>> sb = sbits.SBits('01234567',False)
>>> hex(sb.read(64))
'0x3031323334353637L'
>>> sb = sbits.SBits('01234567')
>>> hex(sb.read(32))
'0x33323130'
>>> hex(sb.read(32))
'0x37363534'
>>> sb = sbits.SBits('01234567',False)
>>> hex(sb.read(32))
'0x30313233'
>>> hex(sb.read(32))
'0x34353637'
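The optional sign handling (read(nb, signed=True)) amounts to treating the top
bit of an nb-bit field as a two's-complement sign bit. A minimal standalone
sketch of just that step follows; the helper name to_signed is made up for
illustration and is not part of sbits, and unlike read() it returns 0 for a
zero-width field instead of failing.

```python
def to_signed(value, nbits):
    """Interpret the low nbits of value as a two's-complement integer."""
    if nbits <= 0:
        return 0                      # a zero-width field carries no sign
    value &= (1 << nbits) - 1         # keep only the field's own bits
    signbit = 1 << (nbits - 1)        # weight of the most significant bit
    if value & signbit:
        value -= 2 * signbit          # same as subtracting signbit twice
    return value
```

For example, to_signed(0xF, 4) gives -1 and to_signed(0x7, 4) gives 7. A
signed 1-bit field can only hold 0 or -1.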
Sorry for the lack of doc strings ;-/
Please let me know if/when you find a bug.
====< sbits.py >=========================================
import itertools

class SBits(object):
    def __init__(self, s='', little_endian=True):
        self.le = little_endian
        self.buf = 0L
        self.bufbits=0
        self.getbyte = itertools.imap(ord, s).next
    def read(self, nb=0, signed=False):
        try:
            while self.bufbits<nb:
                if self.le:
                    self.buf |= (long(self.getbyte())<<self.bufbits) # put at top
                else:
                    self.buf = (self.buf<<8) | self.getbyte()
                self.bufbits+=8
        except StopIteration: # no more getbyte data
            raise EOFError, 'Failed to read %s bits from available %s.'%(nb, self.bufbits)
        self.bufbits -= nb
        if self.le:
            ret = self.buf & ((1L<<nb)-1)
            self.buf >>= nb
        else:
            ret = self.buf>>self.bufbits
            self.buf &= ((1L<<self.bufbits)-1)
        if signed:
            signbit = 1L<<(nb-1)
            if signbit & ret:
                ret = ret - signbit -signbit
        if -2**31 <= ret < 2**31: return int(ret)
        return ret #, nb

def test():
    sb = SBits('\x03'*(sum(xrange(37))+7))
    bits = [sb.read(wid, wid&1>0) for wid in xrange(37)]
    hexis = map(hex,bits)
    shouldbe = [
        '0x0', '0xffffffff', '0x1', '0x0', '0xc', '0x0', '0x6', '0x18',
        '0x30', '0x30', '0x18', '0xfffffe06', '0xc0', '0xc0c', '0x2060', '0x181',
        '0x303', '0xffff0303', '0x18181', '0x6060', '0xc0c0c', '0xc0c0', '0x60606', '0x181818',
        '0x303030', '0x303030', '0x181818', '0xfe060606', '0xc0c0c0', '0xc0c0c0c', '0x20606060', '0x1818181',
        '0x3030303', '-0xFCFCFCFDL', '0x181818181L', '0x60606060', '0xC0C0C0C0CL']
    for i,h in enumerate(hexis): print '%12s%s'%(h,'\n'[:i%4==3]),
    print '\n-----\nThat was%s what was expected.\n-----'%((' not','')[hexis==shouldbe],)
    sb = SBits('\xc0'*(sum(xrange(37))+7), False)
    bits = [sb.read(wid, wid&1>0) for wid in xrange(37)]
    hexis = map(hex,bits)
    shouldbe = [
        '0x0', '0xffffffff', '0x2', '0x0', '0x3', '0x0', '0x18', '0xc',
        '0xc', '0x18', '0x60', '0x303', '0x30', '0x606', '0x181', '0xffffc0c0',
        '0xc0c0', '0xffff8181', '0x20606', '0x3030', '0x30303', '0x6060', '0x181818', '0xc0c0c',
        '0xc0c0c', '0x181818', '0x606060', '0x3030303', '0x303030', '0x6060606', '0x1818181', '0xc0c0c0c0',
        '0xC0C0C0C0L', '0x81818181', '0x206060606L', '0x30303030', '0x303030303L']
    for i,h in enumerate(hexis): print '%12s%s'%(h,'\n'[:i%4==3]),
    print '\n-----\nThat was%s what was expected.\n-----'%((' not','')[hexis==shouldbe],)

if __name__ == '__main__':
    test()
=========================================================
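The module above relies on Python 2 idioms (long literals such as 0L,
itertools.imap, the "raise EOFError, msg" form, print statements). For anyone
on Python 3, here is a rough sketch of the same buffering scheme. The class
name BitReader is my own choice for illustration, it takes a bytes object
instead of a str, and it skips the sign step for zero-width reads; treat it as
a sketch under those assumptions, not as a tested replacement.

```python
class BitReader:
    """Read unsigned or signed bit fields from a bytes object."""

    def __init__(self, data=b"", little_endian=True):
        self.le = little_endian
        self.buf = 0              # bits fetched but not yet consumed
        self.bufbits = 0          # number of valid bits held in self.buf
        self._bytes = iter(data)  # iterating bytes yields ints in Python 3

    def read(self, nb=0, signed=False):
        # Pull whole bytes until at least nb bits are buffered.
        while self.bufbits < nb:
            try:
                byte = next(self._bytes)
            except StopIteration:
                raise EOFError("Failed to read %s bits from available %s."
                               % (nb, self.bufbits)) from None
            if self.le:
                self.buf |= byte << self.bufbits   # new byte goes on top
            else:
                self.buf = (self.buf << 8) | byte  # new byte goes at the bottom
            self.bufbits += 8
        self.bufbits -= nb
        if self.le:
            ret = self.buf & ((1 << nb) - 1)       # take the low nb bits
            self.buf >>= nb
        else:
            ret = self.buf >> self.bufbits         # take the high nb bits
            self.buf &= (1 << self.bufbits) - 1
        # Assumption: zero-width signed reads simply return 0 here.
        if signed and nb > 0 and ret & (1 << (nb - 1)):
            ret -= 1 << nb                         # two's-complement sign
        return ret


if __name__ == "__main__":
    sb = BitReader(b"01234567")
    print([sb.read(4) for _ in range(16)])
    print(hex(BitReader(b"01234567", False).read(64)))
```

Skipping bits (the Ignore(5) in the quoted example) can be done by calling
read(5) and discarding the result. Neither version tracks how many bytes have
been consumed, so a PartialBytesRead() equivalent would have to be added on top.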
Regards,
Bengt Richter