How to locate the bit in bits string?

Tim Chase python.list at tim.thechases.com
Wed Apr 29 03:34:07 CEST 2009

```>> You omit some key details -- namely how do you know that
>> "1001" is 4 bits and not "00001001" (8-bits)?  If it's a
>> string (as your current code shows), you can determine the
>> length.  However, if they are actually ints, your code
>> should work fine & be O(1).
>
> Actually, what I have is a list of integer numbers
> [3,55,99,44], and by using Huffman coding or fixed length
> coding, I will know how the bits-length for each number. When
> I try to concatenate them (say 10,000 items in the list) all
> together, the speed is going do

Ah, this makes more sense!

I'd try creating a bitwriter object that takes fragments of bytes
and writes them sequentially to a file-like object.  Something
like this untested class:

class BitWriter:
def __init__(self, outstream):
self.out = outstream
self.datum = 0  # the data accrued
self.bits = 0   # the number of bits we've accrued
def write(self, byte_data, bit_count):
# maybe add some code to ensure
# that byte_data doesn't have any
# significant bits beyond bit_count
datum = (self.datum << bit_count) | byte_data
self.bits += bit_count
if self.bits >= 8:
overflow = self.bits - 8
self.out.write(chr(datum >> overflow))
datum -= (datum >> overflow) << overflow
self.bits -= 8
def close(self):
if self.bits: # we have unflushed data
data = self.datum << (8 - self.bits)
self.out.write(chr(data))
self.out.close()

out = file('compressed.dat', 'wb')
bw = BitWriter(out)
for data, bit_count in source():
out.write(data, bit_count)
out.close()

As mentioned, it's 100% untested, I don't know what sort of
performance it has, nor the endian'ness of your data streams, so
you might have to twiddle bits in the opposite directions and
test the performance.

-tkc

```