How to locate the bit in bits string?

Tim Chase python.list at
Wed Apr 29 03:34:07 CEST 2009

>> You omit some key details -- namely how do you know that
>> "1001" is 4 bits and not "00001001" (8-bits)?  If it's a
>> string (as your current code shows), you can determine the
>> length.  However, if they are actually ints, your code 
>> should work fine & be O(1).
> Actually, what I have is a list of integer numbers
> [3,55,99,44], and by using Huffman coding or fixed length
> coding, I will know how the bits-length for each number. When
> I try to concatenate them (say 10,000 items in the list) all
> together, the speed is going do

Ah, this makes more sense!

I'd try creating a bitwriter object that takes fragments of bytes 
and writes them sequentially to a file-like object.  Something 
like this untested class:

   class BitWriter:
     def __init__(self, outstream):
       self.out = outstream
       self.datum = 0  # the data accrued
       self.bits = 0   # the number of bits we've accrued
     def write(self, byte_data, bit_count):
       # maybe add some code to ensure
       # that byte_data doesn't have any
       # significant bits beyond bit_count
       datum = (self.datum << bit_count) | byte_data
       self.bits += bit_count
       if self.bits >= 8:
         overflow = self.bits - 8
         self.out.write(chr(datum >> overflow))
         #discard the stuff we wrote
         datum -= (datum >> overflow) << overflow
         self.bits -= 8
     def close(self):
       if self.bits: # we have unflushed data
         data = self.datum << (8 - self.bits)

   out = file('compressed.dat', 'wb')
   bw = BitWriter(out)
   for data, bit_count in source():
     out.write(data, bit_count)

As mentioned, it's 100% untested, I don't know what sort of 
performance it has, nor the endian'ness of your data streams, so 
you might have to twiddle bits in the opposite directions and 
test the performance.


More information about the Python-list mailing list