How to locate the bit in bits string?

Tim Chase python.list at tim.thechases.com
Wed Apr 29 03:34:07 CEST 2009


>> You omit some key details -- namely how do you know that
>> "1001" is 4 bits and not "00001001" (8-bits)?  If it's a
>> string (as your current code shows), you can determine the
>> length.  However, if they are actually ints, your code 
>> should work fine & be O(1).
> 
> Actually, what I have is a list of integer numbers
> [3,55,99,44], and by using Huffman coding or fixed length
> coding, I will know how the bits-length for each number. When
> I try to concatenate them (say 10,000 items in the list) all
> together, the speed is going do

Ah, this makes more sense!

I'd try creating a bitwriter object that takes fragments of bytes 
and writes them sequentially to a file-like object.  Something 
like this untested class:

   class BitWriter:
     def __init__(self, outstream):
       self.out = outstream
       self.datum = 0  # the data accrued
       self.bits = 0   # the number of bits we've accrued
     def write(self, byte_data, bit_count):
       # maybe add some code to ensure
       # that byte_data doesn't have any
       # significant bits beyond bit_count
       datum = (self.datum << bit_count) | byte_data
       self.bits += bit_count
       if self.bits >= 8:
         overflow = self.bits - 8
         self.out.write(chr(datum >> overflow))
         #discard the stuff we wrote
         datum -= (datum >> overflow) << overflow
         self.bits -= 8
     def close(self):
       if self.bits: # we have unflushed data
         data = self.datum << (8 - self.bits)
         self.out.write(chr(data))
       self.out.close()

   out = file('compressed.dat', 'wb')
   bw = BitWriter(out)
   for data, bit_count in source():
     out.write(data, bit_count)
   out.close()

As mentioned, it's 100% untested, I don't know what sort of 
performance it has, nor the endian'ness of your data streams, so 
you might have to twiddle bits in the opposite directions and 
test the performance.

-tkc










More information about the Python-list mailing list