How to locate the bit in bits string?

Li Wang li.wang.d at gmail.com
Tue Apr 28 23:09:24 EDT 2009


2009/4/29 Tim Chase <python.list at tim.thechases.com>:
>>> You omit some key details -- namely how do you know that
>>> "1001" is 4 bits and not "00001001" (8-bits)?  If it's a
>>> string (as your current code shows), you can determine the
>>> length.  However, if they are actually ints, your code should work fine &
>>> be O(1).
>>
>> Actually, what I have is a list of integer numbers
>> [3,55,99,44], and by using Huffman coding or fixed length
>> coding, I will know how the bits-length for each number. When
>> I try to concatenate them (say 10,000 items in the list) all
>> together, the speed is going do
>
> Ah, this makes more sense!
>
> I'd try creating a bitwriter object that takes fragments of bytes and writes
> them sequentially to a file-like object.  Something like this untested
> class:
>
>  class BitWriter:
>    def __init__(self, outstream):
>      self.out = outstream
>      self.datum = 0  # the data accrued
>      self.bits = 0   # the number of bits we've accrued
>    def write(self, byte_data, bit_count):
>      # maybe add some code to ensure
>      # that byte_data doesn't have any
>      # significant bits beyond bit_count
>      datum = (self.datum << bit_count) | byte_data
>      self.bits += bit_count
>      if self.bits >= 8:
>        overflow = self.bits - 8
>        self.out.write(chr(datum >> overflow))
>        #discard the stuff we wrote
>        datum -= (datum >> overflow) << overflow
>        self.bits -= 8
>    def close(self):
>      if self.bits: # we have unflushed data
>        data = self.datum << (8 - self.bits)
>        self.out.write(chr(data))
>      self.out.close()
>
>  out = file('compressed.dat', 'wb')
>  bw = BitWriter(out)
>  for data, bit_count in source():
>    out.write(data, bit_count)
>  out.close()
>
> As mentioned, it's 100% untested, I don't know what sort of performance it
> has, nor the endian'ness of your data streams, so you might have to twiddle
> bits in the opposite directions and test the performance.

The method looks great, I thought I got your idea: when the
len(bits)>8, we output 8bits and discard them. I believe this will
work efficiently, because I have implemented the algorithm by
manipulating strings ('111001' character string, not bits) in the
similar way, and worked quite fast.:), Thank you very much.

You mentioned "twiddle bits in the opposite directions", that reminds
me of  another possible solutions for this problem. Say, I have bits
'1101' and '110', instead of doing (1101<<3) |110, we could try the
opposite way:
firstly, inverse the '1101' to '1011', and '110' to '011'
then, (011 << 4) | 1011,
if there comes another bit '10100'
inverse it again 10100 to 00101
add it to the long bits: (00101 << 7) | 0111011
so on and so forth.

After all the process is done, we inverse the final long bits

Because in this process you do not need to shift the whole long bits,
but only shift the much shorter bits every time, so the time
efficiency should be not bad.

However, there is one issue I cannot not figure it out: How to inverse
the bits efficiently? do you have any idea?

Thank you very much:!:)
Best regards,
Li

>
> -tkc
>
>
>
>
>
>
>
>



-- 
Li
------
Time is all we have
and you may find one day
you have less than you think



More information about the Python-list mailing list