# xor: how come so slow?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Oct 19 09:40:11 CEST 2008

```On Sun, 19 Oct 2008 16:38:37 +1300, Lawrence D'Oliveiro wrote:

> In message <010a9c3f\$0\$20653\$c3e8da3 at news.astraweb.com>, Steven D'Aprano
> wrote:
>
>> On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
>>
>>> Data can come in fractional bits. That's how compression works.
>>
>> If you don't believe me, try compressing a single bit and see if you
>> get a "fractional bit".
>
> If both states of the bit are not equally likely, then you do indeed
> have a fractional bit, since
>
>     nrbits = (- logbase2(P[bit = 0]) - logbase2(P[bit = 1])) / 2

That's an arithmetic mean of the logarithms. It doesn't imply that there
are fractional bits any more than an average family having 2.3 children
implies that there are 0.3 of a child wandering around the streets.

Using the Shannon measure of information, you can have messages which
contain fractional information (technically, "surprisal"), when measured
in bits. But that doesn't imply the existence of fractional bits. Look at
it this way: consider a barter economy where I agree to swap 5 chickens
for 2 axes. So each axe is equivalent to 2.5 chickens. But that doesn't
imply that there is such a thing as 0.5 of a chicken -- at least not a
*live* chicken. While I can blithely talk about bartering fractional
chickens, in practice when I actually go to make good on my promise, it
must be an integer number of chickens.

Similarly, we can talk about messages containing fractional bits of
information, but when we actually store or transmit that message in
practice, we can only use integer numbers of bits.

As Wikipedia puts it:

It is important to differentiate between the use of "bit" in referring to
a discrete storage unit and the use of "bit" in referring to a
statistical unit of information. The bit, as a discrete storage unit, can
by definition store only 0 or 1. A statistical bit is the amount of
information that, on average[citation needed], can be stored in a
discrete bit. ... If these two ideas need to be distinguished, sometimes
the name bit is used when discussing data storage while shannon is used
for the statistical bit.

http://en.wikipedia.org/wiki/Bit

--
Steven

```