xor: how come so slow?

Wed Oct 15 08:24:09 EDT 2008

On Oct 15, 10:19 pm, Michele <mich... at nectarine.it> wrote:
> Hi,
> I'm trying to encode a byte data. Let's not focus on the process of
> encoding; in fact, I want to emphasize that the method
> create_random_block takes 0.5s to be executed (even Java it's faster) on
> a Dual-Core 3.0Ghz machine:
>
> took 46.746999979s, avg: 0.46746999979s
>
> Thus I suppose that the xor operation between bytes raise the execution
> time to 0.5; why I suppose that?
> Because in Python there's no support for bytes and even for xoring
> bytes, so I used a workaround:
> I cycle on the two strings to be xor-red
>     for every char in the strings
>         convert one char on integer and then xor them; (ord)
>         insert one char in the result, transforming the previous integer
> in char (chr)
>
> I suppose that ord() and char() are the main problems of my
> implementation, but I don't know either way to xor two bytes of data
> (bytes are represented as strings).
> For more information, see the code attached.
>
> How should I decrease the execution time?
>
> Thank you
>
> from __future__ import division
> import random
> import time
> import sha
> import os
>
> class Encoder(object):
>     def create_random_block(self, data, seed, blocksize):
>         number_of_blocks = int(len(data)/blocksize)
>         random.seed(seed)
>         random_block = ['0'] * blocksize

You possibly mean '\0' i.e. the byte all of whose bits are zero.

>         for index in range(number_of_blocks):
>             if int(random.getrandbits(1)) == 1:

getrandbits(1) produces a *long* with one random bit. Any good reason
for preferring this to randrange(2) and randint(0, 1)?

So there's a 50% chance that this block will be XORed into the result;
is that what you intend?

>                 block = data[blocksize*index:blocksize*index+blocksize]

You don't need to slice out block, certainly not so awkwardly.

>                 for bit in range(len(block)):

Perhaps you mean "byte_index", not "bit".

On my assumption that range(len(block)) is invariant: calculate it
once. That assumption is incorrect, so is your code for calculating
the number of blocks; it ignores a possible short block at the end.

>                     random_block[bit] =
> chr(ord(random_block[bit])^ord(block[bit]))

The chr() and one ord() are utterly wasteful; leave random_block as a
list of ints and do the chr() thing in the return statement.

> # workaround per fare xor
> bit a bit di str; xor e' solo supportato per int -> ord
>         return ''.join(random_block)

this will become
   return ''.join(map(chr, random_block))
or
   return ''.join(chr(i) for i in random_block)
as taste or speed dictates :-)

So the whole thing becomes [not tested]:
def create_random_block(self, data, seed, blocksize):
    datalen = len(data)
    assert datalen % blocksize == 0
    random.seed(seed)
    random_block = [0] * blocksize
    block_range = range(blocksize)
    for start in xrange(0, datalen, blocksize):
        if random.randrange(2):
            for x in block_range:
                random_block[x] ^= ord(data[start + x])
    return ''.join(map(chr, random_block))

Looks slightly more athletic than before :-)

BTW, +1 misleading subject of the week; it's not XOR that's slow!!

Cheers,
John