xor: how come so slow?
John Machin
sjmachin at lexicon.net
Wed Oct 15 08:24:09 EDT 2008
On Oct 15, 10:19 pm, Michele <mich... at nectarine.it> wrote:
> Hi,
> I'm trying to encode a byte data. Let's not focus on the process of
> encoding; in fact, I want to emphasize that the method
> create_random_block takes 0.5s to be executed (even Java it's faster) on
> a Dual-Core 3.0Ghz machine:
>
> took 46.746999979s, avg: 0.46746999979s
>
> Thus I suppose that the xor operation between bytes raise the execution
> time to 0.5; why I suppose that?
> Because in Python there's no support for bytes and even for xoring
> bytes, so I used a workaround:
> I cycle on the two strings to be xor-red
> for every char in the strings
> convert one char on integer and then xor them; (ord)
> insert one char in the result, transforming the previous integer
> in char (chr)
>
> I suppose that ord() and char() are the main problems of my
> implementation, but I don't know either way to xor two bytes of data
> (bytes are represented as strings).
> For more information, see the code attached.
>
> How should I decrease the execution time?
>
> Thank you
>
> from __future__ import division
> import random
> import time
> import sha
> import os
>
> class Encoder(object):
> def create_random_block(self, data, seed, blocksize):
> number_of_blocks = int(len(data)/blocksize)
> random.seed(seed)
> random_block = ['0'] * blocksize
You possibly mean '\0' i.e. the byte all of whose bits are zero.
> for index in range(number_of_blocks):
> if int(random.getrandbits(1)) == 1:
getrandbits(1) produces a *long* with one random bit. Any good reason
for preferring this to randrange(2) and randint(0, 1)?
So there's a 50% chance that this block will be XORed into the result;
is that what you intend?
> block = data[blocksize*index:blocksize*index+blocksize]
You don't need to slice out block, certainly not so awkwardly.
> for bit in range(len(block)):
Perhaps you mean "byte_index", not "bit".
On my assumption that range(len(block)) is invariant: calculate it
once. That assumption is incorrect, so is your code for calculating
the number of blocks; it ignores a possible short block at the end.
> random_block[bit] =
> chr(ord(random_block[bit])^ord(block[bit]))
The chr() and one ord() are utterly wasteful; leave random_block as a
list of ints and do the chr() thing in the return statement.
> # workaround per fare xor
> bit a bit di str; xor e' solo supportato per int -> ord
> return ''.join(random_block)
this will become
return ''.join(map(chr, random_block))
or
return ''.join(chr(i) for i in random_block)
as taste or speed dictates :-)
So the whole thing becomes [not tested]:
def create_random_block(self, data, seed, blocksize):
datalen = len(data)
assert datalen % blocksize == 0
random.seed(seed)
random_block = [0] * blocksize
block_range = range(blocksize)
for start in xrange(0, datalen, blocksize):
if random.randrange(2):
for x in block_range:
random_block[x] ^= ord(data[start + x])
return ''.join(map(chr, random_block))
Looks slightly more athletic than before :-)
BTW, +1 misleading subject of the week; it's not XOR that's slow!!
Cheers,
John
More information about the Python-list
mailing list