[Python-ideas] String and bytes bitwise operations

Steven D'Aprano steve at pearwood.info
Thu May 17 08:20:43 EDT 2018


On Thu, May 17, 2018 at 02:14:10PM +0300, Serhiy Storchaka wrote:
> 17.05.18 13:53, Ken Hilton wrote:
> >We all know the bitwise operators: & (and), | (or), ^ (xor), and ~ 
> >(not). We know how they work with numbers:
> >
> >420 ^ 502
> >
> >110100100
> >111110110
> >== XOR ==
> >001010010
> >= 82
> >
> >But it might be useful in some cases to (let's say) xor a string (or 
> >bytestring):
> 
> The question is how common the need for these operations is. If it is 
> not common enough, they are better implemented as functions in a 
> third-party library.

The real question is, what does it mean to XOR a text string?

What is 'π' XOR '≠'? I can think of at least three possibilities:

'-\t'  # treat the strings as UTF-8
'↠'  # treat them as UTF-16 or UTF-32
'\x14' # treat them as MacRoman
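
A minimal sketch of how those three answers come about, assuming the 
strings are encoded first and then XORed byte by byte. The helper name 
xor_encoded is made up for illustration, and zip() silently truncates to 
the shorter encoding, which is itself just one possible choice:

```python
def xor_encoded(a: str, b: str, encoding: str) -> bytes:
    """Encode both strings, then XOR the bytes pairwise.

    Hypothetical helper: zip() stops at the shorter byte string,
    so any trailing bytes of the longer encoding are dropped.
    """
    return bytes(x ^ y for x, y in zip(a.encode(encoding), b.encode(encoding)))


# UTF-8: 'π' is 2 bytes (CF 80), '≠' is 3 bytes (E2 89 A0).
utf8 = xor_encoded('π', '≠', 'utf-8')

# UTF-16 (big-endian, no BOM): XOR of the 16-bit code units, decoded back.
utf16 = xor_encoded('π', '≠', 'utf-16-be').decode('utf-16-be')

# MacRoman: both characters are single bytes (B9 and AD).
mac = xor_encoded('π', '≠', 'mac_roman')

print(repr(utf8), repr(utf16), repr(mac))
```

The same two characters give three unrelated results, depending only on 
which encoding is assumed.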

What if the strings have unequal lengths?


But XORing equal-length byte strings makes sense to me. There's no 
ambiguity: it is equivalent to XORing each byte with the 
corresponding byte.
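
As a rough sketch of that byte-by-byte reading (the function name 
xor_bytes is hypothetical, and rejecting unequal lengths is an assumption 
made here, not something the proposal specifies):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length, one byte at a time."""
    if len(a) != len(b):
        # Assumption: refuse unequal lengths rather than truncate or pad.
        raise ValueError("byte strings must have equal length")
    # Iterating over bytes yields ints in range(256), so ^ is always in range.
    return bytes(x ^ y for x, y in zip(a, b))


result = xor_bytes(b'HELLO', b'world')
print(result)
```

Since every byte is in range(256), the XOR of two bytes is always a valid 
byte, so there is no out-of-range case to worry about.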


 
> >Currently, that's done with this expression for strings:
> >
> >     >>> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('HELLO', 'world'))
> >     '?*> +'
> 
> Are you aware that this can raise a ValueError for some input strings? 
> For example for '\U00100000' and '\U00010000'.

That works for me.

py> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('\U00100000', '\U00100000'))
'\x00'
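
Note that the pair tested here repeats '\U00100000', so every XOR is 0. 
The pair quoted above is '\U00100000' and '\U00010000', which XOR to 
0x110000, one past the largest code point chr() accepts. A small sketch 
that checks both pairs (xor_text is a made-up name wrapping the same 
expression):

```python
import sys


def xor_text(a: str, b: str) -> str:
    """XOR two strings code point by code point, as in the quoted expression."""
    return ''.join(chr(ord(x) ^ ord(y)) for x, y in zip(a, b))


# Same code point twice: the XOR is 0, so chr() succeeds.
same = xor_text('\U00100000', '\U00100000')

# Different code points: 0x100000 ^ 0x010000 == 0x110000 > sys.maxunicode.
try:
    xor_text('\U00100000', '\U00010000')
    error = None
except ValueError as exc:
    error = exc

print(repr(same), hex(sys.maxunicode), type(error).__name__)
```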


-- 
Steve


