[Python-ideas] String and bytes bitwise operations
Steven D'Aprano
steve at pearwood.info
Thu May 17 08:20:43 EDT 2018
On Thu, May 17, 2018 at 02:14:10PM +0300, Serhiy Storchaka wrote:
> 17.05.18 13:53, Ken Hilton wrote:
> >We all know the bitwise operators: & (and), | (or), ^ (xor), and ~
> >(not). We know how they work with numbers:
> >
> >420 ^ 502
> >
> >110100100
> >111110110
> >== XOR ==
> >001010010
> >= 82
> >
> >But it might be useful in some cases to (let's say) xor a string (or
> >bytestring):
>
> The question is how common the need for these operations is. If it is
> not common enough, they are better implemented as functions in a
> third-party library.
The real question is, what does it mean to XOR a text string?
What is 'π' XOR '≠'? I can think of at least three possibilities:

    '-\t'   # treat the strings as UTF-8
    '↠'     # treat them as UTF-16 or UTF-32
    '\x14'  # treat them as MacRoman

What if the strings are unequal lengths?
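
To make the ambiguity concrete, here is a minimal sketch that encodes both characters under each interpretation and XORs the resulting bytes. The helper name xor_bytes is made up for illustration; it is not an existing or proposed API. Note that in UTF-8 'π' is two bytes and '≠' is three, so the UTF-8 case is already an unequal-length case, and zip() quietly drops the extra byte.

```python
def xor_bytes(x, y):
    # Hypothetical helper: XOR two byte strings pairwise.
    # zip() stops at the shorter input, so any extra bytes are dropped.
    return bytes(a ^ b for a, b in zip(x, y))

# UTF-8: b'\xcf\x80' against b'\xe2\x89\xa0' (third byte is ignored)
as_utf8 = xor_bytes('π'.encode('utf-8'), '≠'.encode('utf-8'))

# UTF-16 (big-endian, no BOM): XOR the code units, then decode again
as_utf16 = xor_bytes('π'.encode('utf-16-be'),
                     '≠'.encode('utf-16-be')).decode('utf-16-be')

# MacRoman: each character is a single byte
as_macroman = xor_bytes('π'.encode('mac_roman'), '≠'.encode('mac_roman'))

print(as_utf8, ascii(as_utf16), as_macroman)
```

The three results differ, which is the point: the answer depends entirely on which encoding you assume.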
But XORing equal-length byte strings makes sense to me. There's no
ambiguity: it is equivalent to just XORing each byte with the
corresponding byte.
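
As a rough sketch of what that could look like as a plain function today (the name xor_equal and the choice to raise ValueError on a length mismatch are my assumptions, not anything that exists in the standard library):

```python
def xor_equal(x: bytes, y: bytes) -> bytes:
    # Hypothetical helper: refuse unequal lengths rather than guess
    # whether to truncate, pad, or cycle the shorter operand.
    if len(x) != len(y):
        raise ValueError("byte strings must have equal length")
    # Iterating over bytes yields ints, so ^ applies byte by byte.
    return bytes(a ^ b for a, b in zip(x, y))

result = xor_equal(b'HELLO', b'world')
print(result)

# XOR is its own inverse, so applying the same key again restores the input.
roundtrip = xor_equal(result, b'world')
print(roundtrip)
```

For b'HELLO' and b'world' this gives b'?*> +', the same characters as the str-based expression quoted below, because both inputs are pure ASCII.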
> >Currently, that's done with this expression for strings:
> >
> > >>> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('HELLO', 'world'))
> > '?*> +'
>
> Are you aware that this can raise a ValueError for some input strings?
> For example for '\U00100000' and '\U00010000'.
That works for me.
py> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('\U00100000', '\U00100000'))
'\x00'
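
Note that the session above pairs '\U00100000' with itself, so every XOR comes out as 0. For the two distinct strings named in the quoted message the code points are 0x100000 and 0x10000, and their XOR is 0x110000, one past the largest value chr() accepts (0x10FFFF). A minimal check, written as a script rather than an interactive session; the helper name xor_text is made up for illustration:

```python
import sys

def xor_text(s, t):
    # Same expression as above, wrapped in a hypothetical helper.
    return ''.join(chr(ord(a) ^ ord(b)) for a, b in zip(s, t))

# A character XORed with itself is always NUL.
same = xor_text('\U00100000', '\U00100000')

# The pair from the quoted message: 0x100000 ^ 0x10000 == 0x110000.
combined = ord('\U00100000') ^ ord('\U00010000')
try:
    xor_text('\U00100000', '\U00010000')
    raised = False
except ValueError:
    raised = True

print(repr(same), hex(combined), hex(sys.maxunicode), raised)
```

So any str-based XOR has to decide what to do when the result is not a valid code point.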
--
Steve