[Python-ideas] String and bytes bitwise operations

Thu May 17 09:13:22 EDT 2018

On Thu, May 17, 2018 at 03:49:02PM +0300, Serhiy Storchaka wrote:
> 17.05.18 15:20, Steven D'Aprano wrote:
> >On Thu, May 17, 2018 at 02:14:10PM +0300, Serhiy Storchaka wrote:
> >>17.05.18 13:53, Ken Hilton wrote:
> >>>But it might be useful in some cases to (let's say) xor a string (or
> >>>bytestring):
> >>
> >>The question is how common the need for these operations is. If it is
> >>not common enough, they are better implemented as functions in a
> >>third-party library.
> >
> >The real question is, what does it mean to XOR a text string?
> 
> The OP explained this meaning with a sample implementation.

No, he didn't explain the meaning. He gave an example, but not a reason 
why it should do what he showed.

Why should the *abstract character* 'H' XORed with the abstract 
character 'w' return the abstract character '?'? Why shouldn't the 
result be '>' instead?

(For the record that's using 'EBCDIC-CP-BE'.)
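
To make that concrete, here is a rough sketch of the comparison. It 
assumes Python's built-in 'cp500' codec, for which 'EBCDIC-CP-BE' is an 
alias; the helper name xor_chars is made up for illustration and is not 
part of the OP's proposal.

```python
def xor_chars(a, b, encoding):
    """XOR the single-byte encodings of two characters, then decode.

    Hypothetical helper: it only works for characters that encode to
    exactly one byte in the given encoding.
    """
    (x,) = a.encode(encoding)
    (y,) = b.encode(encoding)
    return bytes([x ^ y]).decode(encoding)


# Same abstract characters, different encodings, different "answers".
ascii_result = xor_chars('H', 'w', 'ascii')
ebcdic_result = xor_chars('H', 'w', 'cp500')  # alias: 'EBCDIC-CP-BE'
print(repr(ascii_result), repr(ebcdic_result))
```

The result depends entirely on which numbers we choose to stand for the 
characters, which is why the characters alone don't determine it.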

The point is, XORing abstract characters seems meaningless to me. If the 
OP has an explanation for why 'H'^'w' must mean '?', he should explain 
it.

XORing code points could easily generate invalid Unicode sequences 
containing lone surrogates, say, or undefined characters. Or, as you 
point out, out-of-range values.
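
A quick sketch of two of those failure modes, using the same 
chr(ord(a) ^ ord(b)) approach as the snippet quoted further down. The 
function name xor_text and the particular input characters are my own 
choices for illustration.

```python
def xor_text(s, t):
    """XOR two strings code point by code point (illustrative only)."""
    return ''.join(chr(ord(a) ^ ord(b)) for a, b in zip(s, t))


# Out of range: 0x100000 ^ 0x010000 == 0x110000, one past the last
# valid code point, so chr() raises ValueError.
try:
    xor_text('\U00100000', '\U00010000')
except ValueError as exc:
    out_of_range_error = exc
else:
    out_of_range_error = None

# Lone surrogate: two ordinary characters (U+D000 and U+0800) XOR to
# U+D800, which Python will hold in a str but cannot encode as UTF-8.
surrogate = xor_text('\ud000', '\u0800')
try:
    surrogate.encode('utf-8')
except UnicodeEncodeError as exc:
    encode_error = exc
else:
    encode_error = None

print(repr(surrogate), out_of_range_error, encode_error)
```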

But XORing bytes seems perfectly reasonable. Bytes are numbers, even if 
we display them as ASCII characters.
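
For bytes, a minimal version might look like the sketch below. The name 
xor_bytes is hypothetical, and refusing inputs of unequal length is my 
assumption, not something the OP specified (plain zip() would silently 
truncate to the shorter input instead).

```python
def xor_bytes(a, b):
    """Return the bytewise XOR of two equal-length bytes objects."""
    if len(a) != len(b):
        # Assumption: mismatched lengths are an error rather than
        # being truncated or padded.
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


data = b'Hello'
key = b'world'
mixed = xor_bytes(data, key)

# XOR is its own inverse, so applying the key again restores the data.
restored = xor_bytes(mixed, key)
print(mixed, restored)
```

Every result is a value in range(256), so unlike the text case there is 
no way to end up with something that isn't a valid bytes object.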

> >>Are you aware that this can raise a ValueError for some input strings?
> >>For example for '\U00100000' and '\U00010000'.
> >
> >That works for me.
> >
> >py> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('\U00100000', 
> >'\U00100000'))
> >'\x00'
> 
> Try with '\U00100000' and '\U00010000', not with '\U00100000' and 
> '\U00100000'.

Oops, sorry. I misread your post.

-- 
Steve