[Python-ideas] String and bytes bitwise operations

Antoine Pitrou solipsis at pitrou.net
Thu May 17 11:11:45 EDT 2018


I agree with Steven.  XORing Unicode strings doesn't make sense, and is
pointless anyway.  The only interesting question is whether we want to
add bytewise operations to the stdlib.
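
For reference, a bytewise XOR can already be written in a few lines
today.  The following is only a sketch: the names `xor_bytes` and
`xor_bytes_int` are hypothetical helpers, not existing stdlib functions,
and requiring equal-length operands is an assumption about the desired
semantics.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length bytes objects (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    # Iterating over bytes yields ints in range(256), so x ^ y always
    # fits back into a single byte.
    return bytes(x ^ y for x, y in zip(a, b))


def xor_bytes_int(a: bytes, b: bytes) -> bytes:
    """Same result via a round-trip through int (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    n = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return n.to_bytes(len(a), "big")


print(xor_bytes(b"Hello", b"world"))
print(xor_bytes_int(b"Hello", b"world"))
```

Both variants give the same bytes; the generator version is easier to
read, while the int round-trip tends to be faster on long inputs.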

Regards

Antoine.


On Thu, 17 May 2018 23:13:22 +1000
Steven D'Aprano <steve at pearwood.info> wrote:
> On Thu, May 17, 2018 at 03:49:02PM +0300, Serhiy Storchaka wrote:
> > 17.05.18 15:20, Steven D'Aprano wrote:
> > >On Thu, May 17, 2018 at 02:14:10PM +0300, Serhiy Storchaka wrote:  
> > >>17.05.18 13:53, Ken Hilton wrote:
> > >>>But it might be useful in some cases to (let's say) xor a string (or
> > >>>bytestring):  
> > >>
> > >>The question is how common the need for these operations is. If it
> > >>is not common enough, they are better implemented as functions in a
> > >>third-party library.
> > >
> > >The real question is, what does it mean to XOR a text string?  
> > 
> > The OP explained this meaning with a sample implementation.  
> 
> No, he didn't explain the meaning. He gave an example, but not a reason 
> why it should do what he showed.
> 
> Why should the *abstract character* 'H' XORed with the abstract 
> character 'w' return the abstract character '?'? Why shouldn't the 
> result be '>' instead?
> 
> (For the record that's using 'EBCDIC-CP-BE'.)
> 
> The point is, XORing abstract characters seems meaningless to me. If the 
> OP has an explanation for why 'H'^'w' must mean '?', he should explain 
> it.
> 
> XORing code points could easily generate invalid Unicode sequences 
> containing lone surrogates, say, or undefined characters. Or as you 
> point out, out of range values.
> 
> 
> But XORing bytes seems perfectly reasonable. Bytes are numbers, even if 
> we display them as ASCII characters.
> 
> 
> 
> > >>Are you aware that this can raise a ValueError for some input strings?
> > >>For example for '\U00100000' and '\U00010000'.  
> > >
> > >That works for me.
> > >  
> > >py> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('\U00100000', '\U00100000'))
> > >'\x00'  
> > 
> > Try with '\U00100000' and '\U00010000', not with '\U00100000' and 
> > '\U00100000'.  
> 
> Oops, sorry. I misread your post.
> 
> 
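
The points in the quoted exchange can be checked directly.  The sketch
below is illustrative only: `xor_encoded` and `xor_code_points` are
hypothetical helper names, and `xor_encoded` assumes a single-byte
encoding.  'cp500' is the codec Python also knows as 'EBCDIC-CP-BE'.

```python
def xor_encoded(a: str, b: str, encoding: str) -> str:
    """XOR the encoded bytes of two strings and decode the result.

    Hypothetical helper for illustration; assumes a single-byte encoding.
    """
    raw = bytes(x ^ y for x, y in zip(a.encode(encoding), b.encode(encoding)))
    return raw.decode(encoding)


def xor_code_points(a: str, b: str) -> str:
    """XOR two strings code point by code point, as in the py> example."""
    return "".join(chr(ord(x) ^ ord(y)) for x, y in zip(a, b))


# Same abstract characters, different answers depending on the encoding.
print(xor_encoded("H", "w", "ascii"))  # ?
print(xor_encoded("H", "w", "cp500"))  # >

# Out-of-range result: 0x100000 ^ 0x010000 == 0x110000, one past the
# largest valid code point, so chr() raises ValueError.
try:
    xor_code_points("\U00100000", "\U00010000")
except ValueError as exc:
    print("ValueError:", exc)

# Lone surrogate: 0xD000 ^ 0x0800 == 0xD800.  chr() accepts it, but the
# resulting string cannot be encoded as UTF-8.
lone = xor_code_points("\ud000", "\u0800")
try:
    lone.encode("utf-8")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)
```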




