bitwise operations for bytes and bytearray

Hi! I'm interested in adding the functionality to do something like:

    >>> b'a' ^ b'b'
    b'\x03'

instead of the good ol' TypeError. I think both bytes and bytearray should support all the bitwise operations.

I've never hacked on CPython before. I'm starting by just trying to add xor to bytearray. I have a ByteArray_Xor function that I think should do this here https://github.com/cowlicks/cpython/commit/d6dddb11cdb33032b39dcb9dfdaa7b10d... But I'm not sure how to hook this into the rest of CPython. I tried adding it where bytearray_as_sequence is declared in the bytearrayobject.c file, but that gave me compiler warnings and broke things.

So now that I have this ByteArray_Xor function, how do I make it be bytearray.__xor__?

Thanks!
Blake
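For reference, the requested behaviour can be approximated in pure Python today. This is only a sketch: `xor_bytes` is a hypothetical helper name, and requiring equal lengths is an assumption, since the thread does not say how mismatched lengths should be handled.

```python
def xor_bytes(a, b):
    """Return the bytewise XOR of two equal-length bytes-like objects."""
    # Assumption: mismatched lengths are an error rather than being truncated.
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    # Iterating over bytes yields ints in range(256), so ^ works per element.
    return bytes(x ^ y for x, y in zip(a, b))


print(xor_bytes(b'a', b'b'))  # b'\x03'
```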

On Thu, 7 Jan 2016 at 14:29 Blake Griffith <blake.a.griffith@gmail.com> wrote:
You need to set the PyNumberMethods struct with the appropriate function and then set that on the PyTypeObject. Look at https://hg.python.org/cpython/file/tip/Include/object.h#l237 and https://hg.python.org/cpython/file/tip/Objects/longobject.c#l5238 for an idea of what it takes.

You want to put the `xor` method in the `nb_xor` field of the `PyNumberMethods` structure that lives in the `tp_as_number` field of the bytes type object. Two things I noticed in a quick pass: you might want to add some type checking around the case where `a` or `b` is not a `PyByteArray` object. Also, variable-length arrays were only added in C99, so you cannot rely on them here and will need to allocate the `raw_*` arrays on the heap manually. This seems like a cool feature!

On Thu, Jan 7, 2016 at 5:26 PM, Blake Griffith <blake.a.griffith@gmail.com> wrote:
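For a class defined in Python, the `nb_xor` slot is filled from its `__xor__` method, so the shape of the C function can be prototyped without touching bytearrayobject.c. The following is a rough sketch only: `XorByteArray` is a hypothetical name, and raising `ValueError` on a length mismatch is an assumption, not something decided in this thread. It also shows the suggested type check, which returns `NotImplemented` so that Python raises the usual TypeError for unsupported operands.

```python
class XorByteArray(bytearray):
    """Hypothetical bytearray subclass used to prototype ^ (not part of CPython)."""

    def __xor__(self, other):
        # Type check: let Python raise TypeError for unsupported operand types.
        if not isinstance(other, (bytes, bytearray)):
            return NotImplemented
        # Assumption: operands of different lengths are rejected.
        if len(self) != len(other):
            raise ValueError("operands must have the same length")
        return XorByteArray(x ^ y for x, y in zip(self, other))


result = XorByteArray(b'a') ^ b'b'
print(result)
```

The C version would do the same checks before allocating its result buffer.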

On 7 January 2016 at 22:26, Blake Griffith <blake.a.griffith@gmail.com> wrote:
There is a bug open about adding this kind of functionality: <https://bugs.python.org/issue19251>.

On Jan 7, 2016, at 15:57, Martin Panter <vadmium+py@gmail.com> wrote:
And it's in the needs patch stage, which makes it perfect for the OP: in addition to learning how to hack on builtin types, he can also learn the other parts of the dev process. (Even if the bug is eventually rejected, as seems likely given that it sat around for three years with no compelling use case and then Guido added a "very skeptical" comment.)

Thanks for the quick responses y'all. I have something compiling on my branch, which is enough for me tonight.

I asked a question about this on Stack Overflow a while ago, but it wasn't very popular: https://stackoverflow.com/questions/32658420/why-cant-you-xor-bytes-objects-... Someone there pointed out that this feature was suggested on the mailing list a while back (2006): https://mail.python.org/pipermail/python-dev/2006-March/061980.html

On Fri, Jan 8, 2016 at 1:12 AM, Andrew Barnert <abarnert@yahoo.com> wrote:

On 07Jan2016 16:12, Python-Dev <python-dev@python.org> wrote:
The use case which springs immediately to my mind is cryptography. To encrypt a stream symmetrically you can go:

    cleartext-bytes ^ cryptographically-random-bytes-from-cipher

so with this one could write:

    def crypted(byteses, crypto_source):
        ''' Accept an iterable source of bytes objects and a preprimed source
            of crypto bytes, yield encrypted versions of the bytes objects.
        '''
        for bs in byteses:
            cbs = crypto_source.next_bytes(len(bs))
            yield bs ^ cbs

Cheers,
Cameron Simpson <cs@zip.com.au>
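Since `bytes ^ bytes` raises TypeError in current Python, a version of this generator that runs today has to XOR element by element. The sketch below assumes a `crypto_source` object with a `next_bytes(n)` method, as in the message above; `RepeatingKeySource` is a hypothetical toy stand-in for a real cipher keystream and is not secure.

```python
from itertools import cycle, islice


class RepeatingKeySource:
    """Toy keystream that repeats a fixed key (hypothetical, NOT secure)."""

    def __init__(self, key):
        self._stream = cycle(key)

    def next_bytes(self, n):
        # Take the next n keystream bytes, continuing where the last call stopped.
        return bytes(islice(self._stream, n))


def crypted(byteses, crypto_source):
    """Yield each bytes object XORed with the same number of keystream bytes."""
    for bs in byteses:
        cbs = crypto_source.next_bytes(len(bs))
        # Workaround for the missing operator: XOR the ints pairwise.
        yield bytes(x ^ y for x, y in zip(bs, cbs))


chunks = [b'hello', b'world']
encrypted = list(crypted(chunks, RepeatingKeySource(b'k3y')))
# XOR with the same keystream a second time restores the cleartext.
decrypted = list(crypted(encrypted, RepeatingKeySource(b'k3y')))
```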

A little update, I got ^, &, and | working for bytearrays. You can view the diff here: https://github.com/python/cpython/compare/master...cowlicks:bitwise-bytes?ex...

How does it look? Joe, is this how I should allocate the arrays? Am I freeing them properly? Am I checking the input enough?

After some feedback, I'll probably add bitshifting and negation. Then work on bytes objects. Does this warrant a PEP?

On Fri, Jan 8, 2016 at 2:08 AM, Cameron Simpson <cs@zip.com.au> wrote:

On Jan 9, 2016, at 16:17, Blake Griffith <blake.a.griffith@gmail.com> wrote:
A little update, I got ^, &, and | working for bytearrays. You can view the diff here: https://github.com/python/cpython/compare/master...cowlicks:bitwise-bytes?ex...
If you upload the diff to the issue on the tracker, the Rietveld code review app should be able to pick it up automatically, allowing people to comment on it inline, in a much nicer format than a mailing list thread. It's especially nice if you're adding things in stages--people who have been following along can just look at the changes between patch 3 and 4, while new people can look at all the changes in one go, etc.
Personally, I'd just make the case for the feature on the tracker issue. If one of the core devs thinks it needs a PEP, or further discussion on this list or -ideas, they'll say so there. At present, it seems like there's not much support for the idea, but I think that's at least partly because people want to see realistic use cases (that aren't served better by the existing bitarray/bitstring/etc. modules on PyPI, or using a NumPy array, or just using ints, etc.).
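To make the "just using ints" alternative concrete, here is a minimal sketch that round-trips through `int.from_bytes` and `int.to_bytes`. The helper name `xor_via_int` and the equal-length requirement are assumptions for illustration.

```python
def xor_via_int(a, b):
    """XOR two equal-length bytes objects by converting them to ints."""
    # Assumption: operands must have the same length.
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    n = int.from_bytes(a, 'big') ^ int.from_bytes(b, 'big')
    # Convert back with the original length so leading zero bytes are kept.
    return n.to_bytes(len(a), 'big')


print(xor_via_int(b'a', b'b'))  # b'\x03'
```

The same pattern works for `&` and `|`, which is part of why a new operator on bytes needs a use case that this does not already cover.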

participants (6)
- Andrew Barnert
- Blake Griffith
- Brett Cannon
- Cameron Simpson
- Joe Jevnik
- Martin Panter