I was checking the documentation [1] to see where to put the new information about bytes and bytearray %-formatting, and noticed that /every/ operation that could modify a bytearray object in place (e.g. center, capitalize, strip, etc.) instead returns a new copy.

The section just prior to that [2] does say, "As bytearray objects are mutable, they support the mutable sequence operations ...".

So it seems that when bytearray is treated as ascii data a new bytearray is returned, and when bytearray is treated as a container it is modified in place:

New:

    bytearray(b'Chapter Title').center()
    bytearray(b' Chapter Title ').replace(b' ', b'- * - ')

In-place:

    bytearray(b'abc')[1] = ord(b'z')
    bytearray(b'def') += b'ghi'
    bytearray(b'123') *= 3

I now have a minor dilemma: %-formatting is an ascii operation, but it also has an in-place operator (%=) . . . so does %= modify the bytearray in place just like += and *= do, or does it return a new bytearray just like all the named ascii operations do?

I do not know which is more surprising: having one of the in-place operators not work in place, or having an unnamed ascii-operation not return a new copy.

Thoughts?

-- ~Ethan~

[1] https://docs.python.org/3/library/stdtypes.html#bytes-and-bytearray-operatio...
[2] https://docs.python.org/3/library/stdtypes.html#bytearray-objects
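The distinction drawn in the message above can be checked with identity tests. This is only a minimal sketch: the variable names, the width passed to center(), and the replacement bytes are illustrative assumptions, and the in-place operators are applied to a named variable because an augmented assignment cannot target a call expression directly.

```python
# Methods shared with bytes/str hand back a new object.
title = bytearray(b'Chapter Title')
centered = title.center(21)            # width 21 is an arbitrary choice
replaced = title.replace(b' ', b'_')
print(centered is title, replaced is title)   # False False
print(title)                                  # bytearray(b'Chapter Title')

# Mutable-sequence operations change the existing object.
data = bytearray(b'abc')
alias = data                 # second name bound to the same object
data[1] = ord(b'z')          # item assignment
data += b'def'               # in-place concatenation
data *= 2                    # in-place repetition
print(data is alias)         # True
print(alias)                 # bytearray(b'azcdefazcdef')
```

Because alias still refers to the modified object after += and *=, any other holder of the bytearray sees the change, which is exactly what the named methods avoid.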
On 1/19/2015 11:36 AM, Ethan Furman wrote:
> I was checking the documentation [1] to see where to put the new information about bytes and bytearray %-formatting, and noticed that /every/ operation that could modify a bytearray object in place (e.g. center, capitalize, strip, etc.) instead returns a new copy.
>
> The section just prior to that [2] does say, "As bytearray objects are mutable, they support the mutable sequence operations ...".
>
> So it seems that when bytearray is treated as ascii data a new bytearray is returned, and when bytearray is treated as a container it is modified in place:
>
> New:
>
>     bytearray(b'Chapter Title').center()
>     bytearray(b' Chapter Title ').replace(b' ', b'- * - ')
>
> In-place:
>
>     bytearray(b'abc')[1] = ord(b'z')
>     bytearray(b'def') += b'ghi'
>     bytearray(b'123') *= 3
>
> I now have a minor dilemma: %-formatting is an ascii operation, but it also has an in-place operator (%=) . . . so does %= modify the bytearray in place just like += and *= do, or does it return a new bytearray just like all the named ascii operations do? I do not know which is more surprising: having one of the in-place operators not work in place, or having an unnamed ascii-operation not return a new copy.
>
> Thoughts?
I'd return a new copy (or, not implement it at all and let the default machinery call __mod__, and do the assignment). That's the least surprising to me, since the goal is to be compatible with 2.7's str %-formatting.

-- Eric.
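The "default machinery" mentioned above is the standard fallback for augmented assignment: when a type defines no __imod__, x %= y is evaluated as x = x % y. A minimal sketch, using a hypothetical Template class that exists only for this illustration:

```python
class Template:
    """Hypothetical type that defines __mod__ but no __imod__."""

    def __init__(self, text):
        self.text = text

    def __mod__(self, args):
        # Always build and return a new object; self is left untouched.
        return Template(self.text % args)


t = Template('%d items')
alias = t
t %= 3               # no __imod__, so this runs t = t.__mod__(3)
print(t.text)        # 3 items
print(alias.text)    # %d items
print(t is alias)    # False
```

The name t is rebound to the new object while alias keeps the original, so from the caller's side %= behaves like every other copy-returning operation.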
On 01/19/2015 09:01 AM, Eric V. Smith wrote:
> I'd return a new copy (or, not implement it at all and let the default machinery call __mod__, and do the assignment). That's the least surprising to me, since the goal is to be compatible with 2.7's str %-formatting.

That certainly feels like the winning argument. :)

-- ~Ethan~
On Mon, 19 Jan 2015 12:01:36 -0500, "Eric V. Smith" wrote:

> I'd return a new copy (or, not implement it at all and let the default machinery call __mod__, and do the assignment). That's the least surprising to me, since the goal is to be compatible with 2.7's str %-formatting.

+1 for a copy. There's no reasonable use case to make the formatting in-place, and it wouldn't buy any significant performance win anyway.

Regards

Antoine.
Hello,
On Mon, 19 Jan 2015 08:36:55 -0800, Ethan Furman wrote:

> I was checking the documentation [1] to see where to put the new information about bytes and bytearray %-formatting, and noticed that /every/ operation that could modify a bytearray object in place (e.g. center, capitalize, strip, etc.) instead returns a new copy.

Well, those "operations" come from string methods. String methods always return a copy, so I'm not sure the same methods, applied to bytearray, *could* reasonably be inplace operations without surprising users a lot. But at the same time, a use case for inplace operations on bytearrays is valid, and I'd like to take the chance to branch the topic and discuss how they could possibly be implemented.

> The section just prior to that [2] does say, "As bytearray objects are mutable, they support the mutable sequence operations ...".
>
> So it seems that when bytearray is treated as ascii data a new bytearray is returned, and when bytearray is treated as a container it is modified in place:

Per above, I'd formulate it differently: methods inherited from string always return a new instance, while some *operators* modify it inplace (and yes, those are usually the same operators that modify other containers inplace).

So, suppose there's a requirement to support inplace operations (methods) on bytearray, what would be the Pythonic way to implement it? Something like:

    b.lower_inplace()
    b.lower_i()

, or maybe

    import bytearray_ops
    bytearray_ops.lower(b)

?

> New:
>
>     bytearray(b'Chapter Title').center()
>     bytearray(b' Chapter Title ').replace(b' ', b'- * - ')
>
> In-place:
>
>     bytearray(b'abc')[1] = ord(b'z')
>     bytearray(b'def') += b'ghi'
>     bytearray(b'123') *= 3
-- Best regards, Paul mailto:pmiscml@gmail.com
On Mon, Jan 19, 2015 at 11:43 AM, Paul Sokolovsky wrote:

> [...] So, suppose there's a requirement to support inplace operations (methods) on bytearray, what would be the Pythonic way to implement it?
>
> Something like:
>
>     b.lower_inplace()
>     b.lower_i()
>
> , or maybe
>
>     import bytearray_ops
>     bytearray_ops.lower(b)
>
> ?

Please don't go there. The use cases are too rare.

-- --Guido van Rossum (python.org/~guido)
Hello,
On Mon, 19 Jan 2015 14:03:20 -0800, Guido van Rossum wrote:

> On Mon, Jan 19, 2015 at 11:43 AM, Paul Sokolovsky wrote:
>
>> [...] So, suppose there's a requirement to support inplace operations (methods) on bytearray, what would be the Pythonic way to implement it?
>>
>> Something like:
>>
>>     b.lower_inplace()
>>     b.lower_i()
>>
>> , or maybe
>>
>>     import bytearray_ops
>>     bytearray_ops.lower(b)
>>
>> ?
>
> Please don't go there. The use cases are too rare.
recv_into() an HTTP request and .lower() it inplace to ease parsing? But I mostly raise that with my MicroPython hat on; there it may not necessarily be superfluous. But then, having a separate module for such operations doesn't seem too bad, except that it would be a retrograde step after the deprecation of the "string" module in favor of methods.

Anyway, I had targeted that question at python-ideas for some easy day, and just took a quick chance here on seeing someone else talking about inplace bytearray operations ;-).
> -- --Guido van Rossum (python.org/~guido)
-- Best regards, Paul mailto:pmiscml@gmail.com
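The recv_into() scenario mentioned above might look roughly like the following sketch. Everything concrete here is an assumption made for illustration: the local socket pair standing in for a real connection, the request text, and the 4096-byte buffer size.

```python
import socket

# A connected pair of local sockets stands in for a client and a server.
client, server = socket.socketpair()
request = b'GET /Index.HTML HTTP/1.1\r\nHost: Example.COM\r\n\r\n'
client.sendall(request)

buf = bytearray(4096)            # preallocated receive buffer
view = memoryview(buf)
received = 0
while received < len(request):   # sketch only: the length is known up front
    received += server.recv_into(view[received:])
view.release()

# The write-back reuses buf, but buf[:received].lower() still builds
# temporary copies of `received` bytes before the assignment happens.
buf[:received] = buf[:received].lower()
print(bytes(buf[:received]))

client.close()
server.close()
```

The receive step allocates nothing beyond the original buffer; the lowercasing step is where the temporary copies appear, which is the cost being debated in this part of the thread.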
Hello,
On Tue, 20 Jan 2015 18:15:02 +1300, Greg Ewing wrote:

> Guido van Rossum wrote:
>
>> On Mon, Jan 19, 2015 at 11:43 AM, Paul Sokolovsky wrote:
>>
>>>     b.lower_inplace()
>>>     b.lower_i()
>>
>> Please don't go there. The use cases are too rare.
>
> And if you have such a use case, it's not too hard to do
>
>     b[:] = b.lower()

The point of inplace operations (memoryview's, other stuff already in Python) is to avoid unneeded memory allocation and copying. For a 1TB bytearray with 1TB of RAM, it will be very hard to do. (Ditto for a 100K bytearray with 150K of RAM.)

-- Best regards, Paul mailto:pmiscml@gmail.com
On Tue, Jan 20, 2015 at 1:48 AM, Paul Sokolovsky wrote:

> The point of inplace operations (memoryview's, other stuff already in Python) is to avoid unneeded memory allocation and copying. For a 1TB bytearray with 1TB of RAM, it will be very hard to do. (Ditto for a 100K bytearray with 150K of RAM.)

So you'll have to figure a better way to do it. We're not going to add inplace_lower(), and that's the final word. (Of course you can add it to MicroPython as an extension of the language.)

-- --Guido van Rossum (python.org/~guido)
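One possible "better way", offered only as a sketch and not something proposed in the thread: the full-slice assignment b[:] = b.lower() keeps the object's identity but needs a temporary as large as the whole buffer, whereas processing fixed-size slices bounds the temporary memory to roughly two chunks. The helper name lower_in_chunks and its chunk_size parameter are hypothetical.

```python
def lower_in_chunks(data, chunk_size=4096):
    """Lowercase ASCII letters in `data` in place, one slice at a time.

    Hypothetical helper: peak extra memory is about two chunks
    (the slice copy and its lowercased copy), not a full duplicate.
    """
    for start in range(0, len(data), chunk_size):
        stop = start + chunk_size
        # Same-length slice assignment overwrites the bytes in place.
        data[start:stop] = data[start:stop].lower()


b = bytearray(b'Content-Type: TEXT/HTML')
alias = b
lower_in_chunks(b, chunk_size=8)
print(b is alias)   # True
print(b)            # bytearray(b'content-type: text/html')
```

The chunk size trades temporary memory against the number of Python-level loop iterations; a tiny value such as 8 is used here only to exercise more than one chunk.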
On Tue, Jan 20, 2015 at 11:48:10AM +0200, Paul Sokolovsky wrote:
> Hello,
>
> On Tue, 20 Jan 2015 18:15:02 +1300, Greg Ewing wrote:
>
>> Guido van Rossum wrote:
>>
>>> On Mon, Jan 19, 2015 at 11:43 AM, Paul Sokolovsky wrote:
>>>
>>>>     b.lower_inplace()
>>>>     b.lower_i()
>>>
>>> Please don't go there. The use cases are too rare.
>>
>> And if you have such a use case, it's not too hard to do
>>
>>     b[:] = b.lower()
>
> The point of inplace operations (memoryview's, other stuff already in Python) is to avoid unneeded memory allocation and copying. For a 1TB bytearray with 1TB of RAM, it will be very hard to do. (Ditto for a 100K bytearray with 150K of RAM.)
You can just loop through the bytearray and assign elements. I use something along the lines of this for PyParallel where I'm operating on bytearrays that are backed by underlying socket buffers, where I don't want to do any memory allocations/reallocations:

    def toupper_bytes(data):
        assert isinstance(data, bytearray)
        a = ord('a')
        z = ord('z')
        for i in range(0, len(data)):
            c = data[i]
            if c >= a and c <= z:
                data[i] = c - 32

Low overhead, mostly stays within the same ceval frame. Should be a walk in the park for PyPy, Cython or Numba to optimize, too.

Trent.
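A variation on the loop above, given only as a sketch: accepting a writable memoryview instead of a bytearray lets the same byte-by-byte approach target a sub-range of a buffer without copying it. The name toupper_view and the sample buffer contents are hypothetical and not taken from PyParallel.

```python
def toupper_view(view):
    """Uppercase ASCII letters in a writable byte view, in place."""
    a, z = ord('a'), ord('z')
    for i in range(len(view)):
        c = view[i]
        if a <= c <= z:
            view[i] = c - 32   # 'a'..'z' sit 32 above 'A'..'Z' in ASCII


buf = bytearray(b'host: example.com\r\nbody stays as-is')
# Slicing a memoryview creates a window onto buf, not a copy of it.
toupper_view(memoryview(buf)[:4])
print(buf)   # bytearray(b'HOST: example.com\r\nbody stays as-is')
```

Only the four bytes covered by the view are touched; the rest of the buffer is left as it was.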
participants (7)

- Antoine Pitrou
- Eric V. Smith
- Ethan Furman
- Greg Ewing
- Guido van Rossum
- Paul Sokolovsky
- Trent Nelson