Please remember me, what was common decision about CPython-only optimizations which change computation complexity? I.g. constant amortization time of += for byte objects and strings, or linear time of sum() for sequences?
Am 05.10.2013 21:42, schrieb Serhiy Storchaka:
Please remember me, what was common decision about CPython-only optimizations which change computation complexity? I.g. constant amortization time of += for byte objects and strings, or linear time of sum() for sequences?
This appears to be about changeset 499a96611baa: Issue #19087: Improve bytearray allocation in order to allow cheap popping of data at the front (slice deletion). I think the best way to describe the CPython strategy is that we don't like to optimize things that both have an idiomatic solution already (see str.join) and can't be replicated easily in other implementations. cheers, Georg
On Sat, 05 Oct 2013 22:11:43 +0200
Georg Brandl
Am 05.10.2013 21:42, schrieb Serhiy Storchaka:
Please remember me, what was common decision about CPython-only optimizations which change computation complexity? I.g. constant amortization time of += for byte objects and strings, or linear time of sum() for sequences?
This appears to be about changeset 499a96611baa:
Issue #19087: Improve bytearray allocation in order to allow cheap popping of data at the front (slice deletion).
I think the best way to describe the CPython strategy is that we don't like to optimize things that both have an idiomatic solution already (see str.join) and can't be replicated easily in other implementations.
Agreed with this. I'll also point out that algorithmic complexity is only an aspect of performance. For example, providing a fast C implementation of decimal is a game changer even though algorithmic complexity may have remained identical. Regards Antoine.
05.10.13 23:11, Georg Brandl написав(ла):
Am 05.10.2013 21:42, schrieb Serhiy Storchaka:
Please remember me, what was common decision about CPython-only optimizations which change computation complexity? I.g. constant amortization time of += for byte objects and strings, or linear time of sum() for sequences?
This appears to be about changeset 499a96611baa:
Issue #19087: Improve bytearray allocation in order to allow cheap popping of data at the front (slice deletion).
I just need something to which I can refer in similar situations (there was several similar rejected proposition last months). I'm not sure issue #19087 is in same category as constant time += for strings. But the converse was not proven.
The str type is immutable, bytearray is not. It's easier to justify optimisations on mutable types, like overallocate lists for example. Victor
06.10.13 00:06, Victor Stinner написав(ла):
The str type is immutable, bytearray is not. It's easier to justify optimisations on mutable types, like overallocate lists for example.
We can resize str or bytes if its refcount is 1. And resize is cheap in some circumstances. This optimization is CPython-only.
On Sun, 06 Oct 2013 00:02:14 +0300
Serhiy Storchaka
05.10.13 23:11, Georg Brandl написав(ла):
Am 05.10.2013 21:42, schrieb Serhiy Storchaka:
Please remember me, what was common decision about CPython-only optimizations which change computation complexity? I.g. constant amortization time of += for byte objects and strings, or linear time of sum() for sequences?
This appears to be about changeset 499a96611baa:
Issue #19087: Improve bytearray allocation in order to allow cheap popping of data at the front (slice deletion).
I just need something to which I can refer in similar situations (there was several similar rejected proposition last months).
Sometimes there are no hard rules, and several factors can combine into a decision. Complexity of implementation and existing alternatives are useful guidelines, IMHO. Optimized "+=" for strings has been re-enabled after PEP 393, IIRC. Regards Antoine.
On Oct 5, 2013, at 12:42 PM, Serhiy Storchaka
Please remember me, what was common decision about CPython-only optimizations which change computation complexity?
IIRC, it is okay optimize C code for just about anything, but we don't want to alter the pure python code after from idiomatic solutions that work on all of the implementations. For example, there is a string concatenation optimization in the C code, but we still use and advocate str.join() and try not to write code that relies on the optimization. That said, it is nice that less informed users are sometimes saved from an accidental performance trap. Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings. Right now, it is not too hard to roughly explain the performance characteristics of the various container objects, but that would be imperiled a bit by having bytearrays differing from strings in ways other than their size. Raymond
On Sat, 5 Oct 2013 15:35:30 -0700
Raymond Hettinger
Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings.
Bytearrays are mutable and strings are immutable, so evidently they will have different performance characteristics. Regards Antoine.
On 6 Oct 2013 08:59, "Antoine Pitrou"
On Sat, 5 Oct 2013 15:35:30 -0700 Raymond Hettinger
wrote: Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings.
Bytearrays are mutable and strings are immutable, so evidently they will have different performance characteristics.
I suspect "list" may have been the intended comparison there. "array.array" is another appropriate comparison. Having bytearray operations differ in algorithmic complexity from those two types would be very strange and surprising (particularly if it was CPython specific). Cheers, Nick.
Regards
Antoine.
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe:
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
On Sun, 6 Oct 2013 09:32:30 +1000
Nick Coghlan
On 6 Oct 2013 08:59, "Antoine Pitrou"
wrote: On Sat, 5 Oct 2013 15:35:30 -0700 Raymond Hettinger
wrote: Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings.
Bytearrays are mutable and strings are immutable, so evidently they will have different performance characteristics.
I suspect "list" may have been the intended comparison there. "array.array" is another appropriate comparison.
Having bytearray operations differ in algorithmic complexity from those two types would be very strange and surprising (particularly if it was CPython specific).
It's only strange because you don't understand the main use case for bytearrays. They may look like arrays of 8-bit integers but they are really used for buffers, so optimizing for stuff like FIFO operation makes sense. Regards Antoine.
Am 06.10.2013 01:37, schrieb Antoine Pitrou:
On Sun, 6 Oct 2013 09:32:30 +1000 Nick Coghlan
wrote: On 6 Oct 2013 08:59, "Antoine Pitrou"
wrote: On Sat, 5 Oct 2013 15:35:30 -0700 Raymond Hettinger
wrote: Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings.
Bytearrays are mutable and strings are immutable, so evidently they will have different performance characteristics.
I suspect "list" may have been the intended comparison there. "array.array" is another appropriate comparison.
Having bytearray operations differ in algorithmic complexity from those two types would be very strange and surprising (particularly if it was CPython specific).
It's only strange because you don't understand the main use case for bytearrays. They may look like arrays of 8-bit integers but they are really used for buffers, so optimizing for stuff like FIFO operation makes sense.
I agree, that is also what came to my mind when I read the commit message. Georg
06.10.13 02:37, Antoine Pitrou написав(ла):
It's only strange because you don't understand the main use case for bytearrays. They may look like arrays of 8-bit integers but they are really used for buffers, so optimizing for stuff like FIFO operation makes sense.
Could you please provide several examples of "the main use case for bytearrays"?
Nick Coghlan writes:
I suspect "list" may have been the intended comparison there. "array.array" is another appropriate comparison. Having bytearray operations differ in algorithmic complexity from those two types would be very strange and surprising
I don't see why. If you really wanted bytearrays to be similar to list or array.array in that way, wouldn't you derive them from one of those types[1] rather than provide a primitive? Having a separate primitive makes it easier to implement optimized performance characteristics. Isn't that why we have several different kinds of containers instead of deriving everything from list? For me, the point about string "+=" being efficient (sometimes) isn't that it is surprising compared to similar types, it's that it is surprising for any immutable sequence type. Footnotes: [1] Or all of them from an ABC.
On 2013-10-06, at 12:37 , Stephen J. Turnbull wrote:
For me, the point about string "+=" being efficient (sometimes) isn't that it is surprising compared to similar types, it's that it is surprising for any immutable sequence type.
It's clearly nitpicking, but ropes are immutable sequences with O(1) concatenation. Clojure's vectors also have a more-or-less-O(1) append (technically it's O(log32 n)) and one could implement a Python version of them.
Sorry, I may have missed some earlier relevant parts of this discussion. But you appear to be saying that you don't want to optimize something because it would be hard to explain why it performed better. Eh?? Have I misunderstood? Rob Cliffe On 05/10/2013 23:35, Raymond Hettinger wrote:
On Oct 5, 2013, at 12:42 PM, Serhiy Storchaka
mailto:storchaka@gmail.com> wrote: Please remember me, what was common decision about CPython-only optimizations which change computation complexity?
IIRC, it is okay optimize C code for just about anything, but we don't want to alter the pure python code after from idiomatic solutions that work on all of the implementations.
For example, there is a string concatenation optimization in the C code, but we still use and advocate str.join() and try not to write code that relies on the optimization. That said, it is nice that less informed users are sometimes saved from an accidental performance trap.
Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings. Right now, it is not too hard to roughly explain the performance characteristics of the various container objects, but that would be imperiled a bit by having bytearrays differing from strings in ways other than their size.
Raymond
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/rob.cliffe%40btinternet.c...
No virus found in this message. Checked by AVG - www.avg.com http://www.avg.com Version: 2012.0.2242 / Virus Database: 3222/6225 - Release Date: 10/05/13
participants (9)
-
Antoine Pitrou
-
Georg Brandl
-
Nick Coghlan
-
Raymond Hettinger
-
Rob Cliffe
-
Serhiy Storchaka
-
Stephen J. Turnbull
-
Victor Stinner
-
Xavier Morel