Mailman 3 Optimization - Python-Dev

Optimization

Serhiy Storchaka

5 Oct 2013 5 Oct '13

7:42 p.m.

Please remember me, what was common decision about CPython-only optimizations which change computation complexity? I.g. constant amortization time of += for byte objects and strings, or linear time of sum() for sequences?

Show replies by date

Georg Brandl

5 Oct 5 Oct

8:11 p.m.

Am 05.10.2013 21:42, schrieb Serhiy Storchaka:

...

Please remember me, what was common decision about CPython-only optimizations which change computation complexity? I.g. constant amortization time of += for byte objects and strings, or linear time of sum() for sequences?

This appears to be about changeset 499a96611baa: Issue #19087: Improve bytearray allocation in order to allow cheap popping of data at the front (slice deletion). I think the best way to describe the CPython strategy is that we don't like to optimize things that both have an idiomatic solution already (see str.join) and can't be replicated easily in other implementations. cheers, Georg

Antoine Pitrou

8:19 p.m.

On Sat, 05 Oct 2013 22:11:43 +0200 Georg Brandl wrote:

...

Am 05.10.2013 21:42, schrieb Serhiy Storchaka:

...
Please remember me, what was common decision about CPython-only optimizations which change computation complexity? I.g. constant amortization time of += for byte objects and strings, or linear time of sum() for sequences?

This appears to be about changeset 499a96611baa:

Issue #19087: Improve bytearray allocation in order to allow cheap popping of data at the front (slice deletion).

I think the best way to describe the CPython strategy is that we don't like to optimize things that both have an idiomatic solution already (see str.join) and can't be replicated easily in other implementations.

Agreed with this. I'll also point out that algorithmic complexity is only an aspect of performance. For example, providing a fast C implementation of decimal is a game changer even though algorithmic complexity may have remained identical. Regards Antoine.

Serhiy Storchaka

9:02 p.m.

05.10.13 23:11, Georg Brandl написав(ла):

...

Am 05.10.2013 21:42, schrieb Serhiy Storchaka:

...
Please remember me, what was common decision about CPython-only optimizations which change computation complexity? I.g. constant amortization time of += for byte objects and strings, or linear time of sum() for sequences?

This appears to be about changeset 499a96611baa:

Issue #19087: Improve bytearray allocation in order to allow cheap popping of data at the front (slice deletion).

I just need something to which I can refer in similar situations (there was several similar rejected proposition last months). I'm not sure issue #19087 is in same category as constant time += for strings. But the converse was not proven.

Victor Stinner

9:06 p.m.

The str type is immutable, bytearray is not. It's easier to justify optimisations on mutable types, like overallocate lists for example. Victor

Serhiy Storchaka

9:58 p.m.

06.10.13 00:06, Victor Stinner написав(ла):

...

The str type is immutable, bytearray is not. It's easier to justify optimisations on mutable types, like overallocate lists for example.

We can resize str or bytes if its refcount is 1. And resize is cheap in some circumstances. This optimization is CPython-only.

Antoine Pitrou

9:18 p.m.

On Sun, 06 Oct 2013 00:02:14 +0300 Serhiy Storchaka wrote:

...

05.10.13 23:11, Georg Brandl написав(ла):

...
Am 05.10.2013 21:42, schrieb Serhiy Storchaka:

...
Please remember me, what was common decision about CPython-only optimizations which change computation complexity? I.g. constant amortization time of += for byte objects and strings, or linear time of sum() for sequences?

This appears to be about changeset 499a96611baa:

Issue #19087: Improve bytearray allocation in order to allow cheap popping of data at the front (slice deletion).

I just need something to which I can refer in similar situations (there was several similar rejected proposition last months).

Sometimes there are no hard rules, and several factors can combine into a decision. Complexity of implementation and existing alternatives are useful guidelines, IMHO. Optimized "+=" for strings has been re-enabled after PEP 393, IIRC. Regards Antoine.

Raymond Hettinger

10:35 p.m.

On Oct 5, 2013, at 12:42 PM, Serhiy Storchaka wrote:

...

Please remember me, what was common decision about CPython-only optimizations which change computation complexity?

IIRC, it is okay optimize C code for just about anything, but we don't want to alter the pure python code after from idiomatic solutions that work on all of the implementations. For example, there is a string concatenation optimization in the C code, but we still use and advocate str.join() and try not to write code that relies on the optimization. That said, it is nice that less informed users are sometimes saved from an accidental performance trap. Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings. Right now, it is not too hard to roughly explain the performance characteristics of the various container objects, but that would be imperiled a bit by having bytearrays differing from strings in ways other than their size. Raymond

Antoine Pitrou

10:58 p.m.

On Sat, 5 Oct 2013 15:35:30 -0700 Raymond Hettinger wrote:

...

Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings.

Bytearrays are mutable and strings are immutable, so evidently they will have different performance characteristics. Regards Antoine.

Nick Coghlan

11:32 p.m.

On 6 Oct 2013 08:59, "Antoine Pitrou" wrote:

...

On Sat, 5 Oct 2013 15:35:30 -0700 Raymond Hettinger wrote:

...
Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings.

Bytearrays are mutable and strings are immutable, so evidently they will have different performance characteristics.

I suspect "list" may have been the intended comparison there. "array.array" is another appropriate comparison. Having bytearray operations differ in algorithmic complexity from those two types would be very strange and surprising (particularly if it was CPython specific). Cheers, Nick.

...

Regards

Antoine.

_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe:

https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

Antoine Pitrou

11:37 p.m.

On Sun, 6 Oct 2013 09:32:30 +1000 Nick Coghlan wrote:

...

On 6 Oct 2013 08:59, "Antoine Pitrou" wrote:

...
On Sat, 5 Oct 2013 15:35:30 -0700 Raymond Hettinger wrote:

...
Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings.

Bytearrays are mutable and strings are immutable, so evidently they will have different performance characteristics.

I suspect "list" may have been the intended comparison there. "array.array" is another appropriate comparison.

Having bytearray operations differ in algorithmic complexity from those two types would be very strange and surprising (particularly if it was CPython specific).

It's only strange because you don't understand the main use case for bytearrays. They may look like arrays of 8-bit integers but they are really used for buffers, so optimizing for stuff like FIFO operation makes sense. Regards Antoine.

Georg Brandl

6 Oct 6 Oct

6:38 a.m.

Am 06.10.2013 01:37, schrieb Antoine Pitrou:

...

On Sun, 6 Oct 2013 09:32:30 +1000 Nick Coghlan wrote:

...
On 6 Oct 2013 08:59, "Antoine Pitrou" wrote:

...
On Sat, 5 Oct 2013 15:35:30 -0700 Raymond Hettinger wrote:

...
Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings.

Bytearrays are mutable and strings are immutable, so evidently they will have different performance characteristics.

I suspect "list" may have been the intended comparison there. "array.array" is another appropriate comparison.

Having bytearray operations differ in algorithmic complexity from those two types would be very strange and surprising (particularly if it was CPython specific).

It's only strange because you don't understand the main use case for bytearrays. They may look like arrays of 8-bit integers but they are really used for buffers, so optimizing for stuff like FIFO operation makes sense.

I agree, that is also what came to my mind when I read the commit message. Georg

Serhiy Storchaka

6:56 a.m.

06.10.13 02:37, Antoine Pitrou написав(ла):

...

It's only strange because you don't understand the main use case for bytearrays. They may look like arrays of 8-bit integers but they are really used for buffers, so optimizing for stuff like FIFO operation makes sense.

Could you please provide several examples of "the main use case for bytearrays"?

Stephen J. Turnbull

10:37 a.m.

Nick Coghlan writes:

...

I suspect "list" may have been the intended comparison there. "array.array" is another appropriate comparison. Having bytearray operations differ in algorithmic complexity from those two types would be very strange and surprising

I don't see why. If you really wanted bytearrays to be similar to list or array.array in that way, wouldn't you derive them from one of those types[1] rather than provide a primitive? Having a separate primitive makes it easier to implement optimized performance characteristics. Isn't that why we have several different kinds of containers instead of deriving everything from list? For me, the point about string "+=" being efficient (sometimes) isn't that it is surprising compared to similar types, it's that it is surprising for any immutable sequence type. Footnotes: [1] Or all of them from an ABC.

Xavier Morel

4:15 p.m.

On 2013-10-06, at 12:37 , Stephen J. Turnbull wrote:

...

For me, the point about string "+=" being efficient (sometimes) isn't that it is surprising compared to similar types, it's that it is surprising for any immutable sequence type.

It's clearly nitpicking, but ropes are immutable sequences with O(1) concatenation. Clojure's vectors also have a more-or-less-O(1) append (technically it's O(log32 n)) and one could implement a Python version of them.

Rob Cliffe

5 Oct 5 Oct

10:58 p.m.

Sorry, I may have missed some earlier relevant parts of this discussion. But you appear to be saying that you don't want to optimize something because it would be hard to explain why it performed better. Eh?? Have I misunderstood? Rob Cliffe On 05/10/2013 23:35, Raymond Hettinger wrote:

...

On Oct 5, 2013, at 12:42 PM, Serhiy Storchaka mailto:storchaka@gmail.com> wrote:

...
Please remember me, what was common decision about CPython-only optimizations which change computation complexity?

IIRC, it is okay optimize C code for just about anything, but we don't want to alter the pure python code after from idiomatic solutions that work on all of the implementations.

For example, there is a string concatenation optimization in the C code, but we still use and advocate str.join() and try not to write code that relies on the optimization. That said, it is nice that less informed users are sometimes saved from an accidental performance trap.

Making bytearray's efficiently pop from the left side is dubious. This isn't a common idiom, nor should it be. Even if all the other implementations could model this behavior, it wouldn't be a good idea to have bytearrays have different performance characteristics than strings. Right now, it is not too hard to roughly explain the performance characteristics of the various container objects, but that would be imperiled a bit by having bytearrays differing from strings in ways other than their size.

Raymond

_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/rob.cliffe%40btinternet.c...

No virus found in this message. Checked by AVG - www.avg.com http://www.avg.com Version: 2012.0.2242 / Virus Database: 3222/6225 - Release Date: 10/05/13

3854

Age (days ago)

3855

Last active (days ago)

List overview

Download

15 comments

9 participants

participants (9)

Antoine Pitrou
Georg Brandl
Nick Coghlan
Raymond Hettinger
Rob Cliffe
Serhiy Storchaka
Stephen J. Turnbull
Victor Stinner
Xavier Morel

Optimization

tags

participants (9)