
Hello,

On Mon, 30 Mar 2020 09:58:32 -0700 Brett Cannon <brett@python.org> wrote:
On Sun, Mar 29, 2020 at 10:58 AM Paul Sokolovsky <pmiscml@gmail.com> wrote:
[SNIP]
1. Succumb to applying the same mis-optimization for string type as CPython3. (With the understanding that for speed-optimized projects, implementing mis-optimizations will eat into performance budget, and for memory-optimized projects, it likely will lead to noticeable memory bloat.) [SNIP]
1. The biggest "criticism" I see is a response a-la "there's no problem with CPython3, so there's nothing to fix". This is related to the bigger question of "whether a life outside CPython exists", or, put more formally, where the border lies between Python-the-language and CPython-the-implementation. To address this point, I tried to collect performance stats for a pretty wide array of Python implementations.
I don't think characterizing this as a "mis-optimization" is fair. There is use of in-place add with strings in the wild and CPython happens to be able to optimize for it.
Not everyone has to agree with that characterization. Nor is there a strong need to be offended that it's "unfair". After all, it's just somebody's opinion. Roughly speaking, the need to be upset by the "mis-" prefix is about the same as the need to be upset by "bad" in some random blog post, e.g. https://snarky.ca/my-impressions-of-elm/

I'm also sure that people familiar with implementation details would understand the reason for that "mis-" prefix, but let me be explicit anyway: a string is one of the fundamental types in many languages, including Python, and trying to make it too many things at once has its overheads. Roughly speaking, to support efficient appending, one needs to be ready to over-allocate string storage and maintain bookkeeping for it. Another known optimization CPython does is for stuff like "s = s[off:]", which requires maintaining another "offset" pointer. Even with this simplistic consideration, the internal structure of "str" would be about the same as that of "io.StringIO" (which also needs to over-allocate and maintain a "current offset" pointer). But why, if there's io.StringIO in the first place?
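For reference, the two accumulation idioms being contrasted look roughly like this. It is a minimal sketch: the helper names join_with_str and join_with_stringio are made up for illustration, and no timing or memory figures are implied.

```python
import io


def join_with_str(parts):
    # Hypothetical helper: accumulate with in-place add on an immutable str.
    # Whether this avoids repeated copying depends on the implementation.
    s = ""
    for p in parts:
        s += p
    return s


def join_with_stringio(parts):
    # Hypothetical helper: accumulate in a buffer type designed for appending.
    buf = io.StringIO()
    for p in parts:
        buf.write(p)
    return buf.getvalue()


parts = ["spam", " ", "eggs"]
print(join_with_str(parts) == join_with_stringio(parts))
```

Both helpers return the same string; they differ only in which object type carries the cost of growing the buffer.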
Someone was motivated to do the optimization so we took it without hurting performance for other things. There are plenty of other things that I see people do regularly that I don't personally think are best practice, but that doesn't mean we should automatically ignore them and not help make their code more performant if possible without sacrificing best practice performance.
Nowhere did I argue against applying that optimization in CPython. Surely, in general, the more optimizations, the better. I just stated the fact that of 8 (well, 11, 11!) Python'ish implementations surveyed, only 1 implemented it. And what was implied is that even under ideal conditions, where other implementations say "we have the resources to implement and maintain that optimization" (we're still talking about the "str +=" optimization), for at least some projects it would be against their interests. E.g. MicroPython, Pycopy, and Snek optimize for memory usage, TinyPy for simplicity of implementation. "Too-complex basic types" are also a known problem for JITs (which become less performant due to the need to handle multiple cases of the same primitive type, and much harder to develop and debug).

At the same time, the ergonomics of "str +=" are very good (heck, that's why people use it). So I was looking for the simplest possible change which would bring the largest part of those ergonomics to an object type more suitable for content accumulation *across* different Python'ish implementations.

I have to admit that I was inspired to write down this RFC by PEP 616, "String methods to remove prefixes and suffixes". Who'd think that after so many years, there's still something useful to be added to string methods (and that it doesn't have to be as complex as one can devise at full throttle, but much simpler than that).
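As a rough illustration of what "+=" ergonomics on an accumulation-oriented type could look like, here is one possible shape. This is an assumption made for the sake of example, not an API spelled out in this message: the class name StringBuilder and the choice to subclass io.StringIO are hypothetical.

```python
import io


class StringBuilder(io.StringIO):
    # Hypothetical sketch: a StringIO that also accepts "buf += text".

    def __iadd__(self, text):
        # Always append at the end, regardless of the current position.
        self.seek(0, io.SEEK_END)
        self.write(text)
        # Return self so the name stays bound to the same buffer object.
        return self


buf = StringBuilder()
for chunk in ("foo", "bar", "baz"):
    buf += chunk
print(buf.getvalue())
```

The calling code reads the same as the "str +=" idiom, while the growth bookkeeping lives in the buffer type rather than in str itself.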
And I'm not sure if you're trying to insinuate that CPython represents Python the language
That's an old and painful (to some) topic.
and thus needs to not optimize for something other implementations have/can not optimize for, which if you are
As I clarified, I don't say that CPython shouldn't optimize for things. I just tried to argue that there's no clearly defined abstraction (*) for an accumulating string buffer, and argued that it could be easily "established".

(*) Instead, there are various practical hacks to implement it, as both the 2006 thread and this one show.
suggesting that then I have an uncomfortable conversation I need to have with PyPy 😉. Or if you're saying CPython and Python should be considered separate, then why can't CPython optimize for something it happens to be positioned to optimize for that other implementations can't/haven't?
Yes, I personally think that CPython and Python should be considered separate. E.g. the topic of this RFC shouldn't be considered just from CPython's point of view, but rather from the angle of "Python doesn't seem to define a useful abstraction of an (ergonomic) string builder, here's how different Python implementations can acquire it almost for free".

-- 
Best regards,
 Paul                          mailto:pmiscml@gmail.com