Re: [pypy-dev] [Python-ideas] Explicitly defining a string buffer object (aka StringIO += operator)

Hello,

On Mon, 30 Mar 2020 09:58:32 -0700 Brett Cannon <brett@python.org> wrote:
Certainly not everyone has to agree with that characterization. Nor is there a strong need to be offended that it's "unfair". After all, it's just somebody's opinion. Roughly speaking, the need to be upset by the "mis-" prefix is about the same as the need to be upset by "bad" in some random blog post, e.g. https://snarky.ca/my-impressions-of-elm/

I'm also sure that people familiar with implementation details would understand the reason for that "mis-" prefix, but let me be explicit just in case: a string is one of the fundamental types in many languages, including Python. And trying to make it too many things at once has its overheads. Roughly speaking, to support efficient appending, one needs to be ready to over-allocate string storage and maintain the bookkeeping for this. Another known optimization CPython does is for stuff like "s = s[off:]", which requires maintaining another "offset" pointer. Even with this simplistic consideration, the internal structure of "str" would be about the same as that of "io.StringIO" (which also needs to over-allocate and maintain a "current offset" pointer). But why do that, if there's io.StringIO in the first place?
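As a rough illustration of the two accumulation styles contrasted above (the function names are made up for this sketch, and it says nothing about which one is faster on a given implementation):

```python
import io

def accumulate_with_str(parts):
    # Relies on "str +=". Whether this is efficient depends on the
    # implementation optimizing in-place appends to a str.
    s = ""
    for p in parts:
        s += p
    return s

def accumulate_with_stringio(parts):
    # Uses the object type that is designed for accumulation:
    # io.StringIO keeps its own growable buffer and current offset.
    buf = io.StringIO()
    for p in parts:
        buf.write(p)
    return buf.getvalue()

parts = ["foo", "bar", "baz"]
joined_str = accumulate_with_str(parts)
joined_sio = accumulate_with_stringio(parts)
```

Both functions produce the same string; the difference is only in which object is asked to do the buffer bookkeeping.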
Nowhere did I argue against applying that optimization in CPython. Surely, in general, the more optimizations, the better. I just stated the fact that of the 8 (well, 11, 11!) Python'ish implementations surveyed, only 1 implemented it. And what was implied is that even under the ideal condition that other implementations say "we have the resources to implement and maintain that optimization" (we're still talking about the "str +=" optimization), then at least for some projects, it would be against their interests. E.g. MicroPython, Pycopy, and Snek optimize for memory usage, TinyPy for simplicity of implementation. "Too-complex basic types" are also a known problem for JITs (which become less performant due to the need to handle multiple cases of the same primitive type, and much harder to develop and debug).

At the same time, the ergonomics of "str +=" are very good (heck, that's why people use it). So, I was looking for the simplest possible change which would bring the largest part of those ergonomics to an object type more suitable for content accumulation *across* different Python'ish implementations.

I have to admit that I was inspired to write down this RFC by PEP 616 "String methods to remove prefixes and suffixes". Who'd think that after so many years, there's still something useful to be added to string methods (and then, that it doesn't have to be as complex as one can devise at full throttle, but much simpler than that).
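Going by the subject line (a "+=" operator for StringIO), a minimal pure-Python sketch of such a change could look like the following. The subclass name StrBuf is hypothetical and exists only for illustration; an actual implementation would presumably add the operator to io.StringIO itself.

```python
import io

class StrBuf(io.StringIO):
    # Hypothetical subclass: "+=" simply forwards to write().
    # Note that write() acts at the current stream position, so this
    # sketch assumes the buffer starts empty and is never seek()'ed.
    def __iadd__(self, s):
        self.write(s)
        return self

buf = StrBuf()
buf += "hello"
buf += ", "
buf += "world"
result = buf.getvalue()
```

The accumulating code reads the same as with "str +=", and only the initial assignment and the final getvalue() call differ.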
> And I'm not sure if you're trying to insinuate that CPython represents Python the language
That's an old and painful (to some) topic.
> and thus needs to not optimize for something other implementations have/can not optimize for, which if you are
As I clarified, I'm not saying that CPython shouldn't optimize for things. I just tried to argue that there's no clearly defined abstraction (*) for an accumulating string buffer, and argued that it could be easily "established".

(*) Instead, there are various practical hacks to implement it, as both the 2006 thread and this one show.
Yes, I personally think that CPython and Python should be considered separately. E.g. the topic of this RFC shouldn't be considered just from CPython's point of view, but rather from the angle of "Python doesn't seem to define a useful abstraction of an (ergonomic) string builder, here's how different Python implementations can acquire it almost for free".

-- 
Best regards,
Paul
mailto:pmiscml@gmail.com
participants (1)
-
Paul Sokolovsky