
On Tue, Mar 31, 2020 at 12:21 PM Paul Sokolovsky <pmiscml@gmail.com> wrote:
Christopher Barker <pythonchb@gmail.com> wrote: For avoidance of doubt: nothing in my RFC has anything to do, or implies, "a mutable string type".
I said "there are some use cases for a mutable string type"; I did not say that's what was asked for in this thread. So why did I say that? Because:
A well-known pattern of string builder, yes.
As I read this suggestion, it started with something like:

* Lots of people use a "pattern of string building", using str += another_string to build up strings.
* That is not an efficient pattern, and is considered an anti-pattern, even in CPython, where it has been cleverly optimized.

I think everyone on this thread would agree with the above.

* The "official recommended solution" is another pattern: build up in the list, and then join it.

You are suggesting that it would be nice if there were an efficient implementation of string building that followed the original anti-pattern's syntax. After all, if folks want to make a string, then using familiar string syntax would be nice and natural.

You've pointed out that StringIO already provides an efficient implementation of string building (which could be made even more efficient, if one wanted to write that code). And that if it grew an __iadd__ method, it would then match the pattern that you want it to match, and allow folks to improve their code with less change than going to the list.append then join method.

All good.

But what struck me is that in the end, this is perhaps more friendly than the list-based method, but it's still a real shift in paradigm: I think people use str += str not because they are thinking "I need a string builder", but because they are thinking: I need a "string".
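To make the StringIO-plus-__iadd__ idea concrete, here is a minimal sketch. StringIO has no __iadd__ today, so the StringBuilder subclass below is a hypothetical name of my own, not anything from the stdlib or from the RFC. It assumes you want += to append at the end, which is why it seeks to the end after being given an initial value.

```python
import io


class StringBuilder(io.StringIO):
    """Hypothetical helper: io.StringIO plus in-place addition."""

    def __init__(self, initial=""):
        super().__init__(initial)
        # StringIO starts at position 0 even when given an initial value,
        # so move to the end; otherwise the first write would overwrite it.
        self.seek(0, io.SEEK_END)

    def __iadd__(self, text):
        self.write(text)
        return self  # += must rebind the name to the same builder


message = StringBuilder("The start of the message")
for i in range(3):
    message += " and some more"
result = message.getvalue()
```

Note that the caller still has to call getvalue() at the end, which is the part that stays a shift in paradigm.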
That is, your choice of variable names:

    buf = ""
    for i in range(50000):
        buf += "foo"
    print(buf)

is not what most folks would use, because they aren't thinking "I need a buffer in which to put a bunch of strings", they are thinking: "I need to make this big string", so would more likely write:

    message = "The start of the message"
    for i in something:
        message += "some more message"
    do_something_with_the_message(message)

which, yes, is almost exactly the same as your example, but with a different intent -- I start with a string and make it bigger, not "I make a buffer in which to build a string, and then put things in it, then get the resulting string out of the buffer".

I teach a lot of beginners, so yes, I do see this code pattern a fair bit.

The difference in intent means that folks are not likely to go looking for a "buffer" or "string builder" anyway.

So that suggested to me that a mutable string type would completely satisfy your use case, but be more natural to folks used to strings:

    message = MutableString("The start of the message")
    for i in something:
        message += "some more message"
    do_something_with_the_message(message)

And you could do other nifty things with it, like all the string methods, without a lot of wasteful reallocating, particularly for methods that don't change the length of the string. (Though Unicode does make this a challenge!)

(And yes, I know that the "wasteful reallocating" is probably hardly ever, if ever, a bottleneck.)

In short: a mutable string would satisfy the requirements of a "string builder", and more.

Anyway, as I said in my previous message, the fact that a mutable string hasn't gained any traction tells us something: it really isn't that important. And I mentioned a similar effort I made to make a growable numpy array, and, well, it turned out not to be worth it either.
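For illustration only, here is a rough sketch of what such a type might look like. MutableString is not an existing Python type; the class below and everything in it are my own assumptions, and it uses a plain list of one-character strings for storage, which is simple but not memory-efficient.

```python
class MutableString:
    """Hypothetical sketch of a mutable string; not a real Python type."""

    def __init__(self, initial=""):
        # Assumption: store one code point per list item for simplicity.
        self._chars = list(initial)

    def __iadd__(self, other):
        # Grows in place; the list over-allocates, so appends are amortized O(1).
        self._chars.extend(str(other))
        return self

    def __setitem__(self, index, char):
        # Only single-character replacement at an integer index is supported,
        # so the length never changes here.
        if not isinstance(index, int):
            raise TypeError("index must be an int")
        if len(char) != 1:
            raise ValueError("replacement must be exactly one character")
        self._chars[index] = char

    def __len__(self):
        return len(self._chars)

    def __str__(self):
        return "".join(self._chars)


message = MutableString("The start of the message")
for i in range(3):
    message += " more"
message[0] = "t"  # in-place change that keeps the length the same
text = str(message)
```

A real implementation would need the rest of the str methods, and something like upper() shows the Unicode challenge: "ß".upper() is "SS", so even a "same length" method can change the length.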
However, if we're all wrong, and there would be a demand for such a "string builder", then why not write one (could be a wrapper around StringIO if you want), and put it on PyPI, or even just use it in your own lib, and see how it catches on. Have you done that for your own code and found you like it enough to really want to push this forward?

BTW: I timed += vs StringIO vs list + join, and found (like you did) that they are all about the same speed under CPython 3.7.

But I had a thought -- might string interning be affecting the performance? Particularly for the list method:

    def list_join():
        buf = []
        for i in range(10000):
            buf.append("foo")
        return "".join(buf)

Note that that is only making one string "foo", and reusing it in all items in the list. In the common case, you wouldn't get that help.

OK, tested it: no, it doesn't really make a difference. If you replace "foo" (which gets interned) with "foo "[:3] (which doesn't), they all take longer, but still all about the same.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
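For anyone who wants to repeat the timing comparison, here is a sketch of how it could be set up with timeit. The function names, the piece-producing helpers, and the loop counts are my own assumptions, not the exact code that was timed above. It slices a module-level variable rather than a literal so that each call really does build a new string object.

```python
import io
import timeit

N = 10_000
PADDED = "foo "


def interned_piece():
    # The same "foo" constant object is returned on every call.
    return "foo"


def fresh_piece():
    # Slicing a variable builds a new three-character str on each call.
    return PADDED[:3]


def plus_equals(make_piece):
    buf = ""
    for _ in range(N):
        buf += make_piece()
    return buf


def list_join(make_piece):
    parts = []
    for _ in range(N):
        parts.append(make_piece())
    return "".join(parts)


def string_io(make_piece):
    buf = io.StringIO()
    for _ in range(N):
        buf.write(make_piece())
    return buf.getvalue()


def run_benchmark(repeats=20):
    # Prints seconds per strategy; the numbers vary by machine and Python version.
    for make_piece in (interned_piece, fresh_piece):
        for build in (plus_equals, list_join, string_io):
            seconds = timeit.timeit(lambda: build(make_piece), number=repeats)
            print(f"{build.__name__:12} {make_piece.__name__:15} {seconds:.4f}s")


if __name__ == "__main__":
    run_benchmark()
```

The extra function call to produce each piece adds the same overhead to all three strategies, so the relative comparison should still be meaningful, though it will inflate the absolute times.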