
Hello,

On Tue, 31 Mar 2020 04:27:04 +1100
Chris Angelico <rosuav@gmail.com> wrote:

[]
There's a vast difference between "mutable string" and "string builder". The OP was talking about this kind of thing:
    buf = ""
    for i in range(50000):
        buf += "foo"
    print(buf)
And then suggested using a StringIO for that purpose. But if you're going to change your API, just use a list:
    buf = []
    for i in range(50000):
        buf.append("foo")
    buf = "".join(buf)
    print(buf)
I appreciate you expressing it all concisely and clearly. Then let me respond here instead of to the very first '"".join() rules!' reply I got.

The issue with "".join() is very obvious:

------
import io
import sys


def strio():
    sb = io.StringIO()
    for i in range(50000):
        sb.write(u"==%d==" % i)
    print(sys.getsizeof(sb) + sys.getsizeof(sb.getvalue()))


def listjoin():
    sb = []
    sz = 0
    for i in range(50000):
        v = u"==%d==" % i
        # All individual strings will be kept in the list and
        # can't be GCed before the final join.
        sz += sys.getsizeof(v)
        sb.append(v)
    s = "".join(sb)
    sz += sys.getsizeof(sb)
    sz += sys.getsizeof(s)
    print(sz)


strio()
listjoin()
------

$ python3.6 memuse.py
439083
3734325

So, it's obvious, but let's formulate it clearly for the avoidance of doubt: there's absolutely no reason why the trivial operation of accumulating string content should take about an order of magnitude more memory than is actually needed for that string content. Don't get me wrong - if you want to spend that much of your memory, then sure, you can. But jumping in with that as *the only right solution* whenever somebody mentions "string concatenation" is a bit ... umm, cavalier.
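The numbers above are summed by hand from sys.getsizeof(), which only reports what each object chooses to account for. As a cross-check, here is a minimal sketch that asks the standard tracemalloc module for the peak traced allocation instead. It assumes CPython 3; the helper name peak_bytes and the parameter n are made up for this illustration, and the figures it prints will differ between interpreter versions and implementations.

```python
import io
import tracemalloc


def strio(n=50000):
    # Accumulate through io.StringIO, as in the script above.
    sb = io.StringIO()
    for i in range(n):
        sb.write("==%d==" % i)
    return sb.getvalue()


def listjoin(n=50000):
    # Accumulate in a list and join once at the end.
    parts = []
    for i in range(n):
        parts.append("==%d==" % i)
    return "".join(parts)


def peak_bytes(func):
    """Hypothetical helper: peak bytes traced by tracemalloc while func runs."""
    tracemalloc.start()
    try:
        func()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


print("StringIO :", peak_bytes(strio))
print("list+join:", peak_bytes(listjoin))
```

Both functions build the same string, so any difference in the two printed peaks comes from how the pieces are held while accumulating.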
This is going to outperform anything based on StringIO fairly easily,
Since when is raw speed the only criterion for performance? If you say "forever", I'll trust you only if you proceed to show the assembly code with SSE and AVX which you wrote to get those last cycles out. Otherwise, being able to complete operations in a reasonable amount of memory, not OOMing, not being DoSed by trivial means, and finally, serving 8 times more requests in the same amount of memory are all quite valid criteria too.

What's interesting is that so far, the discussion parallels almost 1-to-1 the discussion in the 2006 thread I linked from the original mail.
So if you really want a drop-in replacement, don't build it around StringIO, build it around list.
    class StringBuilder:
        def __init__(self):
            self.data = []

        def __iadd__(self, s):
            self.data.append(s)
            # Return self so that "sb += s" rebinds sb to the builder, not None.
            return self

        def __str__(self):
            return "".join(self.data)
But of course! And what's most important, nowhere did I talk about what should be inside this class. My whole concern is along 2 lines:

1. This StringBuilder class *could* be the existing io.StringIO.
2. By just adding an __iadd__ operator to it.

That's it, nothing else. What's inside the StringIO class is up to you (dear various Python implementations, their maintainers, and contributors). For example, fans of "".join() surely can have it inside. Actually, it's a known fact that Python2's "StringIO" module (the original home of the StringIO class) was implemented exactly like that, so you can go straight back to the future.

And again, the need for anything like that might be unclear to CPython-only users. Such users can write a StringBuilder class like the one above, or repeat the beautiful "".join() trick over and over again. The need for a nice string builder class may arise only from considering that Python-as-a-language lacks a clear and nice abstraction for it, and from thinking about how to add such an abstraction in a performant way (for which the criteria differ) in as many implementations as possible, in as easy a way as possible. (At least that's my path to it; I'm not sure if a different thought process might lead to it too.)

-- 
Best regards,
 Paul                          mailto:pmiscml@gmail.com
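
P.S. For illustration only, points 1 and 2 above can be sketched in user code as a subclass of io.StringIO. The class name AppendableStringIO and the helper build() are hypothetical; the suggestion itself is for io.StringIO to grow __iadd__ directly, and this sketch says nothing about how any implementation would store the data internally.

```python
import io


class AppendableStringIO(io.StringIO):
    """Hypothetical subclass: io.StringIO plus support for "+="."""

    def __iadd__(self, s):
        # Delegate to the existing write() method.
        self.write(s)
        # Return self so that "sb += s" keeps sb bound to the same object.
        return self


def build(n):
    # The familiar "buf += ..." loop, unchanged except for the initial value.
    sb = AppendableStringIO()
    for i in range(n):
        sb += "==%d==" % i
    return sb.getvalue()


print(build(3))  # ==0====1====2==
```

The loop body is the same as in the naive str version; only the initial value of the accumulator and the final getvalue() call differ.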