
On Tue, Mar 31, 2020 at 7:04 AM Paul Sokolovsky <pmiscml@gmail.com> wrote:
> for i in range(50000):
>     v = u"==%d==" % i
>     # All individual strings will be kept in the list and
>     # can't be GCed before the final join.
>     sz += sys.getsizeof(v)
>     sb.append(v)
> s = "".join(sb)
> sz += sys.getsizeof(sb)
> sz += sys.getsizeof(s)
> print(sz)
> ... about order of magnitude more memory ...
I suspect you may be multiply-counting some of your usage here. Rather
than this, it would be more reliable to use the resident set size (on
platforms where you can query that).

if "strio" in sys.argv:
    strio()
else:
    listjoin()
print("Max RSS:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

Based on that, I find that it's at worst a 4:1 difference. Plus, I
couldn't see any material difference - the numbers were within half a
percent, basically just noise - until I upped your loop counter to
400,000, nearly ten times as much as you were doing. (At that point it
became a 2:1 difference. The 4:1 didn't show up until a lot later.) So
you have to be working with a *ridiculous* number of strings before
there's anything to even consider. And even then, it's only notable if
the individual strings are short AND all unique. Increasing the length
of the strings basically made it a wash. Consider:

for i in range(1000000):
    sb.write(u"==%d==" % i + "*"*1024)
Max RSS: 2028060

for i in range(1000000):
    v = u"==%d==" % i + "*"*1024
Max RSS: 2104204

So at this point, the string join is slightly faster and takes
slightly more memory - within 20% on the time and within 5% on the
memory.

ChrisA
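[Editor's note: a self-contained sketch of the comparison being discussed, assuming Paul's strio()/listjoin() names refer to a StringIO-based builder and a list-plus-join builder; the loop count and the ru_maxrss reporting are assumptions based on the figures quoted above, not the posters' exact scripts.]

```python
import io
import sys

def listjoin(n):
    """Build a string by collecting pieces in a list and joining once."""
    sb = []
    for i in range(n):
        sb.append("==%d==" % i)
    return "".join(sb)

def strio(n):
    """Build the same string through an in-memory text buffer."""
    sb = io.StringIO()
    for i in range(n):
        sb.write("==%d==" % i)
    return sb.getvalue()

if __name__ == "__main__":
    n = 400_000  # assumption: the count at which differences became visible
    build = strio if "strio" in sys.argv else listjoin
    result = build(n)
    print("built %d characters" % len(result))
    try:
        # The resource module is POSIX-only; note ru_maxrss is reported
        # in kilobytes on Linux but in bytes on macOS.
        import resource
        print("Max RSS:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    except ImportError:
        pass  # e.g. on Windows, where resource is unavailable
```

Run it twice, once as `python bench.py` and once as `python bench.py strio`, and compare the two RSS figures rather than summing sys.getsizeof() results, which can count the same memory more than once.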