
As others have pointed out, the OP started in a bit of an oblique way, but it may come down to this:

There are some use cases for a mutable string type. And one could certainly write one. Presto, here is one:

https://github.com/Daniil-Kost/mutable_strings

It looks to me to be more of a toy than anything, but maybe the author is seriously using it... (it does look like it has an indexing bug if there are non-ASCII characters).

And yet, as far as I know, there has never been one that was carefully written and optimized, which would be a bit of a trick because of how Python strings handle Unicode (it would have been a lot easier with Python 2 :-) ).

So why not?

1) As pointed out, high-performance strings are key to a lot of coding, so Python's str is very baked in to a LOT of code, and can't be duck-typed. I know that pretty much the only time I ever type check (as opposed to simple duck typing, EAFP) is for str. So if one were to make a mutable string type, you'd have to convert it to a string a lot in order to use most other libraries.

That being said, one could write a mutable string that mirrored the CPython string types as much as possible, and it could be pretty efficient, even for making regular strings out of it.

2) Maybe it's really not that useful. Other than building up a long string from a bunch of small ones (which can be done fine with .join()), I'm not sure I've had much of a use case. It would buy you a tiny bit of performance for, say, altering strings in ways that don't change their length, but I doubt there are many (if any) applications that would see any meaningful benefit from that.

So I'd say it hasn't been done because (1) it's a lot of work and (2) it would be a bit of a pain to use, and not gain much at all.

A kind-of-related anecdote: numpy arrays are mutable, but you cannot change their length in place.
So, similar to strings, if you want to build up an array from a lot of little pieces, the best way is to put all the pieces in a list, and then make an array out of it when you are done.

I had a need to do that fairly often (reading data from files of unknown size), so I actually took the time to write an array that could be extended. Turns out that:

1) it really wasn't much faster (than using a list) in the usual use cases anyway :-)

2) it did save memory, which only mattered for monster arrays, and I'd likely need to do something smarter anyway in those cases.

I even took some time to write a Cython-optimized version, which only helped a little. I offered it up to the numpy community. But in the end, no one expressed much interest. And I haven't used it myself for anything in a long while.

Moral of the story: not much point in a special class to do something that can already be done almost as well with the builtins.

-CHB

On Mon, Mar 30, 2020 at 2:06 PM Paul Sokolovsky <pmiscml@gmail.com> wrote:
Hello,
On Tue, 31 Mar 2020 07:40:01 +1100 Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Mar 31, 2020 at 7:04 AM Paul Sokolovsky <pmiscml@gmail.com> wrote:
    import sys

    # Setup assumed here; the quoted excerpt starts at the loop.
    sb = []
    sz = 0
    for i in range(50000):
        v = u"==%d==" % i
        # All individual strings will be kept in the list and
        # can't be GCed before the final join.
        sz += sys.getsizeof(v)
        sb.append(v)
    s = "".join(sb)
    sz += sys.getsizeof(sb)
    sz += sys.getsizeof(s)
    print(sz)
... about order of magnitude more memory ...
I suspect you may be multiply-counting some of your usage here. Rather than this, it would be more reliable to use the resident set size (on platforms where you can query that).
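For CPython, one way to avoid summing sys.getsizeof() by hand is to let the interpreter report peak allocation itself. The following is only a rough sketch: it uses the standard-library tracemalloc module, which is not the resident set size suggested above and only counts allocations made through Python's allocators. The helper names (build_with_list, build_with_stringio, peak_bytes) are made up for illustration, and the numbers will vary by Python version and platform.

```python
import io
import tracemalloc

N = 50000  # same loop count as the quoted snippet


def build_with_list(n):
    """Collect the pieces in a list, then join them once at the end."""
    parts = []
    for i in range(n):
        parts.append("==%d==" % i)
    return "".join(parts)


def build_with_stringio(n):
    """Write each piece into an io.StringIO buffer."""
    buf = io.StringIO()
    for i in range(n):
        buf.write("==%d==" % i)
    return buf.getvalue()


def peak_bytes(func, n):
    """Return (peak traced bytes, result) for one call of func(n)."""
    tracemalloc.start()
    try:
        result = func(n)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak, result


list_peak, list_result = peak_bytes(build_with_list, N)
io_peak, io_result = peak_bytes(build_with_stringio, N)

print("list + join peak bytes:", list_peak)
print("StringIO peak bytes:   ", io_peak)
```

Both functions build the same string, so any difference in the two peaks reflects the intermediate storage rather than the result itself.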
I may humbly suggest a different process too: get any hardware board with MicroPython and see how much data you can collect in a StringIO and in a list of strings. Well, you actually don't need dedicated hardware: just get the Linux or Windows version and run it with a specific heap size using the -X heapsize= switch, e.g. -X heapsize=100K.
Please don't stop there; we're talking about multiple implementations, so try it on CPython too. There must be a similar option there (otherwise how could you perform any memory-related testing?), I just forgot which.
The results should be very apparent, and only a forgotten option may obscure them.
[]
-- Best regards, Paul mailto:pmiscml@gmail.com
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython