
On Fri, Apr 3, 2020 at 9:20 AM Paul Sokolovsky <pmiscml@gmail.com> wrote:
But not exactly. Let me humbly explain what's really a cost. It's looking at PyObject_HEAD https://swenson.github.io/python-xr/Include/object.h.html#line-78 (damn, that's Python2 source, stupid google), and seeing that it's at least:
    Py_ssize_t ob_refcnt; \
    struct _typeobject *ob_type;
That's 2 word-sized fields, 16 bytes on 64-bit machine. You can dig further and further, and understand, how much memory it takes to store so-and-so kind of structure (and how it could be done differently).
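For anyone who wants to check the 16-byte figure without reading object.h, here is a minimal sketch. It assumes a default (non-debug, non-free-threaded) CPython build; the variable names are just for illustration, and other builds or implementations may report a larger header.

```python
import struct
import sys

# "n" is a Py_ssize_t (like ob_refcnt) and "P" is a pointer (like ob_type),
# both laid out with the platform's native size and alignment.
header_size = struct.calcsize("nP")
pointer_size = struct.calcsize("P")

print(f"pointer size:            {pointer_size} bytes")
print(f"refcount + type pointer: {header_size} bytes")

# On a default CPython build a bare object() is expected to carry nothing
# but that header, so these two numbers should match header_size.
print(f"object.__basicsize__:    {object.__basicsize__} bytes")
print(f"sys.getsizeof(object()): {sys.getsizeof(object())} bytes")
```

On a 64-bit machine the first two lines should print 8 and 16; the last two depend on how the interpreter was built.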
That's fair, but the PyObject* header isn't the only cost. The actual data for a Python string isn't stored in the structure. How do you know how much memory is being consumed by that? Are you 100% certain that sys.getsizeof() is measuring that? It appears from the source code that it *probably* is (str.__sizeof__ is defined in unicodeobject.c), but it counts, for instance, the length of the UTF-8 representation (if present) plus one for null termination, and that's quite possibly not the actual allocated size, due to overhead (and possible alignment) in PyObject_REALLOC. So you have to either try to delve into the source and find every single byte of overhead or wastage.... or you just allocate a huge bunch of strings and then ask your OS how much space you're consuming. Yes, the OS is going to have very coarse granularity, but when you're trying to figure out the RAM requirements of large-string concatenations, you're looking for a large difference anyway.
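A rough sketch of that comparison, for illustration only. It assumes a Unix-like system (the resource module is not available on Windows) and uses peak RSS from getrusage() as the "ask your OS" number; the helper names peak_rss_bytes and compare are made up for this example.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def compare(count=200_000, length=100):
    """Return (sum of sys.getsizeof(), growth in peak RSS) for many strings."""
    before = peak_rss_bytes()
    # Build distinct strings so that no two list entries share an object.
    strings = [str(i).rjust(length, "x") for i in range(count)]
    after = peak_rss_bytes()
    reported = sum(sys.getsizeof(s) for s in strings)
    return reported, after - before


reported, rss_growth = compare()
print(f"sum of sys.getsizeof(): {reported:,} bytes")
print(f"growth in peak RSS:     {rss_growth:,} bytes")
```

The two numbers are not expected to agree: the RSS figure also covers the list's own pointer array and allocator overhead, it is page-granular, and it can read as zero if the process had already touched that much memory earlier. That mismatch is the point of comparing them.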
Now a couple of words about RSS. The R is there for a reason; you should wonder what the memory is when it's not "R". And modern OSes are very modern and nobody knows what they do with virtual memory, or at least they can't fix bugs where something should be "R" but is actually "V", for decades: https://bugzilla.kernel.org/show_bug.cgi?id=12309 (damn, now self-isolated from spam).
I hope the idea is clear: RSS is largely outside of your control, but the bytes you allocate in your source are (or should be).
Technically yes, it's under your control. In practice, I'm not so sure.
ChrisA