
Larry Hastings wrote:
Martin v. Löwis wrote:
Let's be specific: when there is at least one long-lived small lazy slice of a large string, and the large string itself would otherwise have been dereferenced and freed, and this small slice is never examined by code outside of stringobject.c, this approach means the large string becomes long-lived too and thus Python consumes more memory overall. In pathological scenarios this memory usage could be characterized as "insane".
True dat. Then again, I could suggest some scenarios where this would save memory (multiple long-lived large slices of a large string), and others where memory use would be a wash (long-lived slices containing all or almost all of a large string, or any scenario where slices are short-lived). While I think it's clear lazy slices are *faster* on average, their overall effect on memory use in real-world Python is not yet known. Read on.
I wonder - how expensive would it be for the string slice to have a weak reference, and 'normalize' the slice when the big string is collected? Would the overhead of the weak reference swamp the savings?
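A pure-Python model can make the proposed mechanism concrete. This is a minimal sketch, not CPython's stringobject.c. Each slice holds only a weak reference to its owner. The owner keeps a weak set of its live slices and registers a `weakref.finalize` callback. When the owner is collected, that callback copies each surviving slice's characters out, so the large buffer can then be freed. `BigString`, `LazySlice` and their methods are hypothetical names. A wrapper class is needed because built-in `str` objects cannot be weakly referenced.

```python
import weakref


class BigString:
    """Hypothetical stand-in for a large string that hands out lazy slices."""

    def __init__(self, data):
        self._data = data
        # Slices are tracked weakly, so the owner never keeps a slice alive.
        self._slices = weakref.WeakSet()
        # When this object is collected, "normalize" every live slice.
        # The callback holds the raw data and the slice set, not self,
        # so registering it does not keep the owner alive.
        weakref.finalize(self, BigString._normalize_all, data, self._slices)

    @staticmethod
    def _normalize_all(data, slices):
        for sl in list(slices):
            sl._normalize(data)

    def slice(self, start, stop):
        sl = LazySlice(self, start, stop)
        self._slices.add(sl)
        return sl


class LazySlice:
    """Hypothetical lazy slice: a weak pointer into its owner plus bounds."""

    def __init__(self, owner, start, stop):
        self._owner = weakref.ref(owner)  # does not keep the owner alive
        self._start, self._stop = start, stop
        self._copy = None  # set once the slice has been normalized

    def _normalize(self, data):
        # Copy only this slice's characters and drop the link to the owner.
        self._copy = data[self._start:self._stop]
        self._owner = None

    def value(self):
        if self._copy is not None:
            return self._copy
        # Owner still alive: read through it without copying ahead of time.
        return self._owner()._data[self._start:self._stop]


big = BigString("x" * 1_000_000 + "hello")
s = big.slice(1_000_000, 1_000_005)
print(s.value())  # read lazily through the owner
del big           # finalizer copies "hello" out; the big buffer can be freed
print(s.value())  # served from the slice's own small copy
```

The per-slice cost Talin asks about shows up here as one weak reference per slice plus a weak-set entry, and each owner adds one finalizer. Whether that overhead swamps the savings would need to be measured in C, not inferred from this model.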
-- Talin