[Python-Dev] The "lazy strings" patch

Mon Oct 23 05:56:31 CEST 2006

Martin v. Löwis wrote:
> It's not clear to me what you want to achieve with these patches,
> in particular, whether you want to see them integrated into Python or
> not.
>   
I would be thrilled if they were, but it seems less likely with every 
passing day.  If you have some advice on how I might increase the 
patch's chances I would be all ears.

It was/is my understanding that the early days of a new major revision 
were the most judicious time to introduce big changes.  If I had offered 
these patches six months ago for 2.5, they would have had zero chance of 
acceptance.  But 2.6 is in its infancy, and so I assumed now was the 
time to discuss sea-change patches like this.

Anyway, it was my intent to post the patch and see what happened.  Being 
a first-timer at this, and not having even read the core development 
mailing lists for very long, I had no idea what to expect.  Though I 
genuinely didn't expect it to be this brusque.

> I think this specific approach will find strong resistance.
I'd say the "lazy strings" patch is really two approaches, "lazy 
concatenation" and "lazy slices".  You are right, though, *both* have 
"found strong resistance".

> Most recently, it was discussed under the name "string view" on the Py3k list, see
>   http://mail.python.org/pipermail/python-3000/2006-August/003282.html
> Traditionally, the biggest objection is that even small strings may
> consume insane amounts of memory.
>   
Let's be specific: when there is at least one long-lived small lazy 
slice of a large string, and the large string itself would otherwise 
have been dereferenced and freed, and this small slice is never examined 
by code outside of stringobject.c, this approach means the large string 
becomes long-lived too and thus Python consumes more memory overall.  In 
pathological scenarios this memory usage could be characterized as "insane".

True dat.  Then again, I could suggest some scenarios where this would 
save memory (multiple long-lived large slices of a large string), and 
others where memory use would be a wash (long-lived slices containing 
all or almost all of a large string, or any scenario where slices 
are short-lived).  While I think it's clear lazy slices are *faster* on 
average, their overall effect on memory use in real-world Python is not 
yet known.  Read on.
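
To make those scenarios concrete, here is a toy Python model of a lazy 
slice.  It is only a sketch, not the patch itself (the real code is C in 
stringobject.c).  The class name LazySlice, its methods, and the sizes 
used are all hypothetical, and it assumes that examining a slice copies 
the characters out and releases the parent.

```python
class LazySlice:
    """Toy model of a lazy slice: a reference to the parent plus offsets.

    Hypothetical illustration only; the real patch is C code.
    """

    def __init__(self, parent, start, stop):
        self.parent = parent      # keeps the whole parent string alive
        self.start = start
        self.stop = stop
        self.rendered = None      # filled in on first examination

    def __len__(self):
        return self.stop - self.start

    def render(self):
        """Copy the characters out and release the parent (assumed behavior)."""
        if self.rendered is None:
            self.rendered = self.parent[self.start:self.stop]
            self.parent = None
        return self.rendered

    def chars_kept_alive(self):
        """Number of characters this slice forces to stay in memory."""
        if self.parent is not None:
            return len(self.parent)
        return len(self.rendered)


big = "x" * 1_000_000

# Pathological case: a 10-character slice pins a million characters
# until something examines it.
small = LazySlice(big, 0, 10)
pinned_before = small.chars_kept_alive()
small.render()
pinned_after = small.chars_kept_alive()

# Favorable case: three large slices share one parent instead of
# holding three separate copies.
shared = [LazySlice(big, i, i + 900_000) for i in range(3)]
eager_cost = sum(len(s) for s in shared)   # what three eager copies would hold
lazy_cost = len(big)                       # one shared parent
```

In this model the small slice goes from pinning 1,000,000 characters to 
10 once it is rendered, while the three large slices cost one parent 
rather than 2,700,000 characters of copies.  Which case dominates in 
real programs is exactly the open question.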

>> I bet this generally reduces overall memory usage for slices too.
>>     
> Channeling Guido: what real-world applications did you study with
> this patch to make such a claim?
>   
I didn't; I don't have any.  I must admit to being only a small-scale 
Python user.  Memory use remains about the same in pybench, the biggest 
Python app I have handy.  But, then, it was pretty clearly speculation, 
not a claim.  Yes, I *think* it'd use less memory overall.  But I 
wouldn't *claim* anything yet.

The "stringview" discussion you cite was largely speculation, and as I 
recall there were users in both camps ("it'll use more memory overall" 
vs "no it won't").  And, while I saw a test case with microbenchmarks, 
and a "proof-of-concept" where a stringview was a separate object from a 
string, I didn't see any real-world applications tested with this approach.

Rather than start in on speculation about it, I have followed that old 
maxim of "show me the code".  I've produced actual code that works with 
real strings in Python.  I see this as an opportunity for Pythonistas to 
determine the facts for themselves.  Now folks can try the patch with 
these real-world applications you cite and find out how it really 
behaves.  (Although I realize the Python community is under no 
obligation to do so.)

If experimentation is the best thing here, I'd be happy to revise the 
patch to facilitate it.  For instance, I could add command-line 
arguments letting you tweak the run-time behavior of the patch, like 
changing the minimum size of a lazy slice.  Perhaps add code so there's 
a tweakable minimum size of a lazy concatenation too.  Or a tweakable 
minimum *ratio* necessary for a lazy slice.  I'm open to suggestions.
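
As a rough sketch of what such knobs might look like, here is the 
decision written as a small Python function.  Everything in it is an 
assumption for illustration: the name should_be_lazy, the parameter 
names, and the default values are invented, and the ratio is taken to 
mean slice length divided by parent length.  The actual patch is C and 
may decide differently.

```python
def should_be_lazy(parent_len, slice_len, min_size=64, min_ratio=0.5):
    """Decide whether a slice should be lazy or copied eagerly.

    Hypothetical policy: go lazy only when the slice is at least
    min_size characters long AND covers at least min_ratio of its
    parent.  Both defaults are placeholders, not values from the patch.
    """
    if parent_len == 0 or slice_len < min_size:
        return False
    return slice_len / parent_len >= min_ratio


# A tiny slice of a big string is copied, so it cannot pin the parent.
tiny = should_be_lazy(parent_len=1_000_000, slice_len=10)

# A slice covering most of its parent stays lazy.
most = should_be_lazy(parent_len=1_000, slice_len=600)

# Setting min_ratio to 0.0 disables the ratio check, leaving only min_size.
no_ratio = should_be_lazy(parent_len=1_000_000, slice_len=100, min_ratio=0.0)
```

A ratio test like this would bound how much memory a single lazy slice 
can pin relative to its own length, which speaks directly to the 
"small slice of a large string" objection.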

Cheers,

/larry/