[Python-Dev] The "lazy strings" patch

Mon Oct 23 07:00:30 CEST 2006

Larry Hastings <larry at hastings.org> wrote:
> It was/is my understanding that the early days of a new major revision 
> were the most judicious time to introduce big changes.  If I had offered 
> these patches six months ago for 2.5, they would have had zero chance of 
> acceptance.  But 2.6 is in its infancy, and so I assumed now was the 
> time to discuss sea-change patches like this.

It would be a radical change for Python 2.6, and really for the whole 2.x
series.  It would likely require nontrivial changes to extension modules
that deal with strings, and it would break assumptions about strings that
have held for over a decade.  I think 2.6 as an option is a non-starter.
Think Py3k, and really, think bytes and unicode.

> The "stringview" discussion you cite was largely speculation, and as I 
> recall there were users in both camps ("it'll use more memory overall" 
> vs "no it won't").  And, while I saw a test case with microbenchmarks, 
> and a "proof-of-concept" where a stringview was a separate object from a 
> string, I didn't see any real-world applications tested with this approach.
> 
> Rather than start in on speculation about it, I have followed that old 
> maxim of "show me the code".  I've produced actual code that works with 
> real strings in Python.  I see this as an opportunity for Pythonistas to 
> determine the facts for themselves.  Now folks can try the patch with 
> these real-world applications you cite and find out how it really 
> behaves.  (Although I realize the Python community is under no 
> obligation to do so.)

One of the big concerns brought up in the stringview discussion was that
of users expecting one thing and getting another.  Slicing a larger
string producing a 'view', which then keeps the larger string alive,
would be a surprise.  By making it a separate object that just *knows*
about strings (or really, anything that offers a buffer interface), I
was able to make an object that 1) is flexible, 2) is usable in any Python,
3) doesn't change the core assumptions about Python, and 4) is expandable
beyond just *strings*.  Reason #4 was my primary reason for writing it,
because str disappears in Py3k, which is closer to happening than most
of us realize.
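
A minimal sketch of the "view keeps the larger string alive" surprise
described above.  It uses Python's built-in memoryview as a stand-in for a
view type; that is an assumption made for illustration, not the stringview
object itself, and the names big, view and copy are hypothetical.

```python
# memoryview stands in for a "view" type here; it is not the stringview object.
big = bytes(10_000_000)        # a large buffer: 10 MB of zero bytes

view = memoryview(big)[:10]    # a 10-byte view into big, no data copied
copy = big[:10]                # an ordinary slice: an independent 10-byte copy

# The view holds a reference to the whole parent object, so the 10 MB
# cannot be freed while "view" is alive, even if the name "big" goes away.
print(view.obj is big)         # True
print(len(view), len(copy))    # 10 10
```

The copy costs ten bytes and nothing more; the view looks just as small but
pins the full parent buffer, which is the behavior users might not expect
from a plain slice.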

> If experimentation is the best thing here, I'd be happy to revise the 
> patch to facilitate it.  For instance, I could add command-line 
> arguments letting you tweak the run-time behavior of the patch, like 
> changing the minimum size of a lazy slice.  Perhaps add code so there's 
> a tweakable minimum size of a lazy concatenation too.  Or a tweakable 
> minimum *ratio* necessary for a lazy slice.  I'm open to suggestions.

I believe that would be a waste of time.  The odds of it making it into
Python 2.x without significant core developer support are pretty close
to None, which in Python 2.x is less than 0.  I've been down that road;
nothing good lies that way.

Want my advice?  Aim for Py3k text as your primary target, but as a
wrapper, not as the core type (I put the odds at somewhere around 0 for
such a core type change).  If you are good, and want to make guys like
me happy, you could even make it support the buffer interface for
non-text (bytes, array, mmap, etc.), unifying (via wrapper) the behavior
of bytes and text.
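
One way the wrapper idea above might look, as a rough sketch only.  The
class name BufferView and its methods are hypothetical, and it relies on
the modern memoryview type rather than anything available in 2006.  It
wraps any object that exposes the buffer interface and slices without
copying.

```python
from array import array


class BufferView:
    """Hypothetical read-only wrapper over any buffer-supporting object."""

    def __init__(self, source):
        # Accept bytes, bytearray, array, mmap, or an existing memoryview.
        mv = source if isinstance(source, memoryview) else memoryview(source)
        # Normalize to unsigned bytes so every source behaves the same way.
        self._view = mv if mv.format == "B" else mv.cast("B")

    def __len__(self):
        return len(self._view)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing returns another wrapper over the same memory (no copy).
            return BufferView(self._view[index])
        return self._view[index]

    def tobytes(self):
        # Copy the viewed bytes out only when the caller asks for them.
        return self._view.tobytes()

    def decode(self, encoding="utf-8"):
        # Text behavior is layered on top of the same byte-level view.
        return self.tobytes().decode(encoding)


text = BufferView(b"hello world")
print(text[6:].tobytes())                              # b'world'
print(BufferView(bytearray(b"hello world"))[:5].decode())  # hello
print(BufferView(array("B", [104, 105])).tobytes())    # b'hi'
```

Because the wrapper is a separate object, the underlying types keep their
usual copy-on-slice semantics, and only code that opts in gets view
behavior.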

 - Josiah