[Python-3000] Poll: Lazy Unicode Strings For Py3k

Josiah Carlson jcarlson at uci.edu
Wed Jan 31 19:58:13 CET 2007


Larry Hastings <larry at hastings.org> wrote:
> Lazy concatenation changes the behavior of Python in three subtle
> ways.  First, it adds the same two changes in the C API behavior
> that "lazy concatenation" does: requiring use of the accessor
> macro/function, and stipulating that these can now fail.

I presume the second word in the above paragraph should read "slicing".


> Lazy slices also add a third wrinkle, a change to Python's
> memory allocation behavior: slicing a string means the
> slice holds a reference to the original string.  If the slice

Perhaps I missed something about the concatenation implementation, but
in order to delay the rendering of lazy concatenation objects, wouldn't
you need to keep a reference and pointer to the left and right
strings/concatenation objects?  This isn't the same as a (small) slice
holding onto a reference to a (big) string, but there is still an object
lifetime consideration.
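To make the lifetime point concrete, here is a minimal Python sketch of a lazy concatenation node. It is an illustration under stated assumptions, not the C code in the patch; `LazyConcat` and `render()` are hypothetical names. Until the node is rendered it keeps both operands alive; rendering joins them once and then drops those references.

```python
class LazyConcat:
    """Hypothetical lazy-concatenation node (illustration only)."""

    __slots__ = ("left", "right", "_rendered")

    def __init__(self, left, right):
        # Strong references: both operands stay alive until rendering.
        self.left = left
        self.right = right
        self._rendered = None

    def __add__(self, other):
        return LazyConcat(self, other)

    def __radd__(self, other):
        return LazyConcat(other, self)

    def render(self):
        if self._rendered is None:
            parts = []
            stack = [self]  # iterative walk avoids deep recursion on long chains
            while stack:
                node = stack.pop()
                if not isinstance(node, LazyConcat):
                    parts.append(node)
                elif node._rendered is not None:
                    parts.append(node._rendered)
                else:
                    # Push right before left so the left operand is emitted first.
                    stack.append(node.right)
                    stack.append(node.left)
            self._rendered = "".join(parts)
            # Operands are no longer needed; releasing them ends their lifetime
            # unless something else still refers to them.
            self.left = self.right = None
        return self._rendered

    def __str__(self):
        return self.render()


s = LazyConcat("spam", ", ") + "eggs"
print(s)  # → spam, eggs
```

The object lifetime question is visible in `__init__`: as long as a node is unrendered, every string in its tree stays allocated.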


> I asked on Python-Dev about this.  The consensus I got was:
>  * In CPython locals are freed in the order they were added
>    to the dictionary, and
>  * this behavior is not a defined part of Python and should
>    not be relied on.

There isn't a locals dictionary; there is a locals array.  locals()
creates a mapping of names -> values based on the contents of
func_code.co_varnames and the locals array.  The order of deletion is
the order in which the locals array is defined, which is based on when
the *names* are seen by the parser and compiler, which is (I believe)
the order listed in func_code.co_varnames.
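The ordering can be checked directly. A small sketch in modern Python, where `func_code` is spelled `__code__`; the function mirrors the example later in this message, with `str` standing in for Python 2's `unicode`. Arguments come first in `co_varnames`, then the remaining locals in the order the compiler first meets their names.

```python
import io


def isNotFoo(fileHandle):
    x = None
    a = str(fileHandle.readline())
    x = a.strip()
    return x != "foo"


# 'x' is assigned before 'a', so it gets the earlier slot in the locals array.
print(isNotFoo.__code__.co_varnames)  # → ('fileHandle', 'x', 'a')

# The function itself still behaves normally: "foo\n" strips to "foo".
print(isNotFoo(io.StringIO("foo\n")))  # → False
```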


> So I did what any loser hacker would do: I reversed the order.
> Now local variables are freed in the reverse order of their
> being added to the locals dictionary.  Obviously this does not
> guarantee that all local slices will be freed before their local
> original string is.  But I *suspect* that it's better for V2 lazy
> slices than the other order, and I've been assured that nobody
> can rely on the order here, so it should be a safe (cheap) hack.
> This change certainly doesn't seem to break anything.

The performance benefit of that change is negated when confronted with
the following:

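        def isNotFoo(fileHandle):
                x = None            # 'x' is seen first, so it precedes 'a' in co_varnames
                a = unicode(fileHandle.readline())
                x = a.strip()       # with V2 lazy slices, x is a slice of a
                return x != u"foo"  # reversed freeing releases a before x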

I don't suspect that variants of the above are common, but it is a
gotcha for your change.
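The gotcha can be modelled without the patch. The sketch below assumes locals are released in `co_varnames` order (or its reverse, as in Larry's change) and that which local slices which is supplied by hand; `release_order` and `slices_released_first` are hypothetical helpers, and this is a simulation rather than an inspection of the live interpreter. Reversing the order helps the common pattern where the original string is named first, and hurts `isNotFoo`, where the slice's name is seen first.

```python
def release_order(func, reverse=False):
    """Assumed order in which a frame drops its local references."""
    names = list(func.__code__.co_varnames)
    return names[::-1] if reverse else names


def slices_released_first(func, slice_of, reverse=False):
    """True if every slice local is released before the local it slices."""
    order = release_order(func, reverse)
    return all(order.index(sl) < order.index(src) for sl, src in slice_of.items())


def typical(fileHandle):
    a = str(fileHandle.readline())
    x = a.strip()  # under a lazy-slices patch, x would be a slice of a
    return x != "foo"


def isNotFoo(fileHandle):
    x = None       # 'x' is named first, so it precedes 'a'
    a = str(fileHandle.readline())
    x = a.strip()
    return x != "foo"


# Hypothetical, hand-written mapping: local 'x' slices local 'a'.
slice_of = {"x": "a"}

for f in (typical, isNotFoo):
    for rev in (False, True):
        label = "reversed" if rev else "forward"
        print(f.__name__, label, slices_released_first(f, slice_of, rev))
```

With the forward order only `isNotFoo` releases its slice first; with the reversed order only `typical` does.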


> THE POLLS
> 
> Finally, the polls.
> 
> #1: Should Py3k incorporate the "lazy concatenation" patch?
> 
> #2: Should Py3k incorporate the original "lazy slices" patch?
> 
> #3: Should Py3k incorporate the "V2 lazy slices" patch?
> 
> #4: Assuming you are +0 or better on #3, is it okay to change the
>     frame object such that locals are released in the reverse order
>     of their being added to the dictionary?
> 
> #5: Should Py3k incorporate the "local freelist retooling" patch?


You are probably not going to be surprised about this.

#1, #2, #3: -1

For me, the killer is the added complexity that the patches introduce. 
In optimizing my own code (changing algorithms, data structures, etc.),
there is a certain level of algorithm and structural complexity that I
am willing to endure in order to improve performance.  Beyond that level
I usually say "it's not worth it".

Are the performance improvements you are proposing worth the added
complexity of the Unicode object?  I suspect not.  But I'm tossing my
hat in the "No" ring because I've become fairly conservative when it
comes to changes in Python.

If people go with a selection that includes #3, I don't see any reason
not to reverse the order of the freeing.  The gotcha I provided isn't
terribly convincing, even to me.

I believe improvements to the freelist(s), assuming that they are
relatively minor, would also be fine.


 - Josiah

P.S. One of the reasons I have been pushing for a wrapper is that I
believe the added complexity to the *base type* is too much, while a
wrapper object would be free to do just about anything (with a
sufficiently restricted meaning of 'anything').  Yes, users would need
to explicitly wrap unicode objects.  It's not ideal, and it would limit
their use, but for those who have particular needs or desires (and/or
would want to use them with Python 2.x), it could be helpful.
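The wrapper approach might look something like the following Python sketch. `LazyText` is a hypothetical name and the class is an illustration, not a proposed API: the built-in string type is unchanged, and only code that explicitly wraps a string gets lazy slicing (along with its memory-retention trade-off).

```python
class LazyText:
    """Hypothetical opt-in wrapper whose slices are views, not copies."""

    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base, start=0, stop=None):
        self._base = base  # strong reference: the view keeps the original alive
        self._start = start
        self._stop = len(base) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self))
            if step != 1:
                # Extended slices are simply copied eagerly in this sketch.
                return str(self)[key]
            # Views of views point straight at the original base string.
            return LazyText(self._base, self._start + start,
                            self._start + max(start, stop))
        index = key + len(self) if key < 0 else key
        if not 0 <= index < len(self):
            raise IndexError("LazyText index out of range")
        return self._base[self._start + index]

    def __str__(self):
        # Rendering copies only the viewed range out of the original.
        return self._base[self._start:self._stop]


text = LazyText("hello, world")
view = text[7:]
print(len(view), str(view))  # → 5 world
print(str(view[1:3]))        # → or
```

Because a view of a view refers directly to the original string, chains of slices never build up; the cost is that any live view keeps the whole original string allocated, the same wrinkle raised for lazy slices.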

P.P.S. I'm not certain, but I believe the V2 lazy slicing could
be cleaned up with the use of weak references.


