[Python-3000] Poll: Lazy Unicode Strings For Py3k

Jim Jewett jimjjewett at gmail.com
Wed Jan 31 21:49:50 CET 2007


On 1/31/07, Larry Hastings <larry at hastings.org> wrote:
> Lazy concatenation changes the behavior of the Python C API in two
> subtle ways:

> 1) All C API users asking for the value of a string *must* use
>    the macro PyUnicode_AS_UNICODE() or the function
>    PyUnicode_AsUnicode().  It is no longer permissible to
>    directly access the object's "str" member.
> 2) It is now possible for PyUnicode_AS_UNICODE() and
>    PyUnicode_AsUnicode() to *fail*,  returning NULL under
>    low-memory conditions.

(Note that it might be possible to wrap the null-check inside an
access macro in a manner similar to Brett's object capability API.)
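
To make the two changes concrete, here is a minimal sketch in plain C.
It does not use the real CPython structures or Larry's patch: LazyStr,
lazy_from_cstr, lazy_concat and lazy_as_chars are hypothetical names
made up for illustration, and memory ownership is deliberately left
out.  It only shows why callers have to go through an accessor (the
buffer may not exist yet) and why that accessor can return NULL (the
buffer is allocated on first access).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lazy string: a flat buffer, or a pending concatenation. */
typedef struct LazyStr {
    char *flat;                  /* rendered value; NULL until first access */
    size_t length;               /* total length, known without rendering */
    const struct LazyStr *left;  /* operands of a pending concatenation, */
    const struct LazyStr *right; /* both NULL for a plain flat string */
} LazyStr;

/* Build a flat string from a C string; NULL on allocation failure. */
LazyStr *lazy_from_cstr(const char *text) {
    LazyStr *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->length = strlen(text);
    s->flat = malloc(s->length + 1);
    if (s->flat == NULL) { free(s); return NULL; }
    memcpy(s->flat, text, s->length + 1);
    s->left = s->right = NULL;
    return s;
}

/* Lazy concatenation: record the operands, allocate no character buffer. */
LazyStr *lazy_concat(const LazyStr *a, const LazyStr *b) {
    assert(a != NULL && b != NULL);
    LazyStr *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->flat = NULL;
    s->length = a->length + b->length;
    s->left = a;
    s->right = b;
    return s;
}

/* Copy the characters of s into out, which must hold s->length bytes. */
static void lazy_write(const LazyStr *s, char *out) {
    if (s->flat != NULL) {
        memcpy(out, s->flat, s->length);
        return;
    }
    lazy_write(s->left, out);
    lazy_write(s->right, out + s->left->length);
}

/* Accessor: the only supported way to reach the characters.
   It renders on first use, so it can fail and return NULL. */
const char *lazy_as_chars(LazyStr *s) {
    assert(s != NULL);
    if (s->flat == NULL) {
        char *buf = malloc(s->length + 1);
        if (buf == NULL) return NULL;   /* low-memory failure path */
        lazy_write(s, buf);
        buf[s->length] = '\0';
        s->flat = buf;                  /* cache for later calls */
    }
    return s->flat;
}
```

Code that reads s->flat directly sees NULL for a freshly concatenated
string, which is the analogue of change 1; code that calls
lazy_as_chars() must check the result for NULL, which is the analogue
of change 2.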

I believe the str API should make these changes **even if these
patches are not applied.**

The code-point representation issues alone are contentious enough that
I don't think it is possible to satisfy everyone with a single
implementation.

Even for the default implementation, we might want to change some of
the tradeoffs once unicode-for-everything is in normal use.

Given that, any of your three options (as well as leaving things
unchanged) is a legitimate implementation, and the question is which
to use by default in 3.0, knowing that it could change in a minor (but
not bugfix) release.  I would be inclined to go with lazy slicing #1,
but the particular choice is less important than the API change.

-jJ

