[Python-3000] Poll: Lazy Unicode Strings For Py3k
Jim Jewett
jimjjewett at gmail.com
Wed Jan 31 21:49:50 CET 2007
On 1/31/07, Larry Hastings <larry at hastings.org> wrote:
> Lazy concatenation changes the behavior of the Python C API in two
> subtle ways:
> 1) All C API users asking for the value of a string *must* use
> the macro PyUnicode_AS_UNICODE() or the function
> PyUnicode_AsUnicode(). It is no longer permissible to
> directly access the object's "str" member.
> 2) It is now possible for PyUnicode_AS_UNICODE() and
> PyUnicode_AsUnicode() to *fail*, returning NULL under
> low-memory conditions.
(Note that it might be possible to wrap the null-check inside an
access macro in a manner similar to Brett's object capability API.)
I believe the str API should make these changes **even if these
patches are not applied.**
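To make the two changes concrete, here is a minimal sketch in plain C of
what they mean for calling code. It does not use the real CPython
structures: lazy_str, lazy_str_leaf(), lazy_str_concat(),
lazy_str_as_chars(), lazy_str_count() and the lazy_alloc hook are all
hypothetical names invented for illustration, and the sketch uses char
rather than Py_UNICODE. The only point is that callers go through an
accessor and check it for NULL instead of reading the "str" member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lazy string; NOT CPython's real object layout. */
typedef struct lazy_str {
    const char *str;        /* rendered text, or NULL while still lazy */
    size_t length;          /* total length, known without rendering */
    struct lazy_str *left;  /* pending concatenation operands */
    struct lazy_str *right;
} lazy_str;

/* Allocator hook (an assumption of this sketch) so that the
   low-memory path can be exercised without exhausting memory. */
static void *(*lazy_alloc)(size_t) = malloc;
static void *lazy_failing_alloc(size_t n) { (void)n; return NULL; }

/* A leaf is already rendered; it borrows the caller's text. */
static lazy_str lazy_str_leaf(const char *text)
{
    lazy_str s = { text, strlen(text), NULL, NULL };
    return s;
}

/* Concatenation only records its operands; nothing is copied yet. */
static lazy_str lazy_str_concat(lazy_str *left, lazy_str *right)
{
    lazy_str s = { NULL, left->length + right->length, left, right };
    return s;
}

/* The accessor: renders on first use and caches the result.
   Returns NULL if the buffer cannot be allocated. The cached
   buffer is never freed in this sketch. */
static const char *lazy_str_as_chars(lazy_str *s)
{
    const char *l, *r;
    char *buf;

    if (s->str != NULL)
        return s->str;
    assert(s->left != NULL && s->right != NULL);
    l = lazy_str_as_chars(s->left);
    r = lazy_str_as_chars(s->right);
    if (l == NULL || r == NULL)
        return NULL;
    buf = (char *)lazy_alloc(s->length + 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, l, s->left->length);
    memcpy(buf + s->left->length, r, s->right->length);
    buf[s->length] = '\0';
    s->str = buf;
    return s->str;
}

/* A caller written against the accessor: it never touches s->str
   directly and it propagates failure. Returns 0 on success, -1 on
   error. */
static int lazy_str_count(lazy_str *s, char ch, size_t *out)
{
    const char *p = lazy_str_as_chars(s);
    size_t i, n = 0;

    if (p == NULL)
        return -1;
    for (i = 0; i < s->length; i++)
        if (p[i] == ch)
            n++;
    *out = n;
    return 0;
}
```

Code written like lazy_str_count() keeps working whether the string
underneath is eager, lazily concatenated, or a lazy slice, which is why
the accessor-plus-NULL-check rule seems worth adopting regardless of
which implementation is chosen.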
The code-point representation issues alone are contentious enough that
I don't think it is possible to satisfy everyone with a single
implementation.
Even for the default implementation, we might want to change some of
the tradeoffs once unicode-for-everything is in normal use.
Given that, any of your three options (as well as the no-change
option) is a legitimate implementation, and the question is which to
use by default in 3.0, knowing that it could change in a minor (but
not bugfix) release. I would be inclined to go with lazy slicing #1,
but the particular choice is less important than the API change.
-jJ