Steve Holden wrote:
> But it seems to me that the only major issue is the inability to provide
> zero-byte terminators with this new representation.

I guess I wasn't clear in my description of the patch; sorry about that.
Like "lazy concatenation objects", "lazy slices" render when you call
PyString_AsString() on them. Before rendering, the lazy slice's
ob_sval will be NULL. Afterwards it will point to a proper
zero-terminated string, at which point the object behaves exactly like
any other PyStringObject.
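
To make the render-on-demand idea concrete, here is a minimal standalone sketch. It is not code from the patch: LazySlice, lazy_slice_new(), lazy_slice_as_string() and lazy_slice_free() are hypothetical names, and the struct only models the behavior described above (a NULL buffer pointer before rendering, a zero-terminated copy afterwards).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lazy slice; NOT the real PyStringObject layout. */
typedef struct {
    const char *source;  /* string being sliced; not terminated where the slice ends */
    size_t start;        /* offset of the slice within source */
    size_t length;       /* number of bytes in the slice */
    char *sval;          /* plays the role of ob_sval: NULL until rendered */
} LazySlice;

/* Creating the slice copies nothing; it only records where the bytes live. */
static LazySlice lazy_slice_new(const char *source, size_t start, size_t length)
{
    assert(source != NULL && start + length <= strlen(source));
    LazySlice s = { source, start, length, NULL };
    return s;
}

/* Plays the role of PyString_AsString(): render on the first call,
 * then keep handing back the same zero-terminated buffer. */
static const char *lazy_slice_as_string(LazySlice *s)
{
    if (s->sval == NULL) {
        char *buf = malloc(s->length + 1);
        if (buf == NULL)
            return NULL;  /* out of memory: the slice stays unrendered */
        memcpy(buf, s->source + s->start, s->length);
        buf[s->length] = '\0';
        s->sval = buf;
    }
    return s->sval;
}

/* Release the rendered buffer, if there is one. */
static void lazy_slice_free(LazySlice *s)
{
    free(s->sval);
    s->sval = NULL;
}
```

Until lazy_slice_as_string() is called, the only cost of the slice is three words of bookkeeping; the copy and the terminator appear only when a caller actually asks for a char *.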
The only function that *might* return a non-terminated char * is
PyString_AsUnterminatedString(). This function is static to
stringobject.c--and I would be shocked if it were ever otherwise.
If external Python extension modules are as well-behaved as the
shipping Python source tree, there simply won't be a problem.
Python source is delightfully consistent about using the macro
PyString_AS_STRING() to get at the creamy char *center of a
PyStringObject *. When code religiously uses that macro (or calls
PyString_AsString() directly), all it needs is a recompile with the
current stringobject.h and it will Just Work.
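
The difference between the two access styles can be sketched in a few lines. This is a toy model, not the real header: ModelString, model_as_string(), MODEL_AS_STRING and the sval_lazy field are all hypothetical, and the sketch assumes the recompiled macro forwards to a function that renders on demand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a string object whose buffer is filled in lazily. */
typedef struct {
    const char *pending;  /* unrendered bytes; not terminated at len */
    size_t len;           /* number of bytes that belong to this string */
    char *sval_lazy;      /* stand-in for ob_sval: NULL until rendered */
} ModelString;

/* Stand-in for PyString_AsString(): renders if needed, returns the buffer. */
static char *model_as_string(ModelString *op)
{
    assert(op != NULL);
    if (op->sval_lazy == NULL) {
        char *buf = malloc(op->len + 1);
        if (buf == NULL)
            return NULL;
        memcpy(buf, op->pending, op->len);
        buf[op->len] = '\0';
        op->sval_lazy = buf;
    }
    return op->sval_lazy;
}

/* Stand-in for PyString_AS_STRING(): assumed here to route through the
 * rendering function once the code is recompiled against the new header. */
#define MODEL_AS_STRING(op) model_as_string(op)

/* Well-behaved caller: uses the macro, so it always sees a terminated string. */
static size_t well_behaved_length(ModelString *op)
{
    const char *s = MODEL_AS_STRING(op);
    return s != NULL ? strlen(s) : 0;
}

/* Ill-behaved caller: peeks at the field directly and reports whether it
 * found a usable buffer. Before rendering it finds NULL. */
static int ill_behaved_sees_buffer(const ModelString *op)
{
    return op->sval_lazy != NULL;
}
```

In this model the well-behaved caller needs no source changes at all, while the caller that reads the field directly gets NULL until somebody else happens to render the string.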
> If there were any reliable way to make sure these objects never got
> passed to extension modules then I'd say "go for it".
I genuinely don't know how many external Python extension modules are
well-behaved in this regard. But in case it helps: I just checked PIL,
NumPy, PyWin32, and SWIG, and all of them were well-behaved.
Apart from stringobject.c, there was exactly one spot in the Python
source tree which made assumptions about the structure of
PyStringObjects (Mac/Modules/macos.c). It's in the block starting with
the comment "This is a hack:". Note that this is unfixed in my patch,
so for now any code that relies on that self-avowed "hack" will break.
Am I correct in understanding that changing the Python minor revision
number (2.5 -> 2.6) requires external modules to recompile? (It
certainly does on Windows.) If so, I could mitigate the problem by
renaming ob_sval. That way, code making explicit reference to it would
fail to compile, which I feel is better than silently recompiling