[Python-Dev] The C API and wide unicode support

11 Jul 2002 13:01:46 +0100

"M.-A. Lemburg" <mal@lemburg.com> writes:

> > Prediction: this is going to cause pain.  For instance, if this user
> > decides that he wants to upgrade to 2.2.1, he might download Sean's
> > RPMs from python.org which are narrow unicode builds -- and then his
> > extensions will break.  The problem here is that the kind of users
> > this is going to trouble are exactly the users who will not know
> > what's going on.
> 
> It's a pain, yes, but still better than having seg faults
> due to memory corruption afterwards.

Probably true.  At least the tracebacks make the problem obvious.

> > We can't prevent this sort of thing totally, but I think it should be
> > possible to carry out simple unicode manipulations (like this example
> > of returning a buffer) without incurring this kind of binary
> > compatibility worry.  Maybe a "safe" api, plastered with warning signs
> > in the docs about poking into the internal structure of the objects.
> 
> Perhaps we need an additional abstract API PyObject_UnicodeEx()
> which provides a way to additionally define the encoding to assume
> for decoding string objects ? (PyObject_Unicode() always assumes
> the default encoding)

That would be nice, yes.  Beats digging "unicode" out of
__builtin__...
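
For illustration, the semantics being proposed could be sketched at the
Python level roughly as follows. This is only a sketch: the name
object_unicode_ex is made up here, the "utf-8" default is an assumption,
and the code uses modern Python 3 spelling (str/bytes) rather than the
2.2-era unicode/str types.

```python
def object_unicode_ex(obj, encoding="utf-8", errors="strict"):
    """Hypothetical Python-level analogue of the proposed PyObject_UnicodeEx().

    Text is returned unchanged, byte strings are decoded with the
    caller-supplied encoding instead of an implicit default, and any
    other object falls back to its normal string conversion.
    """
    if isinstance(obj, str):
        # Already text: nothing to decode.
        return obj
    if isinstance(obj, (bytes, bytearray)):
        # The caller states the encoding explicitly.
        return str(obj, encoding, errors)
    # Anything else: use the object's own string conversion.
    return str(obj)
```

With this, object_unicode_ex(b"caf\xe9", "latin-1") decodes the bytes as
Latin-1 regardless of what the interpreter's default encoding is.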

> > [*] actually, I think pygame might break with a wide unicode build.
> 
> Why's that ?

Oh, the obvious thing: assuming sizeof(Py_UNICODE) == 2; or rather,
assuming that Python's idea of a unicode buffer matches SDL's idea
(which I can't find written down anywhere, but I assume it's the same
kind of UCS-2 thing narrow builds use).
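
One way to sidestep that assumption is to encode explicitly to 16-bit
units before handing the data to an external library, rather than
passing along whatever the interpreter uses internally. A minimal
sketch, assuming the consumer wants native-endian UTF-16 without a BOM
(that is a guess about the consumer, not a documented requirement); the
helper names are made up, and the code uses modern Python 3 spelling:

```python
import sys


def is_narrow_build():
    """Return True if this interpreter cannot represent code points above
    U+FFFF directly (always False on Python 3.3 and later)."""
    return sys.maxunicode == 0xFFFF


def to_utf16_buffer(text):
    """Encode text as native-endian UTF-16 bytes with no byte-order mark.

    The result is two bytes per BMP character and four bytes (a surrogate
    pair) per character above U+FFFF, whatever the interpreter's internal
    representation happens to be.
    """
    codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
    return text.encode(codec)
```

The point of the design is that the width of the buffer is fixed by the
codec name, not by how the interpreter was compiled.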

So, I retract my complaint, and propose to write some docs on the
subject.

Cheers,
M.

-- 
  Two things I learned for sure during a particularly intense acid
  trip in my own lost youth: (1) everything is a trivial special case
  of something else; and, (2) death is a bunch of blue spheres.
                                             -- Tim Peters, 1 May 1998