[Python-Dev] Can't compile _tkinter.c with Redhat 9 (post-SF#719880)

Jeff Hobbs jeffh@ActiveState.com
Sun, 15 Jun 2003 16:29:11 -0700


My ears were ringing for this one, so I thought I'd add some comments.

Martin v. Lowis wrote:
> >>Ask Redhat why they chose to hack Tcl to better accommodate Python.
> >
> > I take it 'better' is ironically here? :)
>
> Well, they decided to build Python 2.2 in UCS-4 mode, which is,
> in general, a good thing, except that it doesn't work. In particular,
> sre and _tkinter don't work well; _tkinter does not work at all.
> Instead of fixing _tkinter (which I now did for Python 2.3), they
> chose to hack Tcl instead, to extend it to UCS-4 mode.

Can someone explain to me why moving to UCS-4 is a good thing?  What
RedHat did is a Bad Thing, in completely screwing with compatibility
(with no warning).  See below.

> Now I have to deal with standard Tcl, which is UCS-2, and can support
> both UCS-2 and UCS-4 Python _tkinter with that (2.2 would mandate
> UCS-2 Python for _tkinter). I also have to support Redhat Tcl, which
> is UCS-4. I chose to only support this in combination with UCS-4
> Python. If anybody wants the combination Redhat Tcl + UCS-2 Python,
> feel free to develop a patch.
>
> > Redhat does have bugfix updates, I will look if they know something
> > about this tomorrow (IIRC they have a buglist).
>
> It's not a bug. They *really* made this change to support their own
> build of Python. Unfortunately, they have thereby tied Python to UCS-4
> on Redhat 9. Their change is a hack, though, as Tcl does not really
> support UCS-4 - it merely compiles now, but don't dare to use non-BMP
> characters.

What UCS-4 support are you looking for that doesn't seem to exist?
While Tcl is agnostic about non-BMP chars (all 2 of them ... ha ha),
it does have correct UCS-4 support (though not completely, given how
RedHat patched it).  This has been discussed briefly here before:

https://sourceforge.net/tracker/?func=detail&aid=578030&group_id=10894&atid=110894

In such a build, a Tcl_UniChar is 32 bits and TCL_UTF_MAX is 6
(normally they are 16 bits and 3), where TCL_UTF_MAX is the maximum
number of bytes in a single valid UTF-8 sequence.  Strings going
through Tcl are then handled either as UTF-8 char*s or as UTF-32
(aka UCS-4) Tcl_UniChar*s.
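To make the two TCL_UTF_MAX values concrete, here is a minimal sketch, assuming the original RFC 2279 definition of UTF-8 (sequences of up to six bytes for values up to 0x7FFFFFFF). The function names `utf8_len` and `utf8_encode` are hypothetical illustrations, not Tcl's API (Tcl's own encoder is `Tcl_UniCharToUtf`). Every BMP character fits in 3 bytes; anything beyond U+FFFF needs 4 or more.

```c
#include <stdint.h>

/* Number of UTF-8 bytes for code point cp under the original (RFC 2279)
 * definition, which allowed up to six bytes. Values above 0x7FFFFFFF are
 * not valid and are not checked here (they also report 6). */
static int utf8_len(uint32_t cp)
{
    if (cp < 0x80)      return 1;  /* ASCII */
    if (cp < 0x800)     return 2;
    if (cp < 0x10000)   return 3;  /* rest of the BMP: enough for TCL_UTF_MAX == 3 */
    if (cp < 0x200000)  return 4;  /* covers all Unicode planes up to U+10FFFF */
    if (cp < 0x4000000) return 5;
    return 6;                      /* up to 0x7FFFFFFF: needs TCL_UTF_MAX == 6 */
}

/* Encode cp into out (at least 6 bytes); returns the byte count. */
static int utf8_encode(uint32_t cp, unsigned char *out)
{
    /* Lead-byte prefixes indexed by sequence length. */
    static const unsigned char lead[7] = {0, 0, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};
    int n = utf8_len(cp);
    if (n == 1) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    /* Fill continuation bytes from the end, 6 payload bits each. */
    for (int i = n - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = (unsigned char)(lead[n] | cp);  /* remaining high bits */
    return n;
}
```

For example, U+20AC (the euro sign) encodes as the three bytes E2 82 AC, while U+10000 already needs four (F0 90 80 80).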

I do realize that correct handling of non-BMP characters requires
some more work, but that is orthogonal to this issue.  While UCS-4
opens up more code points to allow non-BMP chars, there are very few
in that range at this point.  The bigger issue is that in changing
the basic Tcl_UniChar size, you break the binary compatibility rules.
RH9 is the only version/distro to use a 32-bit Tcl_UniChar, which
breaks compatibility with extensions built on other versions/distros.
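A minimal sketch of why the width change is an ABI break, using hypothetical stand-in types (`UniChar16`, `UniChar32`) and a hypothetical public struct `Token`; none of these are real Tcl declarations. Anything whose size or field offsets depend on the character width is baked into an extension at compile time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two Tcl_UniChar widths. */
typedef uint16_t UniChar16;  /* standard build: TCL_UTF_MAX == 3 */
typedef uint32_t UniChar32;  /* RH9-style build: TCL_UTF_MAX == 6 */

/* A hypothetical public struct embedding characters. An extension
 * compiled against one of these layouts hard-codes its offsets. */
struct Token16 { UniChar16 text[4]; int length; };
struct Token32 { UniChar32 text[4]; int length; };

/* Bytes an extension would allocate for n characters. */
static size_t bytes16(size_t n) { return n * sizeof(UniChar16); }
static size_t bytes32(size_t n) { return n * sizeof(UniChar32); }
```

An extension built against the 16-bit layout looks for `length` at offset 8, while a 32-bit library stores it at offset 16, so the extension silently reads character data instead. Likewise, a buffer sized with `bytes16(n)` and handed to a library that writes 32-bit characters is overrun by a factor of two. The same doubling is where the extra memory use of a 32-bit Tcl_UniChar comes from.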

This is particularly bad for Tcl, which has the stubs mechanism that
allows for (promises, really) upwards binary compatibility for an
extension built against 8.x with 8.y where y >= x.  RH breaks that,
and that is surprising users.
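The stubs promise can be illustrated with a heavily simplified, hypothetical function table in plain C. The names `StubTable`, `init_stubs`, and the slot functions are assumptions for illustration and not Tcl's real stub layout; the version check is only in the spirit of `Tcl_InitStubs`. The table is append-only, so an extension built for 8.x still finds its slots in an 8.y table.

```c
#include <stddef.h>
#include <stdint.h>

/* Character width baked into the extension when it is compiled. */
typedef uint16_t UniChar;

/* Hypothetical stub table: new entries are only ever appended, so the
 * slots an older extension uses stay at the same positions. */
struct StubTable {
    int minor;                               /* core is 8.minor */
    size_t (*uniCharLen)(const UniChar *s);  /* an older slot */
    int (*uniCharIsUpper)(int ch);           /* a slot appended later */
};

static size_t uni_char_len(const UniChar *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

static int uni_char_is_upper(int ch) { return ch >= 'A' && ch <= 'Z'; }

/* The "core" exports a table for a newer minor version (hypothetical). */
static const struct StubTable core_table = { 4, uni_char_len, uni_char_is_upper };

/* Extension-side check: accept any core whose minor version is at least
 * the one the extension was built against (8.y with y >= x). */
static const struct StubTable *init_stubs(const struct StubTable *t, int built_minor)
{
    return (t->minor >= built_minor) ? t : NULL;
}
```

Note what the check cannot catch: if the core was compiled with a 32-bit `UniChar` and the extension with a 16-bit one, the slot positions and the version number still match, yet every `const UniChar *` crossing the table means something different on each side.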

Also, while Tcl builds and works just fine with a 32-bit Tcl_UniChar,
I don't recall testing Tk when I tested Tcl.  Checking on a rebuild
now, Tk does appear to operate just fine.  However, it does consume
a lot more memory, since every Tcl_UniChar takes 4 bytes instead of 2.

I finally found the source RPMs for the Tcl that RH9 uses and checked
out their patch.  It's not even correct.  You have to modify
tcl/generic/regcustom.h as well to account for Tcl_UniChar being
32 bits.  IOW, it's very annoying to me that someone at RedHat went
blundering around in the dark making these modifications when it is
fairly easy to find and communicate with the core developers on the
what, how and why of doing things correctly.

So I understand the need to deal with varying UCS-2 and UCS-4 Tcl
builds, but I can't imagine needing UCS-4 Tcl with UCS-2 Python
because I don't think anyone is crazy enough to do that combo.
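The support matrix discussed in this thread (UCS-2 Tcl with either Python build, UCS-4 Tcl only with UCS-4 Python) can be written down as a small sketch. Both `combo_supported` and `narrow_ucs4_to_ucs2` are hypothetical helpers, not _tkinter's actual code. Rejecting non-BMP characters when passing UCS-4 Python strings to a UCS-2 Tcl is one plausible policy, assumed here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Supported Tcl/Python character-width pairings, sizes in bytes:
 * UCS-2 Tcl works with either Python build, UCS-4 Tcl only with UCS-4. */
static int combo_supported(int tcl_unichar_size, int py_unicode_size)
{
    if (tcl_unichar_size == 2)
        return py_unicode_size == 2 || py_unicode_size == 4;
    if (tcl_unichar_size == 4)
        return py_unicode_size == 4;
    return 0;
}

/* UCS-4 Python -> UCS-2 Tcl: copy code points one by one, refusing
 * anything outside the BMP because a 16-bit Tcl_UniChar cannot hold it.
 * Returns the index of the first offending character, or -1 on success. */
static long narrow_ucs4_to_ucs2(const uint32_t *src, size_t n, uint16_t *dst)
{
    for (size_t i = 0; i < n; i++) {
        if (src[i] > 0xFFFF)
            return (long)i;
        dst[i] = (uint16_t)src[i];
    }
    return -1;
}
```

The narrowing direction is only needed for the UCS-4 Python + UCS-2 Tcl pairing; when both sides use the same width, characters can be passed through unchanged.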

BTW, cc me on any follow-up, as I am not on the list.

  Jeff Hobbs                     The Tcl Guy
  Senior Developer               http://www.ActiveState.com/
      Tcl Support and Productivity Solutions