My ears were ringing for this one, so I thought I'd add some comments.
Martin v. Lowis wrote:
Ask Redhat why they chose to hack Tcl to better accommodate Python.
I take it 'better' is meant ironically here? :)
Well, they decided to build Python 2.2 in UCS-4 mode, which is, in general, a good thing, except that it doesn't work. In particular, sre and _tkinter don't work well; _tkinter does not work at all. Instead of fixing _tkinter (which I have now done for Python 2.3), they chose to hack Tcl instead, to extend it to UCS-4 mode.
Can someone explain to me why moving to UCS-4 is a good thing? What RedHat did is a Bad Thing, in completely screwing with compatibility (with no warning). See below.
Now I have to deal with standard Tcl, which is UCS-2, and can support both UCS-2 and UCS-4 Python _tkinter with that (2.2 would mandate UCS-2 Python for _tkinter). I also have to support Redhat Tcl, which is UCS-4. I chose to only support this in combination with UCS-4 Python. If anybody wants the combination Redhat Tcl + UCS-2 Python, feel free to develop a patch.
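For illustration, the support matrix described above can be sketched in Python. This is only a sketch under stated assumptions: `unicode_build_width` and `combination_supported` are hypothetical helper names, not part of _tkinter, and the check on `sys.maxunicode` distinguishes narrow from wide builds only on older interpreters (since Python 3.3 it is always 0x10FFFF).

```python
import sys


def unicode_build_width():
    """Return 4 for a wide (UCS-4) interpreter, 2 for a narrow (UCS-2) one."""
    # Narrow builds report 0xFFFF; wide builds report 0x10FFFF.
    return 4 if sys.maxunicode > 0xFFFF else 2


def combination_supported(python_width, tcl_unichar_width):
    """Hypothetical encoding of the policy stated in the paragraph above."""
    if tcl_unichar_width == 2:
        # Standard Tcl (UCS-2) works with either Python build.
        return python_width in (2, 4)
    if tcl_unichar_width == 4:
        # Redhat's UCS-4 Tcl is only supported together with UCS-4 Python.
        return python_width == 4
    return False


print(unicode_build_width(), combination_supported(unicode_build_width(), 2))
```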
Redhat does have bugfix updates; I will check tomorrow whether they know anything about this (IIRC they have a buglist).
It's not a bug. They *really* made this change to support their own build of Python. Unfortunately, they have thereby tied Python to UCS-4 on Redhat 9. Their change is a hack, though, as Tcl does not really support UCS-4 - it merely compiles now, but don't dare to use non-BMP characters.
What UCS-4 support are you looking for that doesn't seem to exist? While Tcl is agnostic about non-BMP chars (all 2 of them ... ha ha), it does have correct UCS-4 support (though not completely, given how RedHat patched it). This has been discussed before briefly here:
A Tcl_UniChar is 32 bits and TCL_UTF_MAX is 6 (normally it is 3), which represents the number of utf-8 bytes that are valid in sequence. Strings going through Tcl are handled as either utf-8 char*s or utf-32 (aka UCS-4) Tcl_UniChar*s.
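To see where the 3 and 6 come from, here is a small Python sketch of the byte-length rule in the original UTF-8 definition, which allowed sequences of up to 6 bytes for 31-bit values. The function name `utf8_len` is made up for this example and is not Tcl code; current Unicode stops at U+10FFFF, so modern encoders never emit more than 4 bytes.

```python
def utf8_len(code_point):
    """Bytes needed for code_point in the original (up to 6-byte) UTF-8 scheme."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # Exclusive upper bounds for 1-byte, 2-byte, ... 6-byte sequences.
    limits = [0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
    for nbytes, limit in enumerate(limits, start=1):
        if code_point < limit:
            return nbytes
    raise ValueError("code point does not fit in 31 bits")


# Largest value of a 16-bit unit needs 3 bytes; largest 31-bit value needs 6.
print(utf8_len(0xFFFF), utf8_len(0x7FFFFFFF))
```

So a 16-bit character type never needs more than 3 bytes per character, while a 32-bit type can need up to 6 under that scheme.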
I do realize that correct handling of non-BMP characters requires some more work, but that is orthogonal to this issue. While UCS-4 opens up more code points to allow non-BMP chars, there are very few in that range at this point. The bigger issue is that in changing the basic Tcl_UniChar size, you break the binary compatibility rules. RH9 is the only version/distro to use a 32-bit Tcl_UniChar, which breaks compatibility with extensions built on other versions/distros.
This is particularly bad for Tcl, which has the stubs mechanism that allows for (promises, really) upward binary compatibility: an extension built against 8.x works with 8.y where y >= x. RH breaks that, and that is surprising users.
Also, while Tcl builds and works just fine with a 32-bit Tcl_UniChar, I don't recall testing Tk when I tested Tcl. Checking on a rebuild now, it does appear that Tk operates just fine. However, it does consume a lot more memory.
I finally found the source RPMs for Tcl that RH9 uses and checked out their patch. It's not even correct. You have to modify tcl/generic/regcustom.h as well to account for Tcl_UniChar being 32 bits. IOW, it's very annoying to me that someone at RedHat went blundering around in the dark making these modifications when it is fairly easy to find and communicate with the core developers on the what, how and why of doing things correctly.
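A rough Python illustration of why the unit size matters for binary compatibility: a buffer written by code that assumes 16-bit units is misread by code that assumes 32-bit units. This is a sketch only; it uses `struct` to mimic the memory layout, assumes little-endian byte order, and `read_units` is a hypothetical helper rather than anything in the Tcl API.

```python
import struct

text = "Tcl"
code_points = [ord(c) for c in text]

# Buffer as laid out by code compiled with a 16-bit character unit.
buf16 = struct.pack("<%dH" % len(code_points), *code_points)
# The same text as laid out by code compiled with a 32-bit character unit.
buf32 = struct.pack("<%dI" % len(code_points), *code_points)


def read_units(buf, width):
    """Interpret buf as a sequence of little-endian units of the given byte width."""
    fmt = {2: "H", 4: "I"}[width]
    count = len(buf) // width
    return list(struct.unpack("<%d%s" % (count, fmt), buf[:count * width]))


print(read_units(buf16, 2))  # units as the writer intended
print(read_units(buf16, 4))  # same bytes seen through a 32-bit unit size
print(len(buf16), len(buf32))
```

Read with the matching width, the buffer yields the original three code points; read with the wrong width, two characters are fused into one meaningless value and the length is wrong.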
So I understand the need to deal with varying UCS-2 and UCS-4 Tcl builds, but I can't imagine needing UCS-4 Tcl with UCS-2 Python because I don't think anyone is crazy enough to do that combo.
BTW, cc me on any follow-up, as I am not on the list.
Jeff Hobbs The Tcl Guy Senior Developer http://www.ActiveState.com/ Tcl Support and Productivity Solutions