[Python-Dev] Can't compile _tkinter.c with Redhat 9 (post-SF#719880)

Jeff Hobbs jeff@hobbs.org
Mon, 16 Jun 2003 09:40:26 -0700


> From: martin@v.loewis.de
> Jeff Hobbs writes:
> 
> > Can someone explain to me why moving to UCS-4 is a good thing?  
> 
> Because it simplifies processing of non-BMP characters, as it restores
> the property that you get one Unicode character per string index.

Right, fair enough, that's all well understood - when you have to
deal with characters between U+10000 and U+10FFFF.  It was only
recently that such characters existed in more than a sprinkling.
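
To make the "one character per string index" point concrete, here is a
minimal sketch in C of what a 16-bit representation has to do for the
U+10000 to U+10FFFF range. The function names are made up for
illustration and are not part of Tcl or Python.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit code units that UTF-16 needs
 * for one code point. */
static size_t utf16_units(uint32_t cp)
{
    assert(cp <= 0x10FFFF);            /* upper end of the Unicode range */
    return cp >= 0x10000 ? 2 : 1;      /* non-BMP needs a surrogate pair */
}

/* Hypothetical helper: split a non-BMP code point into its high and
 * low surrogates. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                     /* 20 significant bits remain */
    *hi = (uint16_t)(0xD800 + (cp >> 10));
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF));
}
```

With 16-bit units, a string holding a single non-BMP character has
length 2 when indexed by unit; with 32-bit units, index and character
line up again.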

> > A Tcl_UniChar is 32-bits and TCL_UTF_MAX is 6 (normally it is 3),
> > which represents the number of utf-8 bytes that are valid in sequence.
> 
> Is that current code, or future code? How can I select a UCS-4 build
> during configuration? In what way is the supported mechanism different
> from the one that Redhat uses?

There is no "supported" UCS-4 mode for Tcl.  You have to hand-twiddle
the sources, knowing where to poke.  I can make the changes for 8.5
that allow for an easy configuration option to compile in UCS-4 mode.
I suppose I could also back-port it to 8.4.4.  That won't address the
fact that we've never validated non-BMP support.
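
For reference, the TCL_UTF_MAX values quoted above (3 normally, 6 in
the 32-bit case) match the byte counts of the original UTF-8 encoding
scheme. A rough sketch, with a made-up function name that is not a Tcl
API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to encode one code point under the
 * original (up to 6-byte, 31-bit) UTF-8 scheme. Returns 0 if the value
 * does not fit in 31 bits. */
static int utf8_len(uint32_t cp)
{
    if (cp < 0x80)       return 1;
    if (cp < 0x800)      return 2;
    if (cp < 0x10000)    return 3;     /* everything in the BMP */
    if (cp < 0x200000)   return 4;     /* covers U+10000..U+10FFFF */
    if (cp < 0x4000000)  return 5;
    if (cp < 0x80000000) return 6;     /* full 31-bit range */
    return 0;
}
```

Three bytes are enough for any BMP character, which is why 3 suffices
for a 16-bit Tcl_UniChar; non-BMP characters need at least 4.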

> I couldn't find definitive numbers on distribution over planes, but I
> found the following numbers:
> - Unicode 3.0 has 49194 assigned characters
>   (http://www.unicode.org/versions/Unicode3.0.html)
> - Unicode 4.0 has 96248 graphic characters
>   (http://www.unicode.org/versions/Unicode4.0.0/)

Right, and Unicode 4.0 is fresh out of diapers.  You can't even get
the regular code charts yet, you have to view the 4.0 beta ones.  With
4.0 the non-BMP planes finally get a notable number of characters, but they
are fairly weird ones that I'd be surprised to find a public font for.
You can see them at:
	http://www.unicode.org/charts/u40-beta.html
The non-BMP charts are the ones from the Linear B Syllabary on down.

> > The bigger issue is that in changing the basic Tcl_UniChar size, you
> > break the binary compatibility rules.  RH9 is the only
> > version/distro to use 32-bit Tcl_UniChar, which breaks compatibility
> > with extensions built on other versions/distros.
> 
> Indeed. Python has added explicit mechanisms to detect such breakage,
> by renaming all API functions depending on the width of a Unicode
> character. That, at least, allows the breakage to be detected at
> import time (missing symbols).

Tcl could do this, but we were very much taken by surprise that
anyone pushed Tcl to use UCS-4 at all.
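
For illustration only, the renaming trick described in the quote above
can be sketched in C along these lines. Every name here (the DEMO_ and
demo prefixes) is hypothetical; this is not Python's header and not
anything Tcl ships.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switch: width of the character type in bits. */
#ifndef DEMO_UNICHAR_BITS
#define DEMO_UNICHAR_BITS 32
#endif

/* The public name maps to a width-specific linker symbol, so an
 * extension built against the other width fails to resolve it at load
 * time instead of silently misreading strings. */
#if DEMO_UNICHAR_BITS == 32
typedef unsigned int demo_unichar;
#define demo_strlen demoUCS4_strlen
#else
typedef unsigned short demo_unichar;
#define demo_strlen demoUCS2_strlen
#endif

/* Length of a zero-terminated string, counted in demo_unichar units. */
size_t demo_strlen(const demo_unichar *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

Callers keep writing demo_strlen(), but the object file exports
demoUCS4_strlen or demoUCS2_strlen, which is what makes a mismatch show
up as a missing symbol.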

> > Checking on a rebuild now, it does appear that Tk operates just
> > fine.  However, it does consume a lot more memory.
> 
> When I tested it, I found that it would break very easily. I was using
> the Redhat procedure, though, so I might have done something wrong.

Can you feed me some sample scripts offline to test with?

> > I finally found the source RPMs for Tcl that RH9 uses and checked
> > out their patch.  It's not even correct.  You have to modify
> > tcl/generic/regcustom.h as well to account for Tcl_UniChar being
> > 32-bits.  
> 
> What is the specific change that one has to make? "You have to edit
> multiple files to activate a feature" is a strange way of supporting
> it...

Ha ha ... well, I did say it was never properly supported.  It is
not a good thing that no one bothered to ask how to do it correctly
when that was clear.  What you have to do is modify generic/tcl.h to
set TCL_UTF_MAX to 6 and typedef Tcl_UniChar as unsigned int (RH
used wchar_t), and then modify the bottom of generic/regcustom.h,
where you will see 3 lines that need changing for the new size of
CHR (which is Tcl_UniChar for the RE).

Of course, that's what I think is needed.  It should probably then
get extended tests for more characters and further expectations.
We should probably add a tcl_platform(unicharSize) var or something
so that users at the Tcl level know this as well.  Again, this is
only something that I have tinkered with - not extensively tested.
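
As a rough sketch of the tcl.h half of that change, here is a
standalone mock-up (it does not include the real tcl.h, and the demo_
helpers are hypothetical). The three regcustom.h lines are not
reproduced here; check them in your own source tree.

```c
#include <assert.h>
#include <limits.h>

/* Mock-up of the setting described above; the stock value is 3. */
#define TCL_UTF_MAX 6

/* Assumed shape of the typedef switch, not copied from tcl.h. */
#if TCL_UTF_MAX > 3
typedef unsigned int Tcl_UniChar;      /* RH used wchar_t instead */
#else
typedef unsigned short Tcl_UniChar;
#endif

/* Hypothetical helper: width of Tcl_UniChar in bits on this platform. */
static int demo_unichar_bits(void)
{
    return (int)(sizeof(Tcl_UniChar) * CHAR_BIT);
}

/* Hypothetical helper: U+10FFFF needs 21 bits, so anything narrower
 * cannot hold a non-BMP character in one unit. */
static int demo_holds_non_bmp(void)
{
    return demo_unichar_bits() >= 21;
}
```

A check like demo_holds_non_bmp() is the sort of thing a test suite
could assert after the change, in the same spirit as exposing the size
at the Tcl level.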

Regards,

  Jeff Hobbs                     The Tcl Guy
  Senior Developer               http://www.ActiveState.com/