Can't compile _tkinter.c with Redhat 9 (post-SF#719880)
Hi, I am unable to compile Python 2.3 + _tkinter (latest CVS) with Redhat 9. Earlier suggestions on python-dev and SF don't help or raise other problems. The _tkinter module won't be compiled: I get the following error messages when I do nothing:

    Modules/_tkinter.c:96:2: #error "unsupported Tcl configuration"
    Modules/_tkinter.c: In function `AsObj':
    Modules/_tkinter.c:947: warning: passing arg 1 of `Tcl_NewUnicodeObj' from incompatible pointer type
    Modules/_tkinter.c: In function `FromObj':
    Modules/_tkinter.c:1073: warning: passing arg 1 of `PyUnicodeUCS2_FromUnicode' from incompatible pointer type

About the same problem was raised in this message: http://mail.python.org/pipermail/python-dev/2003-April/034724.html And in SF#719880. But there are some differences. This is after SF#719880 has been applied. When I remove the test, I do *not* get a working version of Tkinter:

    $ ./python
    Python 2.3b1+ (#1, Jun 12 2003, 22:15:45)
    [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    0 >>> import Tkinter
    1 >>> Tkinter._test()
    Segmentation fault

Later in the thread, the suggestion was made to compile with --enable-unicode=ucs4.
However, then I get:

    *** WARNING: renaming "array" since importing it failed: build/lib.linux-i686-2.3/array.so: undefined symbol: PyUnicodeUCS2_FromUnicode
    *** WARNING: renaming "_testcapi" since importing it failed: build/lib.linux-i686-2.3/_testcapi.so: undefined symbol: PyUnicodeUCS2_Decode
    *** WARNING: renaming "unicodedata" since importing it failed: build/lib.linux-i686-2.3/unicodedata.so: undefined symbol: PyUnicodeUCS2_FromUnicode
    *** WARNING: renaming "_locale" since importing it failed: build/lib.linux-i686-2.3/_locale.so: undefined symbol: PyUnicodeUCS2_AsWideChar
    *** WARNING: renaming "cPickle" since importing it failed: build/lib.linux-i686-2.3/cPickle.so: undefined symbol: PyUnicodeUCS2_AsUTF8String
    *** WARNING: renaming "pyexpat" since importing it failed: build/lib.linux-i686-2.3/pyexpat.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8

Tkinter now works, but array, locale, cPickle, etc. are not present. How can I solve this problem? Is it a bug? Or is it something on my system? Gerrit. -- 234. If a shipbuilder build a boat of sixty gur for a man, he shall pay him a fee of two shekels in money. -- 1780 BC, Hammurabi, Code of Law -- Asperger Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/ These are times to get involved in politics yourself: http://www.sp.nl/
On Thu, 2003-06-12 at 17:21, Jeff Epler wrote:
On Thu, Jun 12, 2003 at 11:12:57PM +0200, Gerrit Holl wrote:
Tkinter now works, but array, locale, cPickle, etc. are not present.
You'll probably need to remove the shared modules that were built with the different UCS setting, then "make" again.
Yep, I've expanded my config options from --with-pydebug to include --enable-unicode=ucs4 -Barry
"Martin v. Löwis" wrote:
Gerrit Holl wrote:
How can I solve this problem? Is it a bug? Or is it something on my system?
make again (after removing ./build), no, and yes: it is a Redhat 9 system.
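Before rebuilding, it can help to confirm which Unicode mode the freshly built interpreter actually uses. On Python 2.x builds, sys.maxunicode is 0xFFFF for UCS-2 and 0x10FFFF for UCS-4. The following is only a minimal sketch; the helper name unicode_build is made up for illustration and is not part of Python (note also that Python 3.3 and later always report 0x10FFFF):

```python
import sys


def unicode_build(maxunicode):
    """Classify a sys.maxunicode value (hypothetical helper, not a stdlib API)."""
    if maxunicode == 0xFFFF:
        return "ucs2"
    if maxunicode == 0x10FFFF:
        return "ucs4"
    raise ValueError("unexpected sys.maxunicode: %r" % (maxunicode,))


# Extension modules left over in ./build must have been compiled for the
# same setting, otherwise they fail to import with undefined
# PyUnicodeUCS2_* or PyUnicodeUCS4_* symbols.
print(unicode_build(sys.maxunicode))
```

If the result differs from the mode the stale modules in ./build were compiled for, removing ./build and running make again is the fix described above.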
Thanks! I am probably not the only one encountering this problem and I didn't find this behaviour documented in the README. Should it be? If so, should I send a docpatch? It's Redhat's failure, of course, but a lot of unsuspecting RH-users are out (t)here...
Ask Redhat why they chose to hack Tcl to better accommodate Python.
I take it 'better' is meant ironically here? :) (I am not good at irony, if it isn't I haven't understood it very well ;). Redhat does have bugfix updates, I will look if they know something about this tomorrow (IIRC they have a buglist). Gerrit. -- 1. If any one ensnare another, putting a ban upon him, but he can not prove it, then he that ensnared him shall be put to death. -- 1780 BC, Hammurabi, Code of Law -- Asperger Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/ These are times to get involved in politics yourself: http://www.sp.nl/
Gerrit Holl wrote:
I am probably not the only one encountering this problem and I didn't find this behaviour documented in the README. Should it be? If so, should I send a docpatch?
Sure: You might want to revise the entire Redhat section in README.
Ask Redhat why they chose to hack Tcl to better accommodate Python.
I take it 'better' is meant ironically here? :)
Well, they decided to build Python 2.2 in UCS-4 mode, which is, in general, a good thing, except that it doesn't work. In particular, sre and _tkinter don't work well; _tkinter does not work at all. Instead of fixing _tkinter (which I now did for Python 2.3), they chose to hack Tcl instead, to extend it to UCS-4 mode. Now I have to deal with standard Tcl, which is UCS-2, and can support both UCS-2 and UCS-4 Python _tkinter with that (2.2 would mandate UCS-2 Python for _tkinter). I also have to support Redhat Tcl, which is UCS-4. I chose to only support this in combination with UCS-4 Python. If anybody wants the combination Redhat Tcl + UCS-2 Python, feel free to develop a patch.
Redhat does have bugfix updates, I will look if they know something about this tomorrow (IIRC they have a buglist).
It's not a bug. They *really* made this change to support their own build of Python. Unfortunately, they have thereby tied Python to UCS-4 on Redhat 9. Their change is a hack, though, as Tcl does not really support UCS-4 - it merely compiles now, but don't dare to use non-BMP characters. Once standard Tcl supports UCS-4, we probably need to look into this again. Regards, Martin
My ears were ringing for this one, so I thought I'd add some comments. Martin v. Lowis wrote:
Ask Redhat why they chose to hack Tcl to better accommodate Python.
I take it 'better' is meant ironically here? :)
Well, they decided to build Python 2.2 in UCS-4 mode, which is, in general, a good thing, except that it doesn't work. In particular, sre and _tkinter don't work well; _tkinter does not work at all. Instead of fixing _tkinter (which I now did for Python 2.3), they chose to hack Tcl instead, to extend it to UCS-4 mode.
Can someone explain to me why moving to UCS-4 is a good thing? What RedHat did is a Bad Thing, in completely screwing with compatibility (with no warning). See below.
Now I have to deal with standard Tcl, which is UCS-2, and can support both UCS-2 and UCS-4 Python _tkinter with that (2.2 would mandate UCS-2 Python for _tkinter). I also have to support Redhat Tcl, which is UCS-4. I chose to only support this in combination with UCS-4 Python. If anybody wants the combination Redhat Tcl + UCS-2 Python, feel free to develop a patch.
Redhat does have bugfix updates, I will look if they know something about this tomorrow (IIRC they have a buglist).
It's not a bug. They *really* made this change to support their own build of Python. Unfortunately, they have thereby tied Python to UCS-4 on Redhat 9. Their change is a hack, though, as Tcl does not really support UCS-4 - it merely compiles now, but don't dare to use non-BMP characters.
What UCS-4 support are you looking for that doesn't seem to exist? While Tcl is agnostic about non-BMP chars (all 2 of them ... ha ha), it does have correct UCS-4 support (not completely though with how RedHat patched it). This has been discussed before briefly here: https://sourceforge.net/tracker/?func=detail&aid=578030&group_id=10894&atid=110894 A Tcl_UniChar is 32-bits and TCL_UTF_MAX is 6 (normally it is 3), which represents the number of utf-8 bytes that are valid in sequence. Strings going through Tcl are handled as either utf-8 char*s or utf-32 (aka UCS-4) Tcl_UniChar*s. I do realize that correct handling of non-BMP characters requires some more work, but that is orthogonal to this issue. While UCS-4 opens up more code points to allow non-BMP chars, there are very few in that range at this point. The bigger issue is that in changing the basic Tcl_UniChar size, you break the binary compatibility rules. RH9 is the only version/distro to use 32-bit Tcl_UniChar, which breaks compatibility with extensions built on other versions/distros. This is particularly bad for Tcl, which has the stubs mechanism that allows for (promises really) upwards binary compatibility for an extension built against 8.x with 8.y where y >= x. RH breaks that, and that is surprising users. Also, Tcl builds and works just fine with 32-bit Tcl_UniChar, but I don't recall testing Tk when I tested Tcl. Checking on a rebuild now, it does appear that Tk operates just fine. However, it does consume a lot more memory. I finally found the source RPMs for Tcl that RH9 uses and checked out their patch. It's not even correct. You have to modify tcl/generic/regcustom.h as well to account for Tcl_UniChar being 32-bits. IOW, it's very annoying to me that someone at RedHat went blundering around in the dark making these modifications when it is fairly easy to find and communicate with the core developers on the what, how and why of doing things correctly.
So I understand the need to deal with varying UCS-2 and UCS-4 Tcl builds, but I can't imagine needing UCS-4 Tcl with UCS-2 Python because I don't think anyone is crazy enough to do that combo. BTW, cc me on any follow-up, as I am not on the list. Jeff Hobbs The Tcl Guy Senior Developer http://www.ActiveState.com/ Tcl Support and Productivity Solutions
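The two TCL_UTF_MAX values mentioned in this message (3 and 6) line up with the longest UTF-8 sequence needed for a 16-bit value and for a 31-bit value under the original UTF-8 definition (RFC 2279). The sketch below only illustrates that arithmetic; utf8_length is a hypothetical helper, not a Tcl or Python API, and modern UTF-8 (RFC 3629) stops at 4 bytes because it is capped at U+10FFFF.

```python
def utf8_length(code_point):
    """Bytes needed for one code point under the original 1-to-6 byte UTF-8 scheme."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # Largest value representable with 1, 2, 3, 4, 5 and 6 bytes respectively.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for nbytes, limit in enumerate(limits, start=1):
        if code_point <= limit:
            return nbytes
    raise ValueError("code point does not fit in 31 bits")


# Every 16-bit value fits in at most 3 bytes; a full 31-bit value may need 6.
print(utf8_length(0xFFFF), utf8_length(0x7FFFFFFF))
```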
"Jeff Hobbs" wrote My ears were ringing for this one, so I thought I'd add some comments. [exactly how RH screwed up Tcl/Tk]
It's probably worth also noting that it seems like the RH9 version of tcl/tk leaks memory like a sieve. Running exmh, I see a leak of about 100Mbytes a day. So if you've got long running Tkinter processes, it'd pay to keep an eye on the memory consumption... Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood.
From: Anthony Baxter
"Jeff Hobbs" wrote My ears were ringing for this one, so I thought I'd add some comments. [exactly how RH screwed up Tcl/Tk]
It's probably worth also noting that it seems like the RH9 version of tcl/tk leaks memory like a sieve. Running exmh, I see a leak of about 100Mbytes a day. So if you've got long running Tkinter processes, it'd pay to keep an eye on the memory consumption...
Leaks, or just consumes? I mentioned in my previous email that it appears that Tk does consume a lot of memory when using 32-bit Tcl_UniChars. Tk isn't completely leak-free, but it doesn't have any known major leaks. I don't know why increasing the size of a Tcl_UniChar would start creating leaks (although it does seem to consume more memory than it should just for that increase in data size ... so it does require more examination). Jeff Hobbs The Tcl Guy Senior Developer http://www.ActiveState.com/
"Jeff Hobbs" wrote Leaks, or just consumes? I mentioned in my previous email that it appears that Tk does consume a lot of memory when using 32-bit Tcl_UniChars. Tk isn't completely leak-free, but it doesn't have any known major leaks. I don't know why increasing the size of a Tcl_UniChar would start creating leaks (although it does seem to consume more memory than it should just for that increase in data size ... so it does require more examination).
After 2 1/2 days it's using 250Mbytes of RAM. There's a slow but steady increase continually across the time it's running. That's without loading any HTML or any weird stuff - just text. Anthony
"Jeff Hobbs" <jeffh@ActiveState.com> writes:
Can someone explain to me why moving to UCS-4 is a good thing?
Because it simplifies processing of non-BMP characters, as it restores the property that you get one Unicode character per string index.
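To make the "one character per string index" point concrete: with 16-bit storage units, a character outside the BMP can only be stored as a surrogate pair, so it occupies two indices, while with 32-bit units it occupies one. A minimal sketch of the standard UTF-16 surrogate computation follows; the function name utf16_units is chosen here for illustration only.

```python
def utf16_units(code_point):
    """Return the 16-bit code units needed to store one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point < 0x10000:
        # BMP character: one unit, so one string index.
        return [code_point]
    # Non-BMP character: split the 20-bit offset into a high and low surrogate.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


# U+1D11E (MUSICAL SYMBOL G CLEF) needs two 16-bit units.
print([hex(u) for u in utf16_units(0x1D11E)])
```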
What UCS-4 support are you looking for that doesn't seem to exist?
It crashes when fed non-BMP characters. In addition, it lacks a configuration option, or any kind of documentation telling packagers how to build a UCS-4 Tcl/Tk.
While Tcl is agnostic about non-BMP chars (all 2 of them ... ha ha), it does have correct UCS-4 support (not completely though with how RedHat patched it). This has been discussed before briefly here:
https://sourceforge.net/tracker/?func=detail&aid=578030&group_id=10894&atid=110894
Which of the follow-up messages do you consider reliable information in this report? davygrvy's comments appear to be irrelevant, as they talk about Unicode 3.0; keithp's likewise. Your own comment appears to talk about possible future changes, instead of the current code.
A Tcl_UniChar is 32-bits and TCL_UTF_MAX is 6 (normally it is 3), which represents the number of utf-8 bytes that are valid in sequence.
Is that current code, or future code? How can I select a UCS-4 build during configuration? In what way is the supported mechanism different from the one that Redhat uses?
I do realize that correct handling of non-BMP characters requires some more work, but that is orthogonal to this issue. While UCS-4 opens up more code points to allow non-BMP chars, there are very few in that range at this point.
I couldn't find definitive numbers on distribution over planes, but I found the following numbers: - Unicode 3.0 has 49194 assigned characters (http://www.unicode.org/versions/Unicode3.0.html) - Unicode 4.0 has 96248 graphic characters (http://www.unicode.org/versions/Unicode4.0.0/) I don't know how many of the new assignments are in the BMP, but it appears that there are roughly as many assigned BMP characters as there are assigned characters outside the BMP.
The bigger issue is that in changing the basic Tcl_UniChar size, you break the binary compatibility rules. RH9 is the only version/distro to use 32-bit Tcl_UniChar, which breaks compatibility with extensions built on other versions/distros.
Indeed. Python has added explicit mechanisms to detect such breakage, by renaming all API functions depending on the width of a Unicode character. That, at least, allows the breakage to be detected at import time (missing symbols).
Also, Tcl builds and works just fine with 32-bit Tcl_UniChar, but I don't recall testing Tk when I tested Tcl. Checking on a rebuild now, it does appear that Tk operates just fine. However, it does consume a lot more memory.
When I tested it, I found that it would break very easily. I was using the Redhat procedure, though, so I might have done something wrong.
I finally found the source RPMs for Tcl that RH9 uses and checked out their patch. It's not even correct. You have to modify tcl/generic/regcustom.h as well to account for Tcl_UniChar being 32-bits.
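The renaming shows up in the error messages at the top of this thread, where modules built for one width look for symbols such as PyUnicodeUCS2_FromUnicode. In CPython the renaming is done with C preprocessor macros; the Python sketch below only models the naming pattern, and mangled_name is a hypothetical helper rather than anything in the interpreter.

```python
def mangled_name(api_name, unicode_width):
    """Model how a PyUnicode_* API name is renamed for a given Py_UNICODE width.

    unicode_width is the size of one Unicode unit in bytes (2 or 4).
    """
    prefix = "PyUnicode_"
    if not api_name.startswith(prefix):
        raise ValueError("not a PyUnicode_* API name: %r" % (api_name,))
    if unicode_width not in (2, 4):
        raise ValueError("width must be 2 or 4 bytes")
    tag = "UCS2" if unicode_width == 2 else "UCS4"
    return "PyUnicode%s_%s" % (tag, api_name[len(prefix):])


# An extension compiled against a UCS-2 Python asks the loader for the first
# name; a UCS-4 interpreter only exports the second, so the import fails.
print(mangled_name("PyUnicode_FromUnicode", 2))
print(mangled_name("PyUnicode_FromUnicode", 4))
```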
What is the specific change that one has to make? "You have to edit multiple files to activate a feature" is a strange way of supporting it...
IOW, it's very annoying to me that someone at RedHat went blundering around in the dark making these modifications when it is fairly easy to find and communicate with the core developers on the what, how and why of doing things correctly.
Indeed. In the specific case, they made the Tcl change to support UCS-4 Python, when it would have been cleaner, IMO, to fix _tkinter. Alas, they did not contact us, either. Regards, Martin
From: martin@v.loewis.de Jeff Hobbs writes:
Can someone explain to me why moving to UCS-4 is a good thing?
Because it simplifies processing of non-BMP characters, as it restores the property that you get one Unicode character per string index.
Right, fair enough, that's all well understood - when you have to deal with characters between U+10000 and U+10FFFF. It was only recently that such characters existed in more than a sprinkling.
A Tcl_UniChar is 32-bits and TCL_UTF_MAX is 6 (normally it is 3), which represents the number of utf-8 bytes that are valid in sequence.
Is that current code, or future code? How can I select a UCS-4 build during configuration? In what way is the supported mechanism different from the one that Redhat uses?
There is no "supported" UCS-4 mode for Tcl. You have to hand-twiddle the sources, knowing where to poke. I can make the changes for 8.5 that allow for an easy configuration option to compile in UCS-4 mode. I suppose I could also back-port it to 8.4.4. That won't address the fact that we've never validated non-BMP support.
I couldn't find definitive numbers on distribution over planes, but I found the following numbers: - Unicode 3.0 has 49194 assigned characters (http://www.unicode.org/versions/Unicode3.0.html) - Unicode 4.0 has 96248 graphic characters (http://www.unicode.org/versions/Unicode4.0.0/)
Right, and Unicode 4.0 is fresh out of diapers. You can't even get the regular code charts yet, you have to view the 4.0 beta ones. With 4.0 the non-BMP finally gets a notable amount of characters, but they are fairly weird ones that I'd be surprised to find a public font for. You can see them at: http://www.unicode.org/charts/u40-beta.html They are the Linear B Syllabary on down.
The bigger issue is that in changing the basic Tcl_UniChar size, you break the binary compatibility rules. RH9 is the only version/distro to use 32-bit Tcl_UniChar, which breaks compatibility with extensions built on other versions/distros.
Indeed. Python has added explicit mechanisms to detect such breakage, by renaming all API functions depending on the width of a Unicode character. That, at least, allows the breakage to be detected at import time (missing symbols).
Tcl could do this, but we were very much taken by surprise that it was pushed to use UCS-4 at all.
Checking on a rebuild now, it does appear that Tk operates just fine. However, it does consume a lot more memory.
When I tested it, I found that it would break very easily. I was using the Redhat procedure, though, so I might have done something wrong.
Can you feed me some sample scripts offline to test with?
I finally found the source RPMs for Tcl that RH9 uses and checked out their patch. It's not even correct. You have to modify tcl/generic/regcustom.h as well to account for Tcl_UniChar being 32-bits.
What is the specific change that one has to make? "You have to edit multiple files to activate a feature" is a strange way of supporting it...
Ha ha ... well, I did say it was never properly supported. That no one bothered to ask how to do it correctly when that was clear is not a good thing. What you have to do is modify generic/tcl.h to set TCL_UTF_MAX to 6, typedef Tcl_UniChar as unsigned int (or wchar_t is what RH used), and then modify the bottom of generic/regcustom.h, where you will see 3 lines that need mods for the change in size of CHR (which is Tcl_UniChar for the RE). Of course, that's what I think is needed. It should probably then get extended tests for more characters and further expectations. We should probably add a tcl_platform(unicharSize) var or something so that users at the Tcl level know this as well. Again, this is only something that I have tinkered with - not extensively tested. Regards, Jeff Hobbs The Tcl Guy Senior Developer http://www.ActiveState.com/
"Jeff Hobbs" <jeffh@ActiveState.com> writes:
Right, and Unicode 4.0 is fresh out of diapers. You can't even get the regular code charts yet, you have to view the 4.0 beta ones. With 4.0 the non-BMP finally gets a notable amount of characters, but they are fairly weird ones that I'd be surprised to find a public font for.
Actually, Unicode 3.2 added most of them; Unicode 4.0 has only minor additions. The biggest addition is the CJK unified ideographs in plane 2, which is of significance to a fifth of the world population, potentially - these are all current characters (instead of being only of scientific interest). I agree that it will take a while to have fonts for these available. However, unavailability of fonts should not stop us from supporting such characters in the libraries. Characters must pass through many processing steps before being rendered, and all of these steps must work flawlessly. So I think it is a good thing to start today, to have the application software ready by the time fonts become available.
When I tested it, I found that it would break very easily. I was using the Redhat procedure, though, so I might have done something wrong.
Can you feed me some sample scripts offline to test with?
Will do, but it may take some time. It was so obvious to me that this is not at all officially supported that I did not bother reporting it, or taking notes. Regards, Martin
On 16 Jun 2003, Martin v. Löwis wrote:
"Jeff Hobbs" <jeffh@ActiveState.com> writes:
Right, and Unicode 4.0 is fresh out of diapers. You can't even get the regular code charts yet, you have to view the 4.0 beta ones. With 4.0 the non-BMP finally gets a notable amount of characters, but they are fairly weird ones that I'd be surprised to find a public font for.
Actually, Unicode 3.2 added most of them; Unicode 4.0 has only minor additions. The biggest addition is the CJK unified ideographs in plane 2, which is of significance to a fifth of the world population, potentially - these are all current characters (instead of being only of scientific interest).
I agree that it will take a while to have fonts for these available. However, unavailability of fonts should not stop us from supporting such characters in the libraries. Characters must pass through many processing steps before being rendered, and all of these steps must work flawlessly. So I think it is a good thing to start today, to have the application software ready by the time fonts become available.
Hello, Here are the reasons Red Hat went for ucs4:
- Red Hat 7.3 shipped python 2.2 (compiled in the python2 rpm). Not too surprising that a number of configuration tools were already using it. But no one saw tkinter was broken until too late - yes it was compiled with ucs4.
- Red Hat 8.0 went to --enable-unicode=ucs2: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=63965
- While shipping an erratum for 7.3 that updated python2 to 2.2.2, I realized the compatibility mess we are in. 7.3 is UCS4, 8.0 is UCS2.
Part of the reason Red Hat 9 is called 9 and not 8.1 is to suggest there are things that are incompatible with previous versions. There should be a note in the release notes, stating python is now UCS4. The logic I tried to apply was exactly what Martin said earlier - I would prefer to break compatibility now rather than later when more people would notice :-) Arguably, we should have beaten the heck out of tkinter at Red Hat 8 time and fixed it then - almost 2 years ago, that is.
When I tested it, I found that it would break very easily. I was using the Redhat procedure, though, so I might have done something wrong.
Can you feed me some sample scripts offline to test with?
Will do, but it may take some time. It was so obvious to me that this is not at all officially supported that I did not bother reporting it, or taking notes.
The patch for tcl that we came up with was, IIRC, posted somewhere on the tcl sites. I'll have to dig to find it again. So, I am willing to wear the brown paper bag for not letting the community know about the UCS4 change well in advance. I find it surprising to find out _now_ that _tkinter.c was the one that was broken, I wish I knew that before poking at fixing tcl instead. I am surprised the tcl community was not aware of this either. Anyway, UCS4 is something I don't think we can change now - and after reading Martin's post, it makes me feel it's the correct way to go. How can we fix the tcl/python interaction? If you have suggestions, please let me know (off-list would be fine too). Cheers, Misa
Mihai Ibanescu <misa@redhat.com> writes:
Anyway, UCS4 is something I don't think we can change now - and after reading Martin's post, it makes me feel it's the correct way to go. How can we fix the tcl/python interaction? If you have suggestions, please let me know (off-list would be fine too).
For 2.3, I have fixed it in the way that you can build _tkinter in both UCS-2 and UCS-4 against a UCS-2 Tcl, and you can build a UCS-4 Python against a UCS-4 Tcl. What is missing is building a standard (i.e. UCS-2) Python against the Redhat (i.e. UCS-4) Tcl. There is now a note in README telling people to configure Python for UCS-4 on Redhat. It appears that all Redhat 9 users running into that so far find that acceptable. Still, it would slightly simplify the build procedure to support UCS-2 Python against UCS-4 Tcl. To do that, that case needs to be detected at compile time, and appropriate character-by-character conversions must be made. This is not on my TODO list, but I would accept patches that do that. I have advised the Debian Python maintainer (Matthias Klose), that staying with UCS-2 for Python 2.2 is a good idea, as there are still problems left in 2.2. For 2.3, I have suggested that they build with UCS-4, especially as one Debian package requires that. So for compatibility across Linux distributions, I hope that distributors will configure Python 2.3 with UCS-4 on Linux. Regards, Martin P.S. What is IMO more urgently missing is "proper" support for that feature in Tcl, if it is meant to be a feature, in terms of documentation, configuration options, etc. Or, if Tcl maintainers decide this is *not* a desirable feature, it would help if they say so (but I think Jeff's comments indicate that this is unlikely).
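The "character-by-character conversions" needed for a UCS-2 Python against a UCS-4 Tcl amount to widening each 16-bit unit to 32 bits in one direction and narrowing (with a range check) in the other. The following is a rough sketch of that idea on raw little-endian buffers, not the _tkinter code; the function names are invented here, and how to treat values above 0xFFFF when narrowing is an open design choice (this sketch simply refuses them).

```python
import struct


def widen_to_ucs4(buf16):
    """Convert a buffer of 16-bit units into a buffer of 32-bit units."""
    if len(buf16) % 2:
        raise ValueError("buffer length must be a multiple of 2")
    count = len(buf16) // 2
    units = struct.unpack("<%dH" % count, buf16)
    return struct.pack("<%dI" % count, *units)


def narrow_to_ucs2(buf32):
    """Convert a buffer of 32-bit units into 16-bit units, rejecting non-BMP values."""
    if len(buf32) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    count = len(buf32) // 4
    units = struct.unpack("<%dI" % count, buf32)
    for unit in units:
        if unit > 0xFFFF:
            raise ValueError("U+%X does not fit in a 16-bit unit" % unit)
    return struct.pack("<%dH" % count, *units)


# Round trip for the two BMP characters 'A' and U+20AC.
wide = widen_to_ucs4(struct.pack("<2H", 0x41, 0x20AC))
print(len(wide), narrow_to_ucs2(wide) == struct.pack("<2H", 0x41, 0x20AC))
```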
On Tue, 2003-06-17 at 16:53, Martin v. Löwis wrote:
What is missing is building a standard (i.e. UCS-2) Python against the Redhat (i.e. UCS-4) Tcl. There is now a note in README telling people to configure Python for UCS-4 on Redhat. It appears that all Redhat 9 users running into that so far find that acceptable.
It would be moderately nicer if configure could detect that UCS-4 was required and automatically provide the --enable-unicode=ucs4 option. -Barry
On Tue, Jun 17, 2003 at 05:07:21PM -0400, Barry Warsaw wrote:
On Tue, 2003-06-17 at 16:53, Martin v. Löwis wrote:
What is missing is building a standard (i.e. UCS-2) Python against the Redhat (i.e. UCS-4) Tcl. There is now a note in README telling people to configure Python for UCS-4 on Redhat. It appears that all Redhat 9 users running into that so far find that acceptable.
It would be moderately nicer if configure could detect that UCS-4 was required and automatically provide the --enable-unicode=ucs4 option.
You'd be insane to stuff this in just before releasing 2.3 (since this is the first time I've ever written for autoconf), but here it is..

Running on a RedHat 9 machine:

    checking for wchar.h... yes
    checking for wchar_t... yes
    checking size of wchar_t... 4
    checking for UCS-4 tcl... yes
    checking what type to use for unicode... wchar_t

Running on a RedHat 7.1 machine:

    checking wchar.h usability... yes
    checking wchar.h presence... yes
    checking for wchar.h... yes
    checking for wchar_t... yes
    checking size of wchar_t... 4
    checking for UCS-4 tcl... no
    checking what type to use for unicode... unsigned short

Tested nowhere else in the universe... in particular, there's no attempt to find the location of the tcl.h header as setup.py does since it's in /usr/include/tcl.h on my system. Detection of UCS-4 tcl does not override the --enable-unicode= setting. Someone who is building Python on a redhat9 system but does not want _tkinter may want the chance to override it, if only due to increased memory consumption of that build.
Jeff

--- Python-2.3b1/configure.in 2003-04-11 10:35:52.000000000 -0500
+++ local-Python-2.3b1/configure.in 2003-06-17 20:17:13.000000000 -0500
@@ -2400,6 +2400,18 @@
   AC_CHECK_SIZEOF(wchar_t, 4, [#include <wchar.h>])
 fi
 
+AC_MSG_CHECKING(for UCS-4 tcl)
+have_ucs4_tcl=no
+AC_TRY_COMPILE([
+#include <tcl.h>
+#if TCL_UTF_MAX != 6
+# error "NOT UCS4_TCL"
+#endif], [], [
+  AC_DEFINE(HAVE_UCS4_TCL, 1, [Define this if you have tcl and TCL_UTF_MAX==6])
+  have_ucs4_tcl=yes
+])
+AC_MSG_RESULT($have_ucs4_tcl)
+
 AC_MSG_CHECKING(what type to use for unicode)
 dnl quadrigraphs "@<:@" and "@:>@" produce "[" and "]" in the output
 AC_ARG_ENABLE(unicode,
@@ -2410,7 +2422,12 @@
 if test $enable_unicode = yes
 then
-  # Without any arguments, Py_UNICODE defaults to two-byte mode
-  enable_unicode="ucs2"
+  # Without any arguments, Py_UNICODE changes to match tcl (if present)
+  case "$have_ucs4_tcl" in
+  yes) enable_unicode="ucs4"
+       ;;
+  *)   enable_unicode="ucs2"
+       ;;
+  esac
 fi
 
 AH_TEMPLATE(Py_UNICODE_SIZE,
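Read as plain logic, the patch changes only the default: an explicit --enable-unicode= value is kept, and the bare default follows whatever Tcl was detected. A small Python model of that decision, purely for illustration (choose_unicode is a made-up name and nothing like it exists in configure):

```python
def choose_unicode(enable_unicode, have_ucs4_tcl):
    """Model the patched configure logic for picking the Py_UNICODE mode.

    enable_unicode is "yes" when no explicit mode was requested, otherwise
    the explicit value such as "ucs2" or "ucs4".
    """
    if enable_unicode == "yes":
        # Default case: follow the detected Tcl build.
        return "ucs4" if have_ucs4_tcl else "ucs2"
    # An explicit setting always wins over detection.
    return enable_unicode


print(choose_unicode("yes", True))   # default on a machine with UCS-4 tcl
print(choose_unicode("ucs2", True))  # explicit override is respected
```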
It was a busy day for me, but I wanted to address the open qs here:
From: martin@v.loewis.de Mihai Ibanescu <misa@redhat.com> writes:
Anyway, UCS4 is something I don't think we can change now - and after reading Martin's post, it makes me feel it's the correct way to go. How can we fix the tcl/python interaction? If you have suggestions, please let me know (off-list would be fine too).
For Python perhaps, if there exist extensions which require it, but I still didn't see any convincing argument to build Tcl UCS-4. Since Martin has corrected things to work with UCS-4 Python against UCS-2 Tcl, that seems like a more sane way to go. I want to get a chance to examine what connectivity issues he found, and look into the Tk UCS-4 issues. We know a lot more about UCS-2 Tcl (the standard). It also makes it compatible with RH8 Tcl and all the extensions that people have already built. That means extension rpms made on RH8 will work on RH9 (that is not currently the case). I am interested in updating the build system to have a UCS-4 enabling build switch, but the key thing that I am worried about is the very notable excess memory usage by Tk when built with UCS-4 (just Tcl seems to be fine). Since there appears to be no imperative reason to have a UCS-4 Tcl build (thanks to Martin's shimmy magic), it would seem best not to have the default Tk bloating, along with the other issues.
P.S. What is IMO more urgently missing is "proper" support for that feature in Tcl, if it is meant to be a feature, in terms of documentation, configuration options, etc. Or, if Tcl maintainers decide this is *not* a desirable feature, it would help if they say so (but I think Jeff's comments indicate that this is unlikely).
Right, I'm aware of it, and will make it so for 8.4.4 and 8.5. It won't get back into 8.3 (which we consider end-of-devel life). Which brings up another point ... since Martin, et al, also did a great job correcting Tkinter support for 8.4, why not move up to that as the base version (it is already the base version in SuSE and Debian-testing)? It is 100% upwards compatible from 8.3 and provides numerous enhancements for Tk users (like compound buttons, new core widgets, enhanced performance, etc). Jeff Hobbs The Tcl Guy Senior Developer http://www.ActiveState.com/
"Jeff Hobbs" <jeffh@ActiveState.com> writes:
Right, I'm aware of it, and will make it so for 8.4.4 and 8.5. It won't get back into 8.3 (which we consider end-of-devel life). Which brings up another point ... since Martin, et al, also did a great job correcting Tkinter support for 8.4, why not move up to that as the base version (it is already the base version in SuSE and Debian-testing)?
If that helps: This is what Python 2.3 will do on Windows: ship with Tcl 8.4.
It is 100% upwards compatible from 8.3 and provides numerous enhancements for Tk users (like compound buttons, new core widgets, enhanced performance, etc).
Unfortunately, it is not. Tk decided to set some "indeterminate" fields in event structures to '??', whereas earlier they have been numbers. This has caused an incompatibility in Tkinter of Python 2.2.2 and earlier, which now crashes when trying to convert these to integers. This has been fixed in Python 2.2.3 and 2.3, by not trying to convert these fields, so that now Pmw crashes when interpreting these fields as integers (where "crashes" means "raising Python exceptions"). Regards, Martin
From: martin@v.loewis.de
It is 100% upwards compatible from 8.3 and provides numerous enhancements for Tk users (like compound buttons, new core widgets, enhanced performance, etc).
Unfortunately, it is not. Tk decided to set some "indeterminate" fields in event structures to '??', whereas earlier they have been numbers. This has caused an incompatibility in Tkinter of Python 2.2.2 and earlier, which now crashes when trying to convert these to integers. This has been fixed in Python 2.2.3 and 2.3, by not trying to convert these fields, so that now Pmw crashes when interpreting these fields as integers (where "crashes" means "raising Python exceptions").
Ah yes, a few instances like that can be considered incompatible, but it was really a correction for buggy behavior that existed in Tk. You could make requests for event fields that were valid (depending on the event type). You either got lucky and got some random data, or you could segfault. This relates to these bugs: https://sf.net/tracker/index.php?func=detail&aid=612110&group_id=12997&atid=112997 https://sf.net/tracker/index.php?func=detail&aid=411307&group_id=12997&atid=112997 Other items that are in this vein are different default sizes for buttons on Windows (better native size), and more standard behavior of transient toplevels across platforms (with docs). You also get stuff like the real XP scrollbar and auto-disabling of scrollbars on Windows (native style). Then there is of course the entirely new native OS X Aqua support, but I'm not sure if Tkinter is ported for OS X yet (??). Jeff Hobbs The Tcl Guy Senior Developer http://www.ActiveState.com/
"Jeff Hobbs" <jeffh@ActiveState.com> writes:
Ah yes, a few instances like that can be considered incompatible, but it was really a correction for buggy behavior that existed in Tk.
Unfortunately, for Python, this was more than just a few instances. If Tkinter receives an event, it would convert all fields wholesale, regardless of whether they would be needed or not. That means that any Tkinter application using Tk bind would crash - even if it did not look at the event at all. There were ten or so bug reports about it, both on SF and comp.lang.python. Now, I understand the reason for the change, and I agree it is a sensible change. It also is a serious incompatibility. I have no problems with serious incompatibilities as long as they are properly documented and announced. Regards, Martin
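The wholesale conversion can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the real Tkinter dispatch code: `dispatch_eagerly` and `ignore_event` are hypothetical names, and the field dictionary stands in for the %-substitutions Tk passes to a bound callback.

```python
def dispatch_eagerly(raw_fields, callback):
    """Convert every field to int before the callback is invoked."""
    # The conversion happens up front, whether or not the callback
    # ever looks at the event, so one "??" is enough to raise.
    event = {name: int(raw) for name, raw in raw_fields.items()}
    return callback(event)


def ignore_event(event):
    """A callback that never inspects the event at all."""
    return "handled"
```

With such a scheme, `dispatch_eagerly({"x": "10", "keycode": "??"}, ignore_event)` raises `ValueError` before `ignore_event` runs, which matches the report that applications failed even when they did not look at the event.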
On Thu, Jun 19, 2003 at 09:51:01AM +0200, Martin v. Löwis wrote:
Now, I understand the reason for the change, and I agree it is a sensible change. It also is a serious incompatibility. I have no problems with serious incompatibilities as long as they are properly documented and announced.
The bind(n) manpage shipped with tcl 8.2.1 says this: Unless otherwise indicated, the replacement string is the decimal value of the given field from the current event. Some of the substitutions are only valid for certain types of events; if they are used for other types of events the value substituted is undefined. If I was being a language-lawyer, I'd sure attempt to argue that "the value substituted is undefined" means that "??" is one undefined value permissible by the spec. ... but I can't seriously believe that, because the *type* of the field is an integer, so to meet the letter of the manpage the undefined value must be some integer (or string representing an integer), not a non-integer string like "??". Anyway, while I don't know the tcl/tk development process, I suspect that if a Python advocate was active there (for instance, compiling python against tcl/tk CVS and reporting breakage) that this might have been avoided. It's a pity that nobody has appointed themselves to this role. Even if the tk developers had still made the incompatible change in a dot-dot-release (rather than, say, substituting a plainly out-of-range integer, like -sys.maxint-1), the Python community would have heard about it before that release was out of CVS. It would be nice if there was someone up ahead to make sure the light is not an oncoming train. I wish *I* had the time to do this but hell -- I can't get my company's product to move forward from the 8.2.1 release whose manpage I quoted, not even to a later 8.2.x release! Jeff
Jeff Epler <jepler@unpythonic.net> writes:
It would be nice if there was someone up ahead to make sure the light is not an oncoming train.
I first learned about the problem when a Debian bug report was forwarded to Python, along with a patch. I can't recall whether this was before or after the 8.4.2 release - notice that the incompatible change was in a subminor release. At that time I thought: well, the patch fixes the problem, so let's just apply it, and life goes on. The Tk changelog entry reads 2003-02-28 Donal K. Fellows <fellowsd@cs.man.ac.uk> * tests/bind.test (bind-16.44): * generic/tkBind.c (ExpandPercents): Only allow events to see those expansions that are actually valid for them, and force the substitution of the rest as "??". This stops some crashes on Windows and gets rid of bogus values everywhere. [Bug #612110] which made it quite clear that Tk people would not revert that change, no matter what. Regards, Martin
From: martin@v.loewis.de I first learned about the problem when a Debian bug report was forwarded to Python, along with a patch. I can't recall whether this was before or after the 8.4.2 release - notice that the incompatible change was in a subminor release.
Well, it was a correction to a bug fix that could cause Tk to crash. Crashing situations certainly merit placement in patchlevel releases. I think JeffE was more hoping that ints would remain ints for those fields with known types, i.e. use 0 instead of ??. The problem is that (IIRC) the type info was ignored for non-valid event data.
At that time I thought: well, the patch fixes the problem, so let's just apply it, and life goes on. The Tk changelog entry reads
2003-02-28 Donal K. Fellows <fellowsd@cs.man.ac.uk>
* tests/bind.test (bind-16.44): * generic/tkBind.c (ExpandPercents): Only allow events to see those expansions that are actually valid for them, and force the substitution of the rest as "??". This stops some crashes on Windows and gets rid of bogus values everywhere. [Bug #612110]
which made it quite clear that Tk people would not revert that change, no matter what.
When it comes to fixing bugs that can cause compat issues, we are usually pretty good about noting them in direct announcement release notes. Checking up, I see that this was noted appropriately in the Tcl 8.4.2 release announcement (last item): https://sourceforge.net/project/shownotes.php?release_id=144142 """ * Make %-substitutions for events only read data out of the event structure when that field is valid for that event type. *** POTENTIAL INCOMPATIBILITY *** """ Jeff Hobbs The Tcl Guy Senior Developer http://www.ActiveState.com/
[only tangentially related to python-dev, but a followup to a previous annoying problem] I finally got annoyed enough by the Redhat 9 tcl/tk disaster to do something about it. I grabbed the tcl/tk source for the version installed in RH9 (the original source, not the one modified by redhat) and rebuilt it, and made a couple of RPMs. http://www.interlink.com.au/anthony/tech/rh9-tcltk/ I have no idea at all about the state of unicode/UCS2/UCS4 in these builds, it's just built with whatever tcl/tk 8.3.5 builds by default. I only deal with 8 bit fonts, so I don't care enough about the state of multi-byte fonts in tcl/tk to spend time on this. I _do_ care about having my exmh chew through 100 meg of memory on startup and then leaking memory like a sieve, however, and this version of the RPMs seems to fix this. Python2.2-cvs and Python2.3-cvs seem to build fine against it. Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood.
... but I can't seriously believe that, because the *type* of the field is an integer, so to meet the letter of the manpage the undefined value must be some integer (or string representing an integer), not a non-integer string like "??".
OK, looking in tk/generic/tkBind.c:ExpandPercents, I see that it would be quite possible to correct this to use 0 or -1 for those fields that want to be ints. All the cases that are now correctly checking that the field would be valid just need a final else clause that would give some default int value (could even be event-field-specific). I'm not going to patch it, but I would accept it if someone wants to do that.
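The proposed final else clause can be modelled in Python to make the behaviour concrete. This is only an illustration of the idea, not the C code in tkBind.c: the validity table, the set of integer-typed fields, and the function name `expand_field` are all hypothetical.

```python
# Hypothetical table of which fields are valid for which event types.
VALID_FIELDS = {
    "KeyPress": {"x", "y", "keycode"},
    "Configure": {"x", "y", "width", "height"},
}

# Hypothetical set of fields whose substitution is expected to be an int.
INT_FIELDS = {"x", "y", "keycode", "width", "height"}


def expand_field(event_type, field, event_data, int_default=0):
    """Return the substitution string for one field of one event."""
    if field in VALID_FIELDS.get(event_type, set()):
        return str(event_data[field])
    if field in INT_FIELDS:
        # The suggested final else clause: keep integer fields integers.
        return str(int_default)
    return "??"
```

Under this model `int(expand_field("KeyPress", "width", {"x": 1, "y": 2, "keycode": 38}))` yields 0 instead of raising, which is the compatibility property being asked for; the default could equally be -1 or chosen per field.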
Anyway, while I don't know the tcl/tk development process, I suspect that if a Python advocate was active there (for instance, compiling python against tcl/tk CVS and reporting breakage) that this might have been avoided. It's a pity that nobody has appointed themselves to this role.
Anyone is welcome to join the process. I'd even add someone as a tk cvs member if they wished. While I can (and do) check bug reports for Tkinter that find their way to me, I don't have the bandwidth to do more.
I wish *I* had the time to do this but hell -- I can't get my company's product to move forward from the 8.2.1 release whose manpage I quoted, not even to a later 8.2.x release!
Well, that's *only* 4 years old. ;^) I recently worked with a very large company to upgrade some key systems that still had Tcl 7.1 embedded (we're talking 1995 code). <shudder> Jeff Hobbs The Tcl Guy Senior Developer http://www.ActiveState.com/
From: martin@v.loewis.de Jeff Hobbs writes:
Can someone explain to me why moving to UCS-4 is a good thing?
Because it simplifies processing of non-BMP characters, as it restores the property that you get one Unicode character per string index. ...
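The "one character per string index" property can be checked directly. The sketch below assumes a modern Python 3 (3.3 or later), where strings are always indexed by code point; it is not something the Python 2.3 of this thread could run. The sample character is an arbitrary non-BMP code point chosen for illustration.

```python
import struct

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
clef = "\U0001D11E"

# Indexed by code point, the string has exactly one element.
code_points = len(clef)

# Encoded as UTF-16, the same character needs two 16-bit code units
# (a surrogate pair), so a UCS-2 style index would see two "characters".
utf16_bytes = clef.encode("utf-16-le")
utf16_units = len(utf16_bytes) // 2
high, low = struct.unpack("<2H", utf16_bytes)
```

Here `code_points` is 1 while `utf16_units` is 2, which is the mismatch a 16-bit internal representation has to work around.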
While Tcl is agnostic about non-BMP chars (all 2 of them ... ha ha), it does have correct UCS-4 support (not completely though with how RedHat patched it). This has been discussed before briefly here:
https://sourceforge.net/tracker/?func=detail&aid=578030&group_id=10894&atid=110894
Which of the follow-up messages do you consider reliable information in this report? davygrvy comments appear to be irrelevant, as they talk about Unicode 3.0, keithp likewise. Your own comment appears to talk about possible future changes, instead of the current code.
BTW, I mentioned this because I'm not sure that the reasoning behind moving to a 32-bit integral type was due to RH's desire to support the extra chars in Unicode 4 (after all, without shipping fonts to display them ... what's the point?). Keith Packard, who submitted the bug report (RFE really) is one of the major XFree maintainers (err ... I guess that's xwin now). In any case, he wanted to allow 32-bit in X in part for ease of processing, advantages of word alignment, and other things. IOW, I'm not really sure that this was all done to support UCS-4 specifically, although that may have been a consideration. Jeff Hobbs The Tcl Guy Senior Developer http://www.ActiveState.com/
"Jeff Hobbs" <jeffh@ActiveState.com> writes:
BTW, I mentioned this because I'm not sure that the reasoning behind moving to a 32-bit integral type was due to RH's desire to support the extra chars in Unicode 4 (after all, without shipping fonts to display them ... what's the point?).
I guess a driving motivation is alignment with the C library, at least that is what drove me to add UCS-4 support to Python. On Unix, traditionally, wchar_t, if interpreted as Unicode, is a four-byte data type. The Unicode spec performed an interesting dance about that: Unicode 2.0 claimed that it was outright non-conforming to use a four-byte wchar_t for Unicode. Unicode 3.0 said "well, you can". Unicode 3.2 now says "why not, it's a reasonable thing to do". So for us in the Unix world, the impression is that the C library's decision was always "right", and we are eager to support that decision. For libraries such as iconv, there is a performance advantage gained from matching the interpreter's Unicode type with the system's wchar_t. Apart from that, there is also the feeling that ISO 10646 got it right and the Unicode consortium got it wrong. You really do need more than 64k code points if you want to unify all writing systems. From that viewpoint, UTF-16 is an ugly hack, which should be avoided wherever possible. Regards, Martin
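The width of the platform's wchar_t and the interpreter's own code point range can both be inspected from Python. This is a small sketch using `ctypes.sizeof` and `sys.maxunicode`; the variable names are illustrative, and the result for wchar_t depends on the platform (typically 4 bytes on Linux and 2 on Windows).

```python
import ctypes
import sys

# Size in bytes of the C library's wchar_t on this platform.
wchar_bytes = ctypes.sizeof(ctypes.c_wchar)

# Largest code point a Python string element can hold. Narrow (UCS-2)
# builds of older Pythons reported 0xFFFF; wide (UCS-4) builds and
# Python 3.3 or later report 0x10FFFF.
max_code_point = sys.maxunicode

# True when the interpreter's string type can hold non-BMP code points
# without resorting to surrogate pairs.
holds_non_bmp = max_code_point > 0xFFFF
```

On the builds discussed in this thread, a UCS-2 Python against a Tcl patched for a 32-bit character type would show exactly the kind of width mismatch these two values expose.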
participants (8)
- "Martin v. Löwis"
- Anthony Baxter
- Barry Warsaw
- Gerrit Holl
- Jeff Epler
- Jeff Hobbs
- martin@v.loewis.de
- Mihai Ibanescu