Tres Seaver wrote:
Martijn Faassen wrote:
Ah, so current CPython sources build with 4-byte unicode by default? If that is certain, then we're fairly safe. If not, then I wonder what to do: you'd like lxml to work with hand-compiled Pythons.
Nope. The distros all pass '--enable-unicode=ucs4' to configure. The default value for that option is 'yes', which maps to 'ucs2' unless you also have a ucs4-enabled Tcl.
Right, that's what I witness, too.
Perhaps we could use the following test inside 'setup.py', and modify the name of the binary egg to include the 'ucs2' vs. 'ucs4' flag?::
    import sys
    ucs_flag = 'ucs4' if sys.maxunicode > 0xFFFF else 'ucs2'
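For illustration only, the check above could be wrapped in a small helper that 'setup.py' calls when it composes a name. This is a minimal sketch: the names `unicode_build_tag` and `tagged_name` and the '-ucs4' suffix format are hypothetical, not part of setuptools, easy_install or lxml.

```python
import sys

# Highest code point representable on a narrow (UCS2) build.
NARROW_MAX = 0xFFFF


def unicode_build_tag(maxunicode=None):
    """Return 'ucs4' for a wide unicode build, 'ucs2' for a narrow one.

    maxunicode defaults to the running interpreter's sys.maxunicode;
    passing it explicitly makes the function easy to test.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return 'ucs4' if maxunicode > NARROW_MAX else 'ucs2'


def tagged_name(base_name, maxunicode=None):
    """Append the build tag to a base name (hypothetical naming scheme)."""
    return '%s-%s' % (base_name, unicode_build_tag(maxunicode))


if __name__ == '__main__':
    # Shows which tag the current interpreter would get.
    print(tagged_name('lxml'))
```

Taking `maxunicode` as a parameter keeps the decision separate from the interpreter it runs on, so both outcomes can be checked on a single machine.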
While that's nice to have, it doesn't really help us, because a) we'd still have to build and ship both eggs (while the current UCS4 eggs seem to fit most users), and b) easy_install doesn't currently handle such extensions, so it would most likely just stop finding the eggs on the Cheeseshop if we added extra sections to the egg name.

I still think it's enough to add a FAQ entry (which I already did) and otherwise ignore the problem for now. That way, the major distros are supported out of the box. And for those who happen to use a UCS2 system, it's really not a big deal to build lxml from source on a fairly recent, properly set up Linux system.

Stefan