Georges Racinet wrote:
On Jul 31, 2006, at 11:41 AM, Martijn Faassen wrote:
Hi there,
I just found out that there is a hidden incompatibility in the compiled versions of the lxml eggs we provide, at least on Linux. Our provided versions are compiled with a Python that has 4-byte Unicode support (probably the default on Ubuntu, on which I built the 2.4 extension).
Noticed that last week, too. Sorry I forgot to mention it over there.
What platform were you on when you noticed this? Mandriva (as you mention below)? [snip]
As far as I know, this is typical of the Ubuntu distribution, and I'm 100% sure this egg was laid from Ubuntu. If the egg system could make a difference between distributions, it would be ok, imho.
I think Red Hat has been compiling Python with 4-byte characters for ages too, so while this was Ubuntu (I did it), I'm pretty sure it's also the case on Fedora.
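For anyone who wants to check which kind of Python they are running, here is a minimal sketch. It relies on sys.maxunicode, which is 65535 on a 2-byte build and 1114111 on a 4-byte build; the helper name unicode_width is only made up for illustration and is not part of lxml or Python.

```python
import sys


def unicode_width(maxunicode=sys.maxunicode):
    """Return 2 for a 2-byte (narrow) build, 4 for a 4-byte (wide) build.

    Hypothetical helper for illustration; it only inspects the largest
    code point the interpreter reports supporting.
    """
    if maxunicode > 0xFFFF:
        return 4
    return 2


if __name__ == "__main__":
    print("bytes per unicode character: %d" % unicode_width())
```

A compiled egg should only be expected to load on an interpreter that reports the same width as the one it was built with.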
Charset problems are a plague.
This is not your common charset problem. Mostly one can avoid the plague by just using Unicode, but that's what we're doing here.
By the way, does Pyrex generate different C code depending on whether 4-byte or 2-byte Unicode is used? If so, then that would mean an installation of Pyrex as well for these people...
I tried to compile from source on Mandriva, and it failed. I had no time to investigate (low priority for the task I was working on); it could very well have been something trivial.
Interesting; let us know if you find out more. It's important to have the lxml C sources compile on all platforms, as otherwise people will be forced to use Pyrex, possibly even the forked version of Pyrex Stephan is maintaining.

Regards,
Martijn