Re: [lxml-dev] Python XML Validator
Hi, moving this here from python-dev (where it started for whatever reason...)
Mike Meyer wrote:
On Tue, 11 Mar 2008 18:01:29 +0100 Stefan Behnel wrote:
BTW, we had MacOS builds a while ago, so I wouldn't mind having someone volunteer to contribute builds on a regular basis (static builds preferred).
For which Python build? python.org? ActiveState? Leopard? MacPorts? Fink? pkgsrc? Any idea if a single build will work for all of them?
I have no idea. At the very least, different Python major versions will pose a problem. And I guess the builds provided by package distributions like Fink and MacPorts will also require newer dependencies on other ends, or be built with newer compilers...
The second time for OS-X, I used an older version of lxml (1.3.6), and just did "setup.py install". Worked like a charm. That's not hard.
Interesting. 1.3.6 should also require libxml2 2.6.20 (although maybe less strictly than 2.0).
I just grabbed it and tried parsing things with it; I didn't try the advanced features that I depend on lxml for (rng validation and lots of xpath), or what the OP was looking for (validation). Running the test.py suite turns up one failure:
File "/Users/mwm/lxml-1.3.6/src/lxml/tests/../../../doc/parsing.txt", line 369, in parsing.txt
Failed example:
    etree.tounicode(root)
Expected:
    u'<test> \uf8d1 + \uf8d2 </test>'
Got:
    u'<test> + </test>'
If that's the only problem, then 1.3.x works 'acceptably' with 2.6.16 - except that newer versions are much better at parsing HTML and validating with XML schema (amongst other things). Note that the test suite tends to avoid testing features that only depend on libxml2, and especially stuff that has changed between library versions. It's a test suite for lxml, not for libxml2.
However, 2.0 will not work that easily. Things like parse-time schema validation and schematron support do not work on versions below 2.6.20 (or actually 2.6.21, but we disable schematron on 2.6.20). We might be able to work around some more stuff by spreading some #ifdef's and #defines, but so far, I find it perfectly acceptable if 2.0 requires newer dependencies for new features. People who care about reliability will not use libraries as old as 2.6.16 anyway. The list of fixed bugs only gets longer with newer versions.
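The version thresholds mentioned above can be illustrated with a minimal sketch. It is not lxml's actual build or runtime code; the helper names `parse_version` and `feature_support` are hypothetical, and only the two version numbers come from the thread.

```python
# Hypothetical helpers for illustration only; not taken from lxml's sources.
# The two thresholds are the ones named in the message above.
MIN_PARSE_TIME_VALIDATION = (2, 6, 20)  # features fail below this libxml2 version
MIN_SCHEMATRON = (2, 6, 21)             # schematron is disabled on 2.6.20


def parse_version(text):
    """Turn '2.6.16' into (2, 6, 16) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def feature_support(libxml2_version):
    """Report which of the two features a given libxml2 version would allow."""
    version = parse_version(libxml2_version)
    return {
        "parse_time_validation": version >= MIN_PARSE_TIME_VALIDATION,
        "schematron": version >= MIN_SCHEMATRON,
    }
```

Comparing tuples rather than strings matters here: as strings, "2.6.9" would sort after "2.6.16".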
Which means you wind up having to build those yourself if you want a recent version of lxml, even if you're using a system that includes lxml in its package system.
If you want a clean system, e.g. for production use, buildout has proven to be a good idea. And we also provide pretty good instructions on our web page on how to install lxml on MacOS-X and what to take care of.
Yes, but the proposal was to include it in the Python standard library. Software that doesn't work on popular target platforms without updating a standard system library isn't really suitable for that.
Hmm, coming somewhat back on-topic: how does Python currently handle its dependencies under MacOS-X? SQLite, for example? Does it use system libraries only, or are there libraries it ships with? (The MacOS distro is much bigger, but that might be due to the universal build - although that suggests that MacOS-X users do not care about disk space or download size anyway)
For most of them, it checks for the existence of the libraries and header files for those packages, and then builds the wrapper libraries if it finds their requirements. Look through the 2.5.2 setup.py for how sqlite3 is handled (it's a bit much to include here).
Funny, looking for the sqlite setup was actually a good idea. It does all sorts of things to figure out a good one to use, specifically on MacOS-X. There even appears to be some trickery to take the first library it finds, static or dynamic, instead of continuing to look for a dynlib.
I wouldn't mind adding a similar setup to lxml's setupinfo.py. Maybe someone can lend a hand with this? It would be great to have an automatic static build on MacOS, so that people could just run setup.py and be sure it uses the expected libs the next time they use it.
Is there a standard directory prefix where MacPorts & Co. install libraries and related stuff like xslt-config?
Stefan
On Wed, 12 Mar 2008 18:55:20 +0100 Stefan Behnel <stefan_ml@behnel.de> wrote:
Mike Meyer wrote:
On Tue, 11 Mar 2008 18:01:29 +0100 Stefan Behnel wrote:
BTW, we had MacOS builds a while ago, so I wouldn't mind having someone volunteer to contribute builds on a regular basis (static builds preferred).
For which Python build? python.org? ActiveState? Leopard? MacPorts? Fink? pkgsrc? Any idea if a single build will work for all of them?
I have no idea. At the very least, different Python major versions will pose a problem. And I guess the builds provided by package distributions like Fink and MacPorts will also require newer dependencies on other ends, or be built with newer compilers...
The package system versions can probably be ignored, as the package systems will provide lxml (or the libraries and an environment to find them from setup.py). ActiveState seems to be a world of its own. Personally, I'll use the Apple python until I need a build with features it doesn't provide (probably sometime after 2.6 comes out later this year). But that doesn't seem like a good way to pick one to support.
The second time for OS-X, I used an older version of lxml (1.3.6), and just did "setup.py install". Worked like a charm. That's not hard.
Interesting. 1.3.6 should also require libxml2 2.6.20 (although maybe less strictly than 2.0).
I just grabbed it and tried parsing things with it; I didn't try the advanced features that I depend on lxml for (rng validation and lots of xpath), or what the OP was looking for (validation). Running the test.py suite turns up one failure:
File "/Users/mwm/lxml-1.3.6/src/lxml/tests/../../../doc/parsing.txt", line 369, in parsing.txt
Failed example:
    etree.tounicode(root)
Expected:
    u'<test> \uf8d1 + \uf8d2 </test>'
Got:
    u'<test> + </test>'
If that's the only problem, then 1.3.x works 'acceptably' with 2.6.16 - except that newer versions are much better at parsing HTML and validating with XML schema (amongst other things). Note that the test suite tends to avoid testing features that only depend on libxml2, and especially stuff that has changed between library versions. It's a test suite for lxml, not for libxml2.
That's what I was afraid of. This is the "easy" solution for OSX, but it doesn't get you software you'd want to use for the advanced features that make lxml so attractive.
However, 2.0 will not work that easily. Things like parse-time schema validation and schematron support do not work on versions below 2.6.20 (or actually 2.6.21, but we disable schematron on 2.6.20). We might be able to work around some more stuff by spreading some #ifdef's and #defines, but so far, I find it perfectly acceptable if 2.0 requires newer dependencies for new features. People who care about reliability will not use libraries as old as 2.6.16 anyway. The list of fixed bugs only gets longer with newer versions.
lxml is a cutting-edge tool for XML work. I need the features it offers that make it such, and that means having recent versions of those libraries, because earlier ones didn't have them. That's cool with me.
Which means you wind up having to build those yourself if you want a recent version of lxml, even if you're using a system that includes lxml in its package system.
If you want a clean system, e.g. for production use, buildout has proven to be a good idea. And we also provide pretty good instructions on our web page on how to install lxml on MacOS-X and what to take care of.
Yes, but the proposal was to include it in the Python standard library. Software that doesn't work on popular target platforms without updating a standard system library isn't really suitable for that.
Hmm, coming somewhat back on-topic: how does Python currently handle its dependencies under MacOS-X? SQLite, for example? Does it use system libraries only, or are there libraries it ships with? (The MacOS distro is much bigger, but that might be due to the universal build - although that suggests that MacOS-X users do not care about disk space or download size anyway)
For most of them, it checks for the existence of the libraries and header files for those packages, and then builds the wrapper libraries if it finds their requirements. Look through the 2.5.2 setup.py for how sqlite3 is handled (it's a bit much to include here).
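The "check for headers and libraries, then build the wrapper" approach described above might look roughly like the following. This is a simplified sketch, not the code from Python's setup.py; the function names and the example file names are assumptions made for illustration.

```python
import os


def first_dir_containing(filename, directories):
    """Return the first directory in `directories` holding `filename`, else None."""
    for directory in directories:
        if os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


def should_build_wrapper(header, library_file, include_dirs, lib_dirs):
    """Build the wrapper module only if both the header and the library exist."""
    return (first_dir_containing(header, include_dirs) is not None
            and first_dir_containing(library_file, lib_dirs) is not None)
```

A call such as `should_build_wrapper("sqlite3.h", "libsqlite3.a", ["/usr/include"], ["/usr/lib"])` would then decide whether the extension is worth attempting on that machine.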
Funny, looking for the sqlite setup was actually a good idea. It does all sorts of things to figure out a good one to use, specifically on MacOS-X. There even appears to be some trickery to take the first library it finds, static or dynamic, instead of continuing to look for a dynlib.
I wouldn't mind adding a similar setup to lxml's setupinfo.py. Maybe someone can lend a hand with this? It would be great to have an automatic static build on MacOS, so that people could just run setup.py and be sure it uses the expected libs the next time they use it.
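One way to picture the "take the first library it finds, static or dynamic" behaviour is a directory-by-directory search. The sketch below is an assumption about how such a lookup could be written for setupinfo.py, not existing lxml or Python code; `find_library` and the default extension order (static first, to favour static builds) are hypothetical choices.

```python
import os


def find_library(name, lib_dirs, extensions=(".a", ".dylib")):
    """Search one directory at a time and return the first library found.

    Every extension is tried inside a directory before moving on to the
    next one, so an earlier directory always wins over a later one,
    whichever kind of library it holds.
    """
    for directory in lib_dirs:
        for extension in extensions:
            candidate = os.path.join(directory, "lib" + name + extension)
            if os.path.isfile(candidate):
                return candidate
    return None
```

The point of this ordering is that the search path, not the library type, decides which copy gets linked, which keeps a locally built libxml2 from being shadowed by a system dylib further down the path.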
This sounds like the best approach, especially if the install/build document provides pointers to the package systems it checks for.
Is there a standard directory prefix where MacPorts & Co. install libraries and related stuff like xslt-config?
There's a default, but you can change it for all of them (a feature they all inherited from Hubbard's original version for FreeBSD). The default prefix for each of them is:
MacPorts: /opt/local/...
Fink: /sw/...
pkgsrc: /usr/pkg/...
<mike
--
Mike Meyer <mwm@mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
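The default prefixes listed above could feed a lookup for xslt-config along these lines. This is only a sketch: the assumption that the script lives in `<prefix>/bin` is mine, `find_config_script` is a hypothetical name, and users who changed their prefix would need to pass their own list.

```python
import os

# Default prefixes as given in the message above; all of them can be
# changed by the user, so treat this list as a starting point only.
DEFAULT_PREFIXES = [
    ("MacPorts", "/opt/local"),
    ("Fink", "/sw"),
    ("pkgsrc", "/usr/pkg"),
]


def find_config_script(script="xslt-config", prefixes=None):
    """Return (package_system, path) for the first executable script found.

    Assumes the script is installed under <prefix>/bin.
    """
    if prefixes is None:
        prefixes = DEFAULT_PREFIXES
    for system, prefix in prefixes:
        candidate = os.path.join(prefix, "bin", script)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return system, candidate
    return None
```

A build script could call `find_config_script()` first and fall back to whatever is on `PATH` when it returns `None`.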
participants (2)
- Mike Meyer
- Stefan Behnel