I tested a few XML validators, but none of them can validate a string of XML (not a file, just a string), which is what I need in order to programmatically validate the XML messages that flow in and out of the different systems. The validators I tried were XSV, oNVDL and lxml. Can we implement a pure Python module for validating strings of XML using XML Schema (not DTD)? lxml is good, but it is not written in Python, is difficult to install, and didn't work on Mac OS X. XSV has very poor documentation and only validates XML files, not strings. oNVDL is not written in Python and only validates XML files, not strings.
On Thu, 28 Feb 2008 23:42:49 +0000 (UTC) Medhat Gayed wrote:
lxml is good but not written in python and difficult to install and didn't work on MacOS X.
lxml is built on top of libxml2/libxslt, which are bundled with most
Unix-like OS's (including Mac OS X), or available in their package
systems. Trying to install it from the repository is a PITA, because
it uses both the easyinstall and Pyrex (later Cython) packages - which
aren't bundled with anything. On the other hand, if it's in the
package system (I no longer have macports installed anywhere, but
believe it was there at one time), that solves all those problems. I
believe they've excised the easyinstall source dependencies, though.
Using lxml on OS X Tiger was problematical, because the versions of
python, libxml2 and libxslt provided with Tiger were pretty much all
older than lxml supported; I built python from macports, including
current versions of libxml2, libxslt and lxml, and everything worked
with no problems. (I later stopped working with this on the Mac
because I need cx_Oracle as well, which doesn't exist for intel macs).
On Leopard, Python is up to date, but libxml/libxslt seems a bit
behind for lxml 2.0.x (no schematron support being the obvious
problem). I went back to the 1.3.6 source tarball (which is what I'm
using everywhere anyway), and "python setup.py install" worked like a
charm. (So it looks like the easyinstall dependency is gone).
Of course, the real issue here is that, while Python may come with
"batteries included" you only get the common sizes like A, C and D. If
you need a B cell, you're on your own. In XML land, validation is one
such case. Me, I'd like complete xpath support, and xslt as well. But
this happens with other subsystems, like doing client-side SSL support,
but not server-side (at least, not as of 2.4; I haven't checked 2.5).
If you just want an xml module in the standard library that's more
complete, I'd vote for the source distribution of lxml, as that's C +
Python and built on top of commonly available libraries. The real
issue would be making current lxml work with the "outdated" versions
of those libraries found in current OS distributions.
In article <20080302230708.260fa4a9@bhuda.mired.org>, Mike Meyer wrote:
On Thu, 28 Feb 2008 23:42:49 +0000 (UTC) Medhat Gayed wrote:
lxml is good but not written in python and difficult to install and didn't work on MacOS X.
lxml is built on top of libxml2/libxslt, which are bundled with most Unix-like OS's (including Mac OS X), or available in their package systems. Trying to install it from the repository is a PITA, because it uses both the easyinstall and Pyrex (later Cython) packages - which aren't bundled with anything. On the other hand, if it's in the package system (I no longer have macports installed anywhere, but believe it was there at one time), that solves all those problems. I believe they've excised the easyinstall source dependencies, though. [...] If you just want an xml module in the standard library that's more complete, I'd vote for the source distribution of lxml, as that's C + Python and built on top of commonly available libraries. The real issue would be making current lxml work with the "outdated" versions of those libraries found in current OS distributions.
I'm not sure what you perceive to be the problems with easy_install on OSX; I find it makes life *much* simpler for managing python packages. Be that as it may, since the release of lxml 2.0, the project has updated the lxml website with useful information about source installations and, in particular, OSX source installations:

http://codespeak.net/lxml/build.html

IIRC, here's what worked for me on Leopard (10.5.2) using the python.org 2.5.2, though it should work fine with the Apple-supplied 2.5.1:

1. Download and build source tarballs of recent libxml2 (at the moment 2.6.31 is the latest, OSX 10.5.2 has 2.6.16) and libxslt (1.1.22 vs 1.1.12) from xmlsoft.org and then install them to /usr/local/. (Don't bother with the python bindings unless they're needed for some other package.)

2. As noted on the lxml source page, Cython is not needed to install lxml and can complicate matters so, as suggested, if you have Cython (or Pyrex, for that matter) installed in the Python path you're going to install to, temporarily remove it(/them).

3. Using the lxml 2.0.x source tarball:

    /path/to/python setup.py install \
        --with-xslt-config=/usr/local/bin/xslt-config

4. Verify installation:

    $ which python
    /usr/local/bin/python
    $ python
    Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
    [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from lxml import etree
    >>> print etree.LXML_VERSION
    (2, 0, 2, 0)
    >>> print etree.LIBXML_VERSION
    (2, 6, 31)
    >>> print etree.LIBXML_COMPILED_VERSION
    (2, 6, 31)
    >>> print etree.LIBXSLT_VERSION
    (1, 1, 22)
    >>> print etree.LIBXSLT_COMPILED_VERSION
    (1, 1, 22)
Clearly there are other ways to do this but HTH. -- Ned Deily, nad@acm.org
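The version check in step 4 can also be done programmatically. The following is only a sketch: the 2.6.20 minimum is the figure cited elsewhere in this thread and should be treated as an assumption for any other lxml release, and `meets_minimum` and `report` are hypothetical helper names, not part of lxml. It is written in current Python 3 syntax rather than the Python 2.5 used above.

```python
# Minimum libxml2 version cited in this thread for lxml (2.6.20).
# Treat it as an assumption for any other lxml release.
MIN_LIBXML2 = (2, 6, 20)


def meets_minimum(found, required=MIN_LIBXML2):
    """Return True if the version tuple `found` is at least `required`."""
    # Tuples compare element by element, so (2, 6, 31) >= (2, 6, 20).
    return tuple(found) >= tuple(required)


def report(name, found, required=MIN_LIBXML2):
    """Build a one-line, human-readable status string."""
    status = "ok" if meets_minimum(found, required) else "too old"
    return "%s %s: %s (need >= %s)" % (
        name,
        ".".join(map(str, found)),
        status,
        ".".join(map(str, required)),
    )


if __name__ == "__main__":
    # Version numbers taken from the message above.
    print(report("libxml2 (OS X 10.5.2 system copy)", (2, 6, 16)))
    print(report("libxml2 (built from source)", (2, 6, 31)))
```

With lxml installed, the same check could be fed the tuple shown in the session above, e.g. `meets_minimum(etree.LIBXML_VERSION)`.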
On Tue, 04 Mar 2008 15:44:32 -0800 Ned Deily wrote:
In article <20080302230708.260fa4a9@bhuda.mired.org>, Mike Meyer wrote:
On Thu, 28 Feb 2008 23:42:49 +0000 (UTC) Medhat Gayed wrote:
lxml is good but not written in python and difficult to install and didn't work on MacOS X.
lxml is built on top of libxml2/libxslt, which are bundled with most Unix-like OS's (including Mac OS X), or available in their package systems. Trying to install it from the repository is a PITA, because it uses both the easyinstall and Pyrex (later Cython) packages - which aren't bundled with anything. On the other hand, if it's in the package system (I no longer have macports installed anywhere, but believe it was there at one time), that solves all those problems. I believe they've excised the easyinstall source dependencies, though. [...] If you just want an xml module in the standard library that's more complete, I'd vote for the source distribution of lxml, as that's C + Python and built on top of commonly available libraries. The real issue would be making current lxml work with the "outdated" versions of those libraries found in current OS distributions.
I'm not sure what you perceive to be the problems with easy_install on OSX; I find it makes life *much* simpler for managing python packages.
I don't, but the real issue is that it's been considered - and rejected - for inclusion in the standard library multiple times. The OP's request was for a validating XML parser in the standard library. Any third party code that requires easy_install won't be acceptable. I think lxml is the best Python XML library that meets his requirements, and it would make my life a lot easier if it were part of the standard library. However, the authors tend to require recent versions of libxml2 and libxslt, which means recent versions of lxml won't build and/or work with the libraries bundled with many Unix and Unix-like systems - including OSX. Which means you wind up having to build those yourself if you want a recent version of lxml, even if you're using a system that includes lxml in its package system.
Be that as it may, since the release of lxml 2.0, the project has updated the lxml website with useful information about source installations and, in particular, OSX source installations:
http://codespeak.net/lxml/build.html
IIRC, here's what worked for me on Leopard (10.5.2) using the python.org 2.5.2, though it should work fine with the Apple-supplied 2.5.1:
This is similar to what I went through with 1.3.6 on Tiger, but I used
MacPorts. On Leopard, 1.3.6 builds out of the box. Just do "sudo
python setup.py install" and you're done. That's probably the easiest
way to get a validating xml parser on OS X at this time.
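On the original question of validating a string rather than a file: the sketch below shows roughly how that looks with lxml's `etree.XMLSchema`. It is an illustration under assumptions, not the only way to do it: the schema, the element names and the `validate_xml_string` helper are made up for this example, and the code uses current Python 3 syntax rather than the Python 2.5 discussed in this thread.

```python
try:
    from lxml import etree  # third-party; needs libxml2/libxslt
except ImportError:
    etree = None

# Hypothetical schema for illustration: a <message> with an integer id
# and a string body.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:integer"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def validate_xml_string(xml_bytes, xsd_bytes=XSD):
    """Validate an in-memory XML document; return (ok, error_messages)."""
    if etree is None:
        raise RuntimeError("lxml is not installed")
    # Both the schema and the document are parsed from bytes, not files.
    schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
    try:
        doc = etree.fromstring(xml_bytes)
    except etree.XMLSyntaxError as exc:
        # Not well-formed, so schema validation never gets to run.
        return False, [str(exc)]
    ok = schema.validate(doc)
    return ok, [entry.message for entry in schema.error_log]


if __name__ == "__main__" and etree is not None:
    print(validate_xml_string(b"<message><id>1</id><body>hi</body></message>"))
    print(validate_xml_string(b"<message><id>x</id><body>hi</body></message>"))
```

Passing bytes rather than text avoids lxml's refusal to parse a Unicode string that carries its own encoding declaration. For repeated validation of many messages, building the `XMLSchema` object once and reusing it would be the obvious refinement.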
Mike Meyer wrote:
I think lxml is the best Python XML library that meets his requirements, and it would make my life a lot easier if it were part of the standard library.
+1 (!) -- Bob Kline http://www.rksystems.com mailto:bkline@rksystems.com
(weird places these threads come up at, but now that it's here...) Mike Meyer wrote:
On Tue, 04 Mar 2008 15:44:32 -0800 Ned Deily
wrote: In article <20080302230708.260fa4a9@bhuda.mired.org>, Mike Meyer
wrote: On Thu, 28 Feb 2008 23:42:49 +0000 (UTC) Medhat Gayed
wrote: lxml is good but not written in python and difficult to install and didn't work on MacOS X.
Due to a design problem in MacOS-X, not a problem in lxml. But it's not that hard to install either, as previous posts presented.
lxml is built on top of libxml2/libxslt, which are bundled with most Unix-like OS's (including Mac OS X), or available in their package systems. Trying to install it from the repository is a PITA, because it uses both the easyinstall and Pyrex (later Cython) packages
Using a release version of lxml does not require you to install either of the two. Just download the tar.gz, unpack it, and do the usual setup.py dance. That's how package installation in Python works. It does, however, require you to have libxml2 and libxslt installed with their dependencies, but that has nothing to do with Python.
aren't bundled with anything. On the other hand, if it's in the package system (I no longer have macports installed anywhere, but believe it was there at one time), that solves all those problems. I believe they've excised the easyinstall source dependencies, though.
There is no source dependency on easy_install. I assume that all they did is: build lxml against the libxml2/libxslt libraries that come with macports. Which is a sensible thing to do IMHO.
I think lxml is the best Python XML library that meets his requirements, and it would make my life a lot easier if it were part of the standard library.
I don't object to that. I'm just not a major driver here as it would require a bit of work that I can't currently spare.
However, the authors tend to require recent versions of libxml2 and libxslt, which means recent versions of lxml won't build and/or work with the libraries bundled with many Unix and Unix-like systems
I wouldn't consider a dependency on an almost three year old library version "recent", libxml2 2.6.20 was released in July 2005.
- including OSX.
That's different, because the system libraries here are a) horribly outdated for every new version of MacOS-X (i.e. usually more than two years old and very buggy), and b) difficult to replace by design.
Which means you wind up having to build those yourself if you want a recent version of lxml, even if you're using a system that includes lxml in its package system.
If you want a clean system, e.g. for production use, buildout has proven to be a good idea. And we also provide pretty good instructions on our web page on how to install lxml on MacOS-X and what to take care of. Stefan
On Tue, 11 Mar 2008 14:55:04 +0100 Stefan Behnel
(weird places these threads come up at, but now that it's here...) Mike Meyer wrote:
On Tue, 04 Mar 2008 15:44:32 -0800 Ned Deily
wrote: In article <20080302230708.260fa4a9@bhuda.mired.org>, Mike Meyer
wrote: On Thu, 28 Feb 2008 23:42:49 +0000 (UTC) Medhat Gayed
wrote: lxml is good but not written in python and difficult to install and didn't work on MacOS X.
Please note that this original complaint is *not* mine. However...
Due to a design problem in MacOS-X, not a problem in lxml.
I didn't find it noticeably harder to install lxml on MacOS-X than most other systems.
But it's not that hard to install either, as previous posts presented.
Depends on how you define "hard". If I have to create a custom environment with updated versions of system libraries just to use lxml, I'd call that "hard". That was pretty much the only route available the first time I wanted lxml on OS-X. And ubuntu. And RHEL. The second time for OS-X, I used an older version of lxml (1.3.6), and just did "setup.py install". Worked like a charm. That's not hard. The only system on which installing a modern version of lxml was easy was FreeBSD, probably because libxml2 and libxslt aren't part of the system software.
However, the authors tend to require recent versions of libxml2 and libxslt, which means recent versions of lxml won't build and/or work with the libraries bundled with many Unix and Unix-like systems I wouldn't consider a dependency on an almost three year old library version "recent", libxml2 2.6.20 was released in July 2005.
Well, if you're on a development box that you update regularly, you're right: three years old is pretty old. If you're talking about a production box that you don't touch unless you absolutely have to, you're wrong: three years old is still pretty recent. For example, the most recent release of RHEL is 4.6, which ships with libxml2 2.6.16.
Which means you wind up having to build those yourself if you want a recent version of lxml, even if you're using a system that includes lxml in its package system.
If you want a clean system, e.g. for production use, buildout has proven to be a good idea. And we also provide pretty good instructions on our web page on how to install lxml on MacOS-X and what to take care of.
Yes, but the proposal was to include it in the Python standard
library. Software that doesn't work on popular target platforms
without updating a standard system library isn't really suitable for
that.
Mike Meyer wrote:
On Tue, 11 Mar 2008 14:55:04 +0100 Stefan Behnel
wrote: (weird places these threads come up at, but now that it's here...) Mike Meyer wrote:
On Tue, 04 Mar 2008 15:44:32 -0800 Ned Deily
wrote: In article <20080302230708.260fa4a9@bhuda.mired.org>, Mike Meyer
wrote: On Thu, 28 Feb 2008 23:42:49 +0000 (UTC) Medhat Gayed
wrote: lxml is good but not written in python and difficult to install and didn't work on MacOS X.
Please note that this original complaint is *not* mine. However...
Due to a design problem in MacOS-X, not a problem in lxml.
I didn't find it noticeably harder to install lxml on MacOS-X than most other systems.
It seems to be for a number of people, though, who turn up on the mailing list complaining about just that.
But it's not that hard to install either, as previous posts presented.
Depends on how you define "hard". If I have to create a custom environment with updated versions of system libraries just to use lxml, I'd call that "hard". That was pretty much the only route available the first time I wanted lxml on OS-X. And ubuntu. And RHEL.
It has gotten a lot better by now. The only problem is how to tell your operating system where to look for libraries that you installed yourself (which I still refuse to consider a problem of lxml). BTW, we had MacOS builds a while ago, so I wouldn't mind having someone volunteer to contribute builds on a regular basis (static builds preferred).
The second time for OS-X, I used an older version of lxml (1.3.6), and just did "setup.py install". Worked like a charm. That's not hard.
Interesting. 1.3.6 should also require libxml2 2.6.20 (although maybe less strictly than 2.0).
However, the authors tend to require recent versions of libxml2 and libxslt, which means recent versions of lxml won't build and/or work with the libraries bundled with many Unix and Unix-like systems I wouldn't consider a dependency on an almost three year old library version "recent", libxml2 2.6.20 was released in July 2005.
Well, if you're on a development box that you update regularly, you're right: three years old is pretty old. If you're talking about a production box that you don't touch unless you absolutely have to, you're wrong: three years old is still pretty recent. For example, the most recent release of RHEL is 4.6, which ships with libxml2 2.6.16.
Ok, that's pretty old, although that was the last version we supported before requiring 2.6.20 (last summer, somewhere in the 1.3 series). Anyway, it's definitely less of a problem to upgrade system libraries on Linux (IIRC "rpm -bs" helps you on older RH*L versions) than under MacOS.
Which means you wind up having to build those yourself if you want a recent version of lxml, even if you're using a system that includes lxml in its package system.
If you want a clean system, e.g. for production use, buildout has proven to be a good idea. And we also provide pretty good instructions on our web page on how to install lxml on MacOS-X and what to take care of.
Yes, but the proposal was to include it in the Python standard library. Software that doesn't work on popular target platforms without updating a standard system library isn't really suitable for that.
Hmm, coming somewhat back on-topic: how does Python currently handle its dependencies under MacOS-X? SQLite, for example? Does it use system libraries only, or are there libraries it ships with? (The MacOS distro is much bigger, but that might be due to the universal build - although that suggests that MacOS-X users do not care about disk space or download size anyway)

It looks like it already ships with expat on all platforms, so why not add libxml2/libxslt to the distribution, at least on platforms where it's necessary? (happily ignoring the fact here that lxml isn't currently even close to being integrated) Admittedly, that would add some 1.3MB (uncompressed) to the distro...

Regarding updated version requirements: those are always discussed on the mailing list. The only reason we had for continued support of 2.6.16 was MacOS-X 10.4, until we found that most users installed newer libraries anyway, because the old ones were just too old and crash-prone.

Stefan
On 11 Mar, 2008, at 18:01, Stefan Behnel wrote:
Mike Meyer wrote:
On Tue, 11 Mar 2008 14:55:04 +0100 Stefan Behnel
wrote: (weird places these threads come up at, but now that it's here...) Mike Meyer wrote:
On Tue, 04 Mar 2008 15:44:32 -0800 Ned Deily
wrote: In article <20080302230708.260fa4a9@bhuda.mired.org>, Mike Meyer
wrote: On Thu, 28 Feb 2008 23:42:49 +0000 (UTC) Medhat Gayed
wrote: lxml is good but not written in python and difficult to install and didn't work on MacOS X.
Please note that this original complaint is *not* mine. However...
Due to a design problem in MacOS-X, not a problem in lxml.
I didn't find it noticeably harder to install lxml on MacOS-X than most other systems.
It seems to be for a number of people, though, who turn up on the mailing list complaining about just that.
What can make life a bit harder on OSX is universal binaries, although those aren't too hard either. BTW. Which design problem? BTW2. Discussion of problems with building lxml on OSX is better suited for the pythonmac-sig list (or the lxml one of course).
Yes, but the proposal was to include it in the Python standard library. Software that doesn't work on popular target platforms without updating a standard system library isn't really suitable for that.
Hmm, coming somewhat back on-topic: how does Python currently handle its dependencies under MacOS-X? SQLite, for example? Does it use system libraries only, or are there libraries it ships with? (The MacOS distro is much bigger, but that might be due to the universal build - although that suggests that MacOS-X users do not care about disk space or download size anyway)
The .dmg on python.org includes its own copies of sqlite, ncurses and berkeley db. That's mostly needed to be able to run on 10.3.9 or later. My guess is that the size difference with other binary distributions is mostly due to universal binaries, those double the size of executables. This might get worse in the future, I hope to find some time to make the python framework 4-way universal (32-bit and 64-bit code on PPC and Intel). Ronald
Mike Meyer wrote:
Trying to install it from the repository is a PITA, because it uses both the easyinstall and Pyrex
It shouldn't depend on Pyrex as long as it's distributed with the generated C files. If it's not, that's an oversight on the part of the distributor.
--
Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg.ewing@canterbury.ac.nz
Carpe post meridiem! (I'm not a morning person.)
On Wed, 05 Mar 2008 13:01:14 +1300 Greg Ewing
Mike Meyer wrote:
Trying to install it from the repository is a PITA, because it uses both the easyinstall and Pyrex
It shouldn't depend on Pyrex as long as it's distributed with the generated C files. If it's not, that's an oversight on the part of the distributor.
Sorry I wasn't clear. "from the repository" means building from
source checked out of the source repository, not from a
distribution.
Mike Meyer wrote:
On Wed, 05 Mar 2008 13:01:14 +1300 Greg Ewing
wrote: Mike Meyer wrote:
Trying to install it from the repository is a PITA, because it uses both the easyinstall and Pyrex It shouldn't depend on Pyrex as long as it's distributed with the generated C files. If it's not, that's an oversight on the part of the distributor.
Sorry I wasn't clear. "from the repository" means building from source checked out of the source repository, not from a distribution.
Ok, uhm, what do you think we do releases for? :) Stefan
On Thu, Feb 28, 2008, Medhat Gayed wrote:
I tested a few XML validators, but none of them can validate a string of XML (not a file, just a string), which is what I need in order to programmatically validate the XML messages that flow in and out of the different systems. The validators I tried were XSV, oNVDL and lxml. Can we implement a pure Python module for validating strings of XML using XML Schema (not DTD)?
We certainly "can", for values of "we" that include "you". ;-) IOW, please write it yourself and post it to PyPI. Or find someone else to do the work, but in any event, python-dev is not an appropriate place to discuss it. Try comp.lang.python, perhaps, or a Python/XML mailing list. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "All problems in computer science can be solved by another level of indirection." --Butler Lampson
participants (8)
- Aahz
- Bob Kline
- Greg Ewing
- Medhat Gayed
- Mike Meyer
- Ned Deily
- Ronald Oussoren
- Stefan Behnel