Are you sure that we should choose expat as the "native" XML parser?
It wouldn't necessarily be the only parser. Different applications have different needs when processing XML. However, since the expatreader is the only SAX reader included in the standard library at the moment, a guarantee that pyexpat is present is often requested. Note that pyexpat.c is already in the standard library as well.
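To illustrate why the guarantee matters, here is a minimal sketch of SAX code that only works if some SAX driver can be found, which in the standard library means the expatreader on top of pyexpat. It is written for a current Python 3 rather than 2.2, and ElementCounter and count_elements are hypothetical names chosen for this example.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler: counts start tags by element name."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the SAX driver once for every start tag.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes):
    """Parse xml_bytes with the default SAX driver and return the tag counts."""
    handler = ElementCounter()
    # parseString asks xml.sax.make_parser() for a driver; without pyexpat
    # there is no driver in the standard library and this call fails.
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


counts = count_elements(b"<doc><item/><item/><note>hi</note></doc>")
print(counts)
```

An application written this way does not name expat anywhere, so another driver could be substituted later without changing the handler code.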
There are other candidates which would fit this role just as well (in particular, Fredrik's sgmlop looks like a nice extension since it not only works with XML but also many other meta languages).
Not that many candidates would work as well. For example, sgmlop has a number of known bugs, and a few unknown ones. Guido once complained that it is easy to crash sgmlop with ill-formed input, and rejected inclusion of sgmlop when xmlrpclib was integrated. A known problem is that entity references are not expanded in attributes. Beyond that, I'm not aware of many more pure-C parsers that could reasonably be integrated into the core. There are many XML parsers, but many of them are written in C++ or Java.
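For comparison, the following sketch shows how pyexpat behaves on the two points just mentioned: references in attribute values and ill-formed input. It assumes a current Python 3, parse_attributes is a hypothetical helper, and it demonstrates pyexpat only; it makes no claim about what sgmlop does with the same input.

```python
import xml.parsers.expat


def parse_attributes(xml_text):
    """Return a list of (element name, attribute dict) pairs.

    Raises xml.parsers.expat.ExpatError if xml_text is not well-formed.
    """
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    # By default expat passes the attributes as a dict of name -> value.
    parser.StartElementHandler = lambda name, attrs: seen.append((name, attrs))
    parser.Parse(xml_text, True)  # True: this is the final chunk of input
    return seen


# The predefined entity and the character reference in the attribute value
# are expanded by the parser before the handler is called.
good = parse_attributes('<a title="R&amp;D &#65;"/>')

# Ill-formed input (mismatched tags) is reported as an exception.
try:
    parse_attributes("<a><b></a>")
    ill_formed_error = None
except xml.parsers.expat.ExpatError as exc:
    ill_formed_error = exc
```

The point of the second half is that a well-formedness error surfaces as an ordinary Python exception that the caller can catch.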
If you want a very fast validating XML parser, RXP would also be a good choice -- AFAIK, the RXP folks would allow us to ship RXP under a different license than GPL which is then bound to Python.
RXP would indeed be a choice. Of course, integrating it is much harder; you'd have to write the C module first, plus documentation, plus a SAX driver, plus test cases. I'm not sure how much code you can inherit from PyLTXML. On performance: please have a look at http://www.xml.com/lpt/a/Benchmark/exec.html, which suggests that expat still has a speed advantage over RXP (assuming that the measurements were done carefully, i.e. with validation disabled in RXP).
Given the many alternatives, I am not sure whether going with expat is the right path... may be wrong though.
It shouldn't be the only path. pyexpat is already integrated into the Python library; all I'm suggesting is to give the promise that it will be available on every 2.2 Python installation. Any volunteers working on RXP integration are certainly welcome to do so; code contributions to PyXML will be welcome (provided the GPL issue gets resolved). Code contributions to the Python core would require some review, of course. It took quite some time to get pyexpat stable, and I guess any other C-integrated parser won't work from scratch, either.

Regards,
Martin