[lxml-dev] Call for contribution towards lxml 1.3

Hi all,

lxml 1.3 is nearing completion. There were some major changes under the hood, but the most visible part of the new release is actually the new layout of the documentation site, which should make it much more accessible. As usual, the preview is here:

http://codespeak.net/lxml/dev/

Some of you have mentioned the impression that it's hard to help out on lxml as it's written in Pyrex, not Python. Although the current code looks very C-ish in many places, this is more of a performance optimisation than a real requirement. Pyrex actually makes it possible to work on the code in a very Python-like style, and to make the C-ification a matter of later improvement. So Python(-like) implementations of new features are definitely welcome. A non-optimised implementation of an interesting feature is much better than the lack of this feature would be.

So, everyone is invited to get involved in making the code even better than it is today.

But there is another area where help is appreciated. A very important area in fact: *documentation*. While there is quite a bit of documentation both on ElementTree and lxml, there are certainly places where lxml's API and its way of doing XML are hard to access, especially for new users and those who have a fixed (should I say: Java-ish?) mindset on XML.

If you want to contribute, helping out in this area is warmly appreciated. Here are a few ideas that would be truly helpful for lxml's user base.

* I would love to see lxml's own tutorial that gets the main ideas and the most useful features across without caring too much about ElementTree (which already has a tutorial).

* Some statistics: what /are/ the most useful features of lxml? What do people like or use most? What parts of lxml should be more accessible? Which parts are so well done that people grasp their usage immediately (and should therefore be promoted as an eye-catcher)?

* We could benefit from a Wiki where users could contribute code examples, best practices, work-arounds or tool snippets. We should also start linking to external pages, blogs, presentations on lxml or ElementTree that others might find interesting.

Obviously, this list is not complete, so if you want to contribute, I hope you will easily find places to do so.

Please help us in making lxml 1.3 the best release ever - and the most accessible one!

Have fun,
Stefan

Hi there, Stefan Behnel wrote:
I think the lxml documentation project is a great initiative and I encourage everybody to join in! Besides the topics Stefan mentioned, I think we should consider creating complete API documentation for lxml looking similar to what's on www.python.org for the core library. I think this should include both the ElementTree API and the lxml extensions in one place. lxml extensions to the API should be marked in the docs. I think having a clear overview of the API will help people find and use the numerous somewhat hidden treasures that exist in lxml. So, API volunteers, you don't already need to be an expert on the lxml API. Writing a bit of API doc would be a good way to *become* an expert, though. I will be happy to help get any API docs volunteers on their way, so if you start this, you won't be on your own. I'm excited about this documentation project and I'm hoping we'll get a few great new contributors! Regards, Martijn

Martijn Faassen wrote:
Definitely. Docstrings are an important point here. They serve both for online docs via help() and can be used to extract docs into other formats. I'm not aware of any doc-gen tool that reads Pyrex, though. While we could import the module and see what we get, we'd also need support for figuring out the signatures of methods and functions, which C-classes don't provide.

Any ideas?

Stefan
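The "import the module and see what we get" approach can be sketched roughly as follows. This is only an illustration, not something proposed in the thread: the helper name collect_docs is hypothetical, the standard library's json module stands in for lxml so that the example runs anywhere, and inspect.signature is a facility of present-day Python that was not available when this was written. The try/except marks the spot where compiled classes and functions may fail to report a signature, which is the gap described above.

```python
import inspect
import json


def collect_docs(module):
    """Map each public callable in `module` to (signature or None, summary line)."""
    docs = {}
    for name, obj in vars(module).items():
        if name.startswith("_") or not callable(obj):
            continue
        try:
            signature = str(inspect.signature(obj))
        except (TypeError, ValueError):
            # Extension types and C functions may expose no signature metadata.
            signature = None
        doc = inspect.getdoc(obj) or ""
        summary = doc.splitlines()[0] if doc else ""
        docs[name] = (signature, summary)
    return docs


if __name__ == "__main__":
    # json is a stand-in module; any importable module could be passed instead.
    for name, (signature, summary) in sorted(collect_docs(json).items()):
        print(f"{name}{signature or '(...)'}: {summary}")
```

Entries whose signature comes back as None are the ones that would need the argument list written out by hand, for example in the first line of the docstring.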

Hi Stefan, hi all,
lxml 1.3 is nearing completion. There were some major changes under the hood,
Is there a planned release date? Do you plan to get the xsi:type="xsd:<type>" thingie into 1.3? I'd love to have this in and I might be able to contribute if needed, but I would have to know how much time is left until the final 1.3 release, because I certainly will not be able to do so until the end of next week.
I do have kind of a tutorial introduction to lxml.objectify, but we tend to wrap some of the entry points into our custom API, and we use some extensions (namely datetime and decimal), so this currently does not match out-of-the-box objectify 1-to-1. As the official objectify documentation is kind of tutorial-like itself, maybe I could check where I could add enhancements to that.

Regarding API documentation, I vote for some reference doc that is actually generated from docstrings or source code documentation. What about pydoc?
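For reference, a minimal sketch of what generating reference text from docstrings with pydoc can look like. The helper name render_reference is hypothetical, json.dumps is only a stand-in for an lxml function, and the renderer argument and pydoc.plaintext belong to current Python versions, so this is an illustration under those assumptions and not a description of how lxml's docs are built.

```python
import json
import pydoc


def render_reference(obj):
    """Return pydoc's help text for `obj` as plain text, without terminal markup."""
    return pydoc.render_doc(obj, renderer=pydoc.plaintext)


if __name__ == "__main__":
    # json.dumps stands in for any documented function or class.
    print(render_reference(json.dumps))
```

The returned string could be written to a file per module or class, which is close to what running pydoc on the command line produces.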
For me, that's

1. standards-compliance by intelligently building on libxml2/libxslt
2. feature-richness: covers extremely convenient XML handling plus Schema/RelaxNG validation and XSLT
3. stability and maturity
4. extensibility
5. performance
A wiki would be nice.

I really think lxml has the potential to be THE Python XML toolkit. The only thing that might sometimes keep users from it is the dependency on the massive libxml2, which can be addressed by a good build/dependency system. And, as said, building on libxml2 is of course also lxml's biggest advantage.

Btw, I for one don't like eggs; I like to package libraries in my platform package format. Does anyone know about a tool to convert an egg to a Sun package?

Keep up the superb work,
Holger

jholg@gmx.de wrote: [snip useful thoughts]
Converting eggs themselves, I don't know. Distutils/setuptools, however, is pluggable and should have the information to build all kinds of package formats, including tarballs, eggs, and rpms. This would be the right area to look into to get native package support.

In addition, the zc.buildout infrastructure that I experimented with in the past does provide nice ways to get an lxml setup which includes libxml2 and so on. Unfortunately it only makes sense if you develop the rest of your application as a buildout. zc.buildout is rumored to be growing support for RPM-based deployment and such, so that might be something else to explore.

Regards,
Martijn

Hi,
I happen to have a bdist_sunpkg distutils command class that does the job. Still waiting for my company to allow me to officially contribute that to Python, what with the agreement you have to sign these days. Until then, it's python patch item 1589266 ;-):

https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1589266&group_id=5470

However, the current egg-shipped stuff using setuptools tends to clutter things with egg-related stuff I'd rather not have. It happened with lxml at least: I now have an unnecessary lxml-1.2.1-py2.4.egg-info directory that I can't seem to get rid of :-)

While the egg thing might have maximum ease-of-use for a lot of people, this can be different if you are

a) not on linux/win (I'm on sparc solaris)
b) not directly connected to the web with your workstation

And I for one do not like the easy_install notion of starting to transparently download stuff.

Thanks for your info,
Holger

On 2007-04-26 22:19:14 +0200, Stefan Behnel <stefan_ml@behnel.de> said:
The problem for me always was that the Pyrex required was some special version. And if you just checked out the code, you couldn't compile it just like that. If there's a way to "fix" that (like with a buildout), I'd be very willing to make changes, even in Pyrex. Pyrex doesn't look too strange to me. :)

--
Christian Zagrodnick
gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891

Hi, Christian Zagrodnick wrote:
There are currently two ways to get a working Pyrex. One is to download the source distribution of lxml, which includes Pyrex. The other is to "svn co" the Pyrex source from the lxml repository. See

http://codespeak.net/lxml/dev/build.html#pyrex

The Subversion URL is:

http://codespeak.net/svn/lxml/pyrex/

Here, it's actually sufficient to check out the "Pyrex" directory under the lxml source tree, i.e.

    svn co http://codespeak.net/svn/lxml/trunk lxml
    cd lxml
    svn co http://codespeak.net/svn/lxml/pyrex/Pyrex Pyrex

That has the additional advantage that you can "svn up" both with a single command. Another thing to document ...

Stefan

Stefan Behnel writes:
You need to edit the svn:externals property so Pyrex gets updated as well. You can do the following:

    svn co http://codespeak.net/svn/lxml/trunk lxml
    svn ps svn:externals "Pyrex http://codespeak.net/svn/lxml/pyrex/Pyrex" lxml
    svn up lxml

This way everything gets updated when you do an "svn up". Maybe it makes sense to put svn:externals in trunk, since people who check out from trunk need Pyrex anyway.

Kind regards,
Michael

Hi Michael, Michael Guntsche wrote:
I was always hoping we could get back to depending on a normal Pyrex release sooner rather than later, but I guess you're right. Since Greg doesn't follow a very open project management style, it's hard to predict when lxml will be able to build with an unpatched Pyrex release. I'll go with the above for now...

Stefan

participants (5)
- Christian Zagrodnick
- jholg@gmx.de
- Martijn Faassen
- Michael Guntsche
- Stefan Behnel