Last week I promised on the Python list to describe the current status
of the conversion to SGML/XML. Here it is!

I'm currently able to parse all the LaTeX markup and generate either
XML or SGML. The structure of the output is very similar to the input
structure, but a number of minor improvements are made. The
improvements are mostly very localized and have more to do with keeping
the markup from getting too complex and disjointed, and eliminating
some bogosities.

I am not at all decided on a DTD to use. I see three options:

1. DocBook -- this has been developed and heavily use-tested by a
   number of corporate users, and is known to be good for technical
   documentation. There are tools and stylesheets available to
   convert from DocBook to HTML and printed formats. We'd probably
   need to specialize it, but it's designed for that. Konrad Hinsen
   has already developed one customization that he's using to document
   Python modules, and there's an initiative to create a common
   extension for documenting OO constructs. I've asked Konrad for
   some sample documentation so I can see how it actually works out.
   My concern with DocBook is that the markup may be a bit on the
   "heavy" side; I don't want the document sources to be so
   markup-heavy that I'm the only one to work on them.

2. Create something similar to what we had in LaTeX, but with fewer
   warts. This is appealing because the conversion would be done
   sooner. However, new stylesheets would be needed, delaying the
   usefulness of the result. It would also be the easiest to adopt
   for people already familiar with the current markup.

3. Create something entirely new and specific to Python. Clearly,
   this offers a lot of work to all the volunteers. We'd need
   requirements analysis, DTD design, stylesheets, and probably lots
   of things I haven't thought of. However, it also means we can
   limit the weight of the markup in the source, which might be a
   major advantage in getting people to use it.
   But *everyone* would have to learn it (well, everyone that writes
   documentation at any rate). This offers a great deal of
   opportunity to "get it right" for Python, but also a lot of rope.
   (You know what rope is used for, right?)

I'd like to see some discussion on what should be done and what needs
to be done. From where I sit, the most important thing is to make
sure we can maintain a high level of semantic markup (hopefully making
further improvements over what we already have), with generation of
hypertext (HTML, info, whatever) being the next most important thing.
Typeset documents are a requirement, but aren't as high up the list.

I'm not terribly concerned about how XML/SGML-->foo conversion
processes are implemented, with the caveat being that I need to be
able to understand them without a massive learning curve. Clearly,
Python code is a major option for tools (surprised?), but I can easily
deal with using Java tools (with or without JPython), DSSSL processors
(just don't expect me to maintain Jade/OpenJade!), XSL, CSS, and
whatnot. I'd like to get away from having any Perl scripts involved,
not because I think Perl is Evil, but because I'm not a Perl hacker.
(Don't get me wrong; I make no claim that Perl is not Evil! ;)

Comments, suggestions, volunteers?

  -Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives