A while ago (around Easter time), it was said that the Python documentation (manual reference, etc.) would at some point be converted to SGML. The current LaTeX is, frankly, rather useless compared to, say, the Perl POD stuff if one wants to produce anything other than paper output or LaTeX2HTML HTML (which gives rather clumsy results in the Python manual context, in my opinion, although that's not really the point), so this seemed like a very good idea. I would welcome the change to SGML as it would allow various people to generate the documentation in a more suitable format for their needs (be it a non-HTML format, or just to customise it for their own personal preferences). Around the same time, I posted a version of the manuals in a new HTML format using a quick PythonManualLaTeX -> x converter I put together (there were two 'x's: one was HTML, the other was StrongHelp, which is a help system on my platform; imagine the winhelp thing without the speed losses and bugs, and with a touch more power (not a great deal), and you've got StrongHelp). The "parser" (so to speak) was a fairly hideous hack, which was rather reliant on the LaTeX format staying the same: needless to say, it changed quite substantially about 2 weeks afterwards :) I had a quick look at it the other day, and it's not going to work on the new LaTeX without a rewrite of the base converter (not surprising, given the time I had to do the thing). I suggested at the time that something along the lines of what I had written could be used as an SGML converter, as I don't know of any "general" software out there which would generate suitable SGML (i.e. with all the information about modules/methods still present). So I was wondering if any progress has been made with converting the LaTeX source to SGML, which would be the "manual source" of the future, so to speak? Is it now tied in with the SGMLTools stuff posted on this sig a couple of weeks back? Is it paused/stopped/waiting for extra help?
Have I been stupid, and it's on www.python.org already? :) Basically, I would imagine I'm not the only person who's keen to see the SGML documentation at some point, so if anyone could let me - and presumably some other people on this sig who are also interested - know what the expected time schedule is, I would be grateful. Laurie
tratt@dcs.kcl.ac.uk said:
Is it now tied in with the SGMLTools stuff posted on this sig a couple of weeks back?
No, and I don't think it belongs there for two reasons: first, I think that generating DocBook (which SGMLtools targets) is overkill; people shouldn't be forced to install Jade et al. just to get usable documentation; and second, like POD, a Python doc generator should be part of the base Python distribution so that people will actually start using it. As I'm talking anyway, here's my two cents of ideas (I shouldn't really be doing this, because I haven't been here long and I haven't had the time to check the archives or the relevant code...): I'd favour XML generation with a simple XML DTD (containing element definitions for boldface, italic, code, ordered and unordered list, basically HTML--). Include with Python a simple XML backend that can generate HTML, manpage source, ASCII and DocBook SGML - the latter can then be used for the heavier stuff, like collecting documentation of numerous modules into one volume (e.g. generate the library reference from the .py modules plus some SGML glue). Regards Cees -- Cees de Groot http://pobox.com/~cg <cg@pobox.com> http://www.sgmltools.org <cg@sgmltools.org> --- We're hiring Java developers => www.acriter.com
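To make the "simple DTD plus simple backend" idea concrete, here is a minimal sketch in Python. The element names (doc, para, b, i, code, ol, ul, item) are hypothetical, chosen only for illustration and not taken from any existing DTD; the two functions stand in for the HTML and ASCII backends mentioned above, using only the standard library.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical element vocabulary ("HTML--"); the names are assumptions.
HTML_TAGS = {"doc": "div", "para": "p", "b": "b", "i": "i",
             "code": "code", "ol": "ol", "ul": "ul", "item": "li"}


def to_html(elem):
    """Render an element and its children as an HTML fragment."""
    tag = HTML_TAGS[elem.tag]
    inner = escape(elem.text or "")
    for child in elem:
        inner += to_html(child) + escape(child.tail or "")
    return "<%s>%s</%s>" % (tag, inner, tag)


def to_text(elem):
    """Render plain ASCII: inline markup is dropped, lists get bullets."""
    if elem.tag in ("ul", "ol"):
        lines = []
        for number, item in enumerate(elem, 1):
            bullet = "*" if elem.tag == "ul" else "%d." % number
            lines.append("%s %s" % (bullet, to_text(item).strip()))
        return "\n".join(lines) + "\n"
    text = elem.text or ""
    for child in elem:
        text += to_text(child) + (child.tail or "")
    if elem.tag == "para":
        text = text.strip() + "\n"
    return text


SOURCE = ("<doc><para>Use <code>string.split</code> with <b>care</b>.</para>"
          "<ul><item>first</item><item>second</item></ul></doc>")
root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_text(root))
```

Each additional output format (manpage source, DocBook) would be one more function walking the same tree, which is the attraction of keeping the source vocabulary this small.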
Cees de Groot writes:
check the archives nor the code that's relevant...): I'd favour XML generation with a simple XML DTD (containing element definitions for boldface, italic, code, ordered and unordered list, basically HTML--). Include with Python a
Combining the "book" style documents with docstring-based documentation is probably a non-starter; I'm not at all convinced that it would be any more useful than the current situation. Good docstrings and good reference documentation are not necessarily related. Docstrings should remain a very simple, text-only format, whereas I expect much more from the book-style manuals. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191
fdrake@cnri.reston.va.us said:
Good docstrings and good reference documentation are not necessarily related.
That, of course, depends on how far you want to go with literate programming aspects in docstrings - more a voting issue than anything else. I vote for the inclusion of reference documentation in the sources, on a level comparable with Javadoc or POD (more the former than the latter, though). By adopting a "first-sentence" convention like Javadoc, docstrings can still be used for a compact on-line reference. The reason: I've never seen a project where separate source and documentation files were kept in sync. -- Cees de Groot http://pobox.com/~cg <cg@pobox.com> --- We're hiring Java developers => www.acriter.com
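As a rough illustration of the "first-sentence" convention, the sketch below pulls a one-line summary out of a longer docstring. The rule used here (the summary ends at the first '.', '!' or '?' followed by whitespace or the end of the first paragraph) is an approximation of the Javadoc behaviour, and the helper name first_sentence and the sample function parse are hypothetical.

```python
import inspect
import re


def first_sentence(obj):
    """Return the first sentence of obj's docstring, or '' if it has none."""
    doc = inspect.getdoc(obj) or ""
    # Only look at the first paragraph, with line breaks collapsed.
    first_para = " ".join(doc.split("\n\n", 1)[0].split())
    match = re.match(r"(.*?[.!?])(\s|$)", first_para)
    return match.group(1) if match else first_para


def parse(source):
    """Parse source text into a tree. Raises ValueError on bad input.

    The full reference documentation would follow here, with as much
    detail as the book-style manual needs.
    """
    return source


print(first_sentence(parse))
```

A naive rule like this cuts too early on abbreviations such as "e.g. ", so a real tool would need either a smarter splitter or a convention that authors avoid them in the summary sentence.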
Laurence Tratt writes:
A while ago (around Easter time), it was said that the Python documentation (manual reference, etc) would at some point be converted to SGML. The current LaTeX is, frankly, rather useless compared to say the Perl POD stuff if one wants to produce anything other than paper output or LaTeX2HTML HTML (which results in a rather clumsy in the Python manual
This is still planned. The preliminary conversion script I'd been working on has been massively broken due to the changes in the LaTeX markup, and I think the breakage is permanent: at this point, I'm more likely to start a conversion script from scratch than try to revive the old one yet again. On the other hand, the LaTeX markup has become much more logical, which reduces the immediacy of the need for a conversion. While I still think SGML/XML will be the final form of the documentation, I don't see a compelling need for a conversion at this time. No matter how we do things, there is no trivial conversion of the documentation and related tools that gives us any benefits over the current situation that I'm aware of; feel free to enlighten me on this one.
So I was wondering if any progress has been made with converting the LaTeX source to SGML, which would be the "manual source" of the future, so to
Like I said, I had a script that did a substantial portion of the conversion from the documentation of almost a year ago to a simple DTD, but it's pretty much been relegated to the great bit bucket in the sky. There is no relation between the recent SGML-Tools conversation and the Python documentation, though I'd definitely be interested in using SGML-Tools 1.1+ (DocBook & DSSSL) if we use the DocBook DTD.
Basically, I would imagine I'm not the only person who's keen to see the SGML documentation at some point, so if anyone could let me - and
You're not the only one. I'm just way too busy to devote any additional time myself at this point. If anyone is interested in converting the documentation to DocBook 3.x or DocBook-XML 1.0, I'd like to see a "conversion plan" which explains how you intend to map the LaTeX markup to *ML markup. I'd also be very interested in seeing alternate DTDs that are targeted more tightly for these sorts of documents, possibly specific to this particular documentation set. I am very concerned about actually authoring in DocBook. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191
In message <13813.39671.3295.188971@weyr.cnri.reston.va.us> "Fred L. Drake" <fdrake@cnri.reston.va.us> wrote:
A while ago (around Easter time), it was said that the Python documentation (manual reference, etc) would at some point be converted to SGML. This is still planned. The preliminary conversion script I'd been working on has been massively broken due to the changes in the LaTeX markup, and I think the breakage is permanent: at this point, I'm more likely to start a conversion script from scratch than try to revive the old one yet again.
This is basically what happened to my LaTeX -> x converter...
On the other hand, the LaTeX markup has become much more logical, which reduces the immediacy of the need for a conversion. While I still think SGML/XML will be the final form of the documentation, I don't see a compelling need for a conversion at this time. No matter how we do things, there is no trivial conversion of the documentation and related tools that gives us any benefits over the current situation that I'm aware of; feel free to enlighten me on this one.
My personal opinion is that the format that the documentation is written in is irrelevant: whatever is decided upon needs to have a converter written for it which can then be used to convert into varying types of documentation. My only problem with LaTeX is that it's pretty hard to write a reliable parser for, and that at the push of a button any program which relies on parts of the markup staying roughly the same dies horribly. If there could be a guarantee that the LaTeX markup wasn't going to change then I think it would be OK to use the LaTeX as the base converter. However - from experience - the temptation with LaTeX is always to have a quick fiddle to tidy things up, and that breaks things very easily :)
I'd also be very interested in seeing alternate DTDs that are targeted more tightly for these sorts of documents, possibly specific to this particular documentation set.
I would imagine this is the only way to reliably retain all the information that the LaTeX now holds (eg about when "string" is a module, and when it's an attribute in an RE object). I don't believe in reinventing the wheel when it's not necessary, but I would imagine we would be trying to mangle two different wheels together if we used a non-specific DTD. Laurie
Laurence Tratt writes:
only problem with LaTeX is that it's pretty hard to write a reliable parser for, and that at the push of a button any program which relies on parts of
Yes, this is a real problem. Perhaps there should be a LaTeXParser class available for Python, similar to sgmllib.SGMLParser? ;-) I don't know when I'll fit that in, but I'll probably have to whenever I get around to the next shot at a conversion script!
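For what it's worth, the event-style interface being suggested might look something like the sketch below. This is not a real LaTeX parser: the class name TinyLaTeXParser and its handler methods are hypothetical, and it only recognises flat, single-argument macros of the form \name{arg} with no nesting, which is roughly the "logical" inline markup discussed in this thread.

```python
import re


class TinyLaTeXParser:
    """Hypothetical event-style parser for flat \\name{arg} macros only."""

    _macro = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")

    def feed(self, source):
        """Scan source, calling handle_data and handle_macro in order."""
        pos = 0
        for match in self._macro.finditer(source):
            if match.start() > pos:
                self.handle_data(source[pos:match.start()])
            self.handle_macro(match.group(1), match.group(2))
            pos = match.end()
        if pos < len(source):
            self.handle_data(source[pos:])

    # Subclasses override these, in the style of the SGML/HTML parsers.
    def handle_data(self, text):
        pass

    def handle_macro(self, name, arg):
        pass


class EventCollector(TinyLaTeXParser):
    """Example subclass that simply records the events it receives."""

    def __init__(self):
        self.events = []

    def handle_data(self, text):
        self.events.append(("data", text))

    def handle_macro(self, name, arg):
        self.events.append((name, arg))


collector = EventCollector()
collector.feed(r"The \module{string} module defines \function{split}.")
print(collector.events)
```

Anything beyond this (nested braces, optional arguments, environments) is exactly where a hand-written converter becomes fragile, which is the problem described above.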
the markup staying roughly the same dies horribly. If there could be a guarantee that the LaTeX markup wasn't going to change then I think it would be OK to use the LaTeX as the base converter. However - from experience - the temptation with LaTeX is always to have a quick fiddle to tidy things up, and that breaks things very easily :)
I don't know that this will change substantially if we use a Python-specific DTD. There are a couple of issues here:
- Requirements change, even if slowly. The most substantial recent changes for the Python docs have been the introduction of the API document and the howto document class.
- When new markup structures are identified which better serve existing requirements. For example, we're now coding the short "module synopsis" provided in chapter introductions in the module section, which improves the maintainability and flexibility of the documents.
I would imagine this is the only way to reliably retain all the information that the LaTeX now holds (eg about when "string" is a module, and when it's
I think the DocBook ROLE attribute could be used to help make these distinctions, but that brings back up the authoring issue again: who's going to type all that stuff? For the contributed documentation, we still receive more sections that use \code for everything that should be typeset in Courier than anything else (though some authors are using the "logical" markup; thanks!). What happens is that I have to spend a fair amount of time adjusting the markup to use \module, \function, \method, or whatever is called for. This is time-consuming and prevents me from spending much time writing (as do other things, like needing to get things done!). Perhaps it's time to dust off the DTD I was writing as the target for the old conversion script and bring that somewhat up to date and add comments on the semantic interpretation of the elements and attributes, perhaps with notes on processing expectations for the current output formats. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives 1895 Preston White Dr. Reston, VA 20191
In message <13814.40453.748852.965008@weyr.cnri.reston.va.us> "Fred L. Drake" <fdrake@cnri.reston.va.us> wrote:
only problem with LaTeX is that it's pretty hard to write a reliable parser for Yes, this is a real problem. Perhaps there should be a LaTeXParser class available for Python, similar to sgmllib.SGMLParser? ;-) I don't know when I'll fit that in
Quit your day job :)
but I'll probably have to whenever I get around to the next shot at a conversion script!
I don't think one needs to make a general parser, and writing a hard-coded-to-one-style parser is definitely a lot easier. But that's what I mean about breaking: a hard-coded parser is totally at the mercy of the LaTeX style staying the same.
I don't know that this will change substantially if we use a Python-specific DTD. There are a couple of issues here:
- Requirements change, even if slowly. The most substantial recent changes for the Python docs have been the introduction of the API document and the howto document class.
- When new markup structures are identified which better serve existing requirements. For example, we're now coding the short "module synopsis" provided in chapter introductions in the module section, which improves the maintainability and flexibility of the documents.
It's a bit easier to update things in a conversion program with a more, er, strict layout like XML than LaTeX: with LaTeX, if you write a conversion script, you pretty much have to stipulate that if the style changes, the program gets rewritten. With XML/SGML/whatever, you have a fighting chance :) Personally I'm not a great fan of XML, but I would like to be able to generate different types of output from the documentation - which contains all the information I need, but in a very hard to digest format. Something like XML won't really solve any problems for the people writing the documentation as far as I can see, but it will solve the output problem. Does this sound like a fair comment?
Perhaps it's time to dust off the DTD I was writing as the target for the old conversion script and bring that somewhat up to date and add comments on the semantic interpretation of the elements and attributes, perhaps with notes on processing expectations for the current output formats.
That sounds like a good idea. Laurie
tratt@dcs.kcl.ac.uk said:
I would imagine this is the only way to reliably retain all the information that the LaTeX now holds (eg about when "string" is a module, and when it's an attribute in an RE object). I don't believe in reinventing the wheel when it's not necessary, but I would imagine we would be trying to mangle two different wheels together if we used a non-specific DTD.
Fred made a statement about energy being better spent on writing documentation than on converting it - this becomes very true in the area of DTDs. A DTD like DocBook isn't non-specific, it is very specific: it is for writing software documentation, and with it come all the tools to adapt it to your own piece of software (starting with very simple things like role attributes). By inventing your own DTD you'll gain little, but take on a big burden: endless discussions on tags, the need to maintain it, the need to write conversion software, etcetera. I'd advise anything short of a full-time SGML publishing business against doing so.

    The <SystemItem role="module">string</> module.
    The <StructField>string</> attribute of an RE object.

As I mentioned, with the decision to use DocBook, you'd need to make a sort of stylesheet where you define the markup to apply to the common elements of the software being documented; in the case of Python, this list is probably something like [module, class, function, parameter, default value of parameter, attribute], in other words: not very long and way easier to maintain than a complete DTD. -- Cees de Groot http://pobox.com/~cg <cg@pobox.com> --- We're hiring Java developers => www.acriter.com
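The "stylesheet" described above could be as small as one table. The sketch below is an illustration only: the module and attribute rows follow the two examples in the message, while the class, function and parameter rows are assumptions about which DocBook elements one might pick, and the helper name mark_up is hypothetical. It writes full end tags rather than the SGML short form "</>".

```python
from xml.sax.saxutils import escape, quoteattr

# Python concept -> (DocBook element, role attribute or None).
MARKUP = {
    "module": ("SystemItem", "module"),   # as in the example above
    "attribute": ("StructField", None),   # as in the example above
    "class": ("ClassName", None),         # assumption
    "function": ("Function", None),       # assumption
    "parameter": ("Parameter", None),     # assumption
}


def mark_up(kind, text):
    """Wrap text in the DocBook markup agreed for this kind of name."""
    element, role = MARKUP[kind]
    attrs = " role=%s" % quoteattr(role) if role else ""
    return "<%s%s>%s</%s>" % (element, attrs, escape(text), element)


print("The %s module." % mark_up("module", "string"))
print("The %s attribute of an RE object." % mark_up("attribute", "string"))
```

Changing the agreed markup for, say, class names then means editing one row of the table rather than revising a DTD.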
participants (3): cg@pobox.com, Fred L. Drake, Laurence Tratt