Re: [Doc-SIG] XML Conversion Update
While I agree with Sean (and others) that small DTDs are a lot better suited to documenting Python modules there's various standard-formatting things that you'd like to borrow from existing DTDs (emphasis, references to other manuals/sections, footnotes, etc). Is there a way that that could be done, without dragging in the whole of the (apparently huge and hairy, from the reports here) docbook DTD?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
At 23:16 29/08/99 +0200, you wrote:
While I agree with Sean (and others) that small DTDs are a lot better suited to documenting Python modules there's various standard-formatting things that you'd like to borrow from existing DTDs (emphasis, references to other manuals/sections, footnotes, etc).
Is there a way that that could be done, without dragging in the whole of the (apparently huge and hairy, from the reports here) docbook DTD?
I think we should grab some of the formatting things from the HTML tag soup - including a really simple table model.

A key question, I believe, is the naming convention issue. This is key to document management and key to cross references. I believe we should strive for a semantic naming scheme for information objects. I propose a naming scheme based on what I dub "fully qualified information object identifiers". The idea is to use the hierarchical location of an information object in a document assembly to arrive at meaningful and unique names, e.g.:

    Library_Reference-Python_Services-UserList.xml
    API-Abstract_Objects_Layer-Mapping_Protocol.xml

As well as acting as names for information object storage, these are also names for xref purposes, e.g.:

    See <xref idref="API-Abstract_Objects_Layer-Mapping_Protocol"> the mapping protocol</xref> for more information.

I suggest we go with XML rather than SGML, in the sense that anything checked in/out of the system is XML. People who know SGML will probably want to pepper in some tag minimization for their emacs setup :-) They can then use James Clark's SX, for example, to convert to XML.

regards,
<Sean URI="http://www.digitome.com/sean.html">
Developers Day Co-Chair,
9th International World Wide Web Conference
16-19, May, 2000, Amsterdam, The Netherlands
http://www9.org
</Sean>
Sean Mc Grath wrote:
I believe we should strive for a semantic naming scheme for information objects. I propose a naming scheme based on what I dub "fully qualified information object identifiers". The idea is to use the hierarchical location of an information object in a document assembly to arrive at meaningful and unique names, e.g.:
Library_Reference-Python_Services-UserList.xml API-Abstract_Objects_Layer-Mapping_Protocol.xml
Great but what about when UserList.xml moves -- all links break. Global names are more robust.
I suggest we go with XML rather than SGML, in the sense that anything checked in/out of the system is XML. People who know SGML will probably want to pepper in some tag minimization for their emacs setup :-) They can then use James Clark's SX, for example, to convert to XML.
This presumes that the character representation of the text is irrelevant. This is emphatically NOT the case, for the same reasons that it is not the case with Python. The first problem is that I will be very pissed off if I write in a particular style and then check my document in and get it back in a very different style. The second problem is that "diff" will report that every line has changed. That in turn messes up CVS.

I prefer to operate on a hands-off basis. What you edit is what you check in is what is stored is what gets checked out is what you edit. The first time some SGML user messes this up I expect everyone will be rightly pissed off.

This means that we need to make the simplified SGML vs. XML choice for real. We can't presume that everyone will do what they like. I could live with XML, but I think that the cost of allowing shortened <emph>end tags</> is pretty minor and can make a huge difference in type-ability.

Con: this will break compatibility with some XML editors -- do we expect Python hackers to use sissified GUI editors?? :)

Paul Prescod
Sean Mc Grath wrote:
I believe we should strive for a semantic naming scheme for information objects. I propose a naming scheme based on what I dub "fully qualified information object identifiers". The idea is to use the hierarchical location of an information object in a document assembly to arrive at meaningful and unique names, e.g.:
Library_Reference-Python_Services-UserList.xml API-Abstract_Objects_Layer-Mapping_Protocol.xml
[Paul Prescod]
Great but what about when UserList.xml moves -- all links break. Global names are more robust.
Sorry, a case of a very important detail that I did not flesh out owing to my time crunch! I mentioned in the first post that this micro-document architecture supports link management.

My proposal is that when UserList.xml moves, a redirect stub is left behind. I.e. the file (using Guido's suggested CamelCasing)

    LibraryReference-PythonServices-UserList.xml

is not deleted, but its contents are just something like:

    <redirect fqio="blah.xml"/>

where blah.xml is the new location for the UserList material. (Periodically, all redirects can then be expunged.)
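The redirect-stub idea can be illustrated with a minimal Python sketch. This is only one possible reading of the proposal: the function name resolve_fqio, the root_dir layout (all information objects as files in one flat directory), and the max_hops limit are assumptions made for illustration; only the <redirect fqio="..."/> stub format comes from the message above.

```python
import os
import xml.etree.ElementTree as ET


def resolve_fqio(name, root_dir, max_hops=10):
    """Follow <redirect fqio="..."/> stubs until a real document is found.

    Hypothetical helper: `name` is a file name such as
    "LibraryReference-PythonServices-UserList.xml" inside `root_dir`.
    Returns the file name of the document that actually holds the content.
    """
    seen = set()
    current = name
    for _ in range(max_hops):
        if current in seen:
            raise ValueError("redirect loop at %r" % current)
        seen.add(current)
        top = ET.parse(os.path.join(root_dir, current)).getroot()
        if top.tag != "redirect":
            # Not a stub: this file holds the real material.
            return current
        # A stub: its fqio attribute names the new location.
        current = top.attrib["fqio"]
    raise ValueError("too many redirects starting from %r" % name)
```

An xref processor could call such a helper on every idref before generating a link, and a periodic clean-up job could use it to rewrite old references before the stubs are expunged.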
I suggest we go with XML rather than SGML, in the sense that anything checked in/out of the system is XML. People who know SGML will probably want to pepper in some tag minimization for their emacs setup :-) They can then use James Clark's SX, for example, to convert to XML.
[Paul Prescod]
This presumes that the character representation of the text is irrelevant. This is emphatically NOT the case for the same reasons that it is not the case with Python. The first problem is that I will be very pissed off if I write in a particular style and then check my document in and get it back in a very different style. The second problem is that "diff" will report that every line has changed. That in turn messes up CVS.
I understand your points here but I still think we should go with plain vanilla XML as the storage notation. Even if we went with SGML, most SGML tools put inferred tags into your documents for you whether you like it or not!
I prefer to operate on a hands-off basis. What you edit is what you check in is what is stored is what gets checked out is what you edit.
The only SGML editor I know that allows you to work on a hands-off basis is emacs! Fully blown SGML editors like Adept, Author/Editor, Frame etc. all canonicalize the SGML as part of the read/edit/save round trip.
The first time some SGML user messes this up I expect everyone will be rightly pissed off. This means that we need to make the simplified SGML vs. XML choice for real. We can't presume that everyone will do what they like. I could live with XML, but I think that the cost of allowing shortened <emph>end tags</> is pretty minor and can make a huge difference in type-ability.
Con: this will break compatibility with some XML editors -- do we expect Python hackers to use sissified GUI editors?? :)
Frankly, yes. There are some cool XML editing tools beginning to appear. As part of the Pyxie project I have developed a serviceable XML editor with wxPython. With a bit of work, it could be tailored to the documentation project to produce easy-to-use, fully Python-based tools for editing/maintaining the Python docs.

IBM have made available a Java app which, given a DTD, will spit out a validating, Java-based XML editing app tailored to that DTD. Henry Thompson's XED is Python/Tk based and is getting very usable in my opinion. Corel's WordPerfect has a ridiculously good XML editing capability for such a cheap office suite product!

Even if we went with SGML and people used Adept, Author/Editor, FrameMaker+SGML, whatever, the situation would be the same - tag minimization would be removed by the check-out/edit/check-in round trip.

regards,
<Sean URI="http://www.digitome.com/sean.html">
Developers Day Co-Chair,
9th International World Wide Web Conference
16-19, May, 2000, Amsterdam, The Netherlands
http://www9.org
</Sean>
[Paul Prescod]
Con: this will break compatibility with some XML editors -- do we expect Python hackers to use sissified GUI editors?? :)
[Sean McGrath]
Frankly, yes. There are some cool XML editing tools beginning to appear. As part of the Pyxie project I have developed a serviceable XML editor with wxPython. With a bit of work, it could be tailored to the documentation project to produce easy-to-use, fully Python-based tools for editing/maintaining the Python docs.
Let me second that. I refuse to write XML until there are real tools. And by that I mean that I don't want to type <stuff> or <stuff/>, &stuff; etc. Emacs is a good code editor. It's a terrible document editor. I'll write code that *generates* and *processes* XML, of course, but I hate writing 'text' in emacs. IMHO, of course.

--david
If I honestly believed that most of us were going to end up using XML editors, I would support using regular XML as a no-brainer. But I think that the average Python hacker is no more likely to download a specific, customized XML editor than they are to download and use IDLE in preference to their favorite text editor. I wrote my last book in vi(1) (admittedly an extreme choice) and the one before in Emacs (a little more reasonable). I expect this to be the norm, but neither of us has a crystal ball.

And if we DO use XML editors then we run into the "diff/CVS" problem. This is a MAJOR problem for an open source effort. Maybe we can find/create an XML-smart diff and integrate it with CVS. In this case I wouldn't be so concerned... I would just unnormalize data I checked out and re-normalize it when I checked in.
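One way to picture the "XML-smart diff" idea is to normalize both versions before comparing them. The sketch below is an assumption-laden illustration, not a tool anyone in this thread proposed: it uses the canonicalize function from Python's standard xml.etree.ElementTree module (available in Python 3.8 and later) together with difflib, and the helper names normalized_lines and xml_diff are made up for the example. It only covers well-formed XML, so SGML tag minimization such as </> would have to be expanded first, and it does not show any integration with CVS.

```python
import difflib
from xml.etree.ElementTree import canonicalize


def normalized_lines(xml_text):
    """Return the canonical (C14N) form of a document, split into lines.

    Canonicalization sorts attributes, normalizes quoting, and expands
    empty-element tags, so purely stylistic differences disappear.
    """
    return canonicalize(xml_data=xml_text).splitlines()


def xml_diff(old_text, new_text):
    """Unified diff of two XML documents after normalization."""
    return list(
        difflib.unified_diff(
            normalized_lines(old_text),
            normalized_lines(new_text),
            lineterm="",
        )
    )
```

With this, two check-ins that differ only in attribute order or quoting style would produce an empty diff, while a real change to the text would still show up.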
I understand your points here but I still think we should go with plain vanilla XML as the storage notation. Even if we went with SGML, most SGML tools put inferred tags into your documents for you whether you like it or not!
That's why I don't use them.
The only SGML editor I know that allows you to work on a hands-off basis is emacs! Fully blown SGML editors like Adept, Author/Editor, Frame etc. all canonicalize the SGML as part of the read/edit/save round trip.
I think that XMetaL comes pretty close. It has a "raw text" mode that you can switch back and forth to. Some HTML editors (e.g. DreamWeaver) also have this concept, so maybe hands-off editing will be a standard feature of XML editors in a few years. In the meantime I use whatever text editor I happen to have installed. Yes, my knuckles also drag on the ground.

Paul Prescod
The idea is to use the hierarchical location of an information object in a document assembly to arrive at meaningful and unique names, e.g.:
Library_Reference-Python_Services-UserList.xml API-Abstract_Objects_Layer-Mapping_Protocol.xml
As well as acting as names for information object storage these are also names for xref purposes e.g.:
See <xref idref="API-Abstract_Objects_Layer-Mapping_Protocol"> the mapping protocol</xref> for more information.
Without any context, this looks like a horrible idea in one detail: the mixing of underscores and hyphens that you propose. Anything, but not that! Make it CamelCase if you have to:

    LibraryReference-PythonServices-UserList.xml
    API-AbstractObjectsLayer-MappingProtocol.xml

--Guido van Rossum (home page: http://www.python.org/~guido/)
As well as acting as names for information object storage these are also names for xref purposes e.g.:
See <xref idref="API-Abstract_Objects_Layer-Mapping_Protocol"> the mapping protocol</xref> for more information.
[Guido]
Without any context, this looks like a horrible idea in one detail: the mixing of underscores and hyphens that you propose. Anything, but not that! Make it CamelCase if you have to:
LibraryReference-PythonServices-UserList.xml API-AbstractObjectsLayer-MappingProtocol.xml
Yes, the dash/underscore soup is awful. CamelCasing is better.

If the information objects live on a flat filesystem then we need to restrict object names to a filesystem-friendly subset. We probably want to eschew the likes of "&", for example. We can be more uninhibited if the information objects live in something like MySQL. The benefits of storage in a relational database probably only outweigh the drawbacks once the number of information objects gets very large, though.

regards,
<Sean URI="http://www.digitome.com/sean.html">
Developers Day Co-Chair,
9th International World Wide Web Conference
16-19, May, 2000, Amsterdam, The Netherlands
http://www9.org
</Sean>
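As a rough illustration of the naming scheme discussed in this exchange, the following Python sketch builds a CamelCased, hyphen-separated file name from the hierarchical location of an information object. The function names camel and fqio_name are hypothetical, and the choice to keep only ASCII letters and digits is an assumption about what "filesystem-friendly" might mean here; the thread itself does not settle that.

```python
import re


def camel(component):
    """CamelCase one level of the hierarchy, dropping unsafe characters.

    Only ASCII letters and digits survive (an assumed "filesystem-friendly"
    subset); spaces, underscores, "&" and similar act as word breaks.
    """
    words = re.findall(r"[A-Za-z0-9]+", component)
    # Upper-case the first letter only, so "UserList" and "API" keep their case.
    return "".join(word[0].upper() + word[1:] for word in words)


def fqio_name(path, ext=".xml"):
    """Join the CamelCased levels with hyphens, e.g. Manual-Chapter-Section.xml."""
    parts = [camel(component) for component in path]
    if not parts or not all(parts):
        raise ValueError("every level needs at least one letter or digit")
    return "-".join(parts) + ext
```

For example, fqio_name(["Library Reference", "Python Services", "UserList"]) gives the first of the two names Guido suggested.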
[Cleaning out my mailbox a bit...] Jack Jansen writes:
While I agree with Sean (and others) that small DTDs are a lot better suited to documenting Python modules there's various standard-formatting things that you'd like to borrow from existing DTDs (emphasis, references to other manuals/sections, footnotes, etc).
Is there a way that that could be done, without dragging in the whole of the (apparently huge and hairy, from the reports here) docbook DTD?
I looked at the idea of just using the names from DocBook when the semantics are the same, and decided that was a red herring. DocBook is *very* verbose, and using XML is already heavy enough in the markup department. Something simple, like:

    \emph{My} plan was to use \var{bogosity} with a value of \code{1}.

becomes (in XML):

    <emphasis>My</emphasis> plan was to use <varname>bogosity</varname> with a value of <literal>1</literal>.

or (in SGML):

    <emphasis>My</> plan was to use <varname>bogosity</> with a value of <literal>1</>.

There are some (less important) issues of mapping \code, since it's used for things which I'm not at all sure map to the same thing, and similarly (but less importantly) for \samp.

Any way around it, *ML markup is very heavy for the occasional contributor using vi or emacs (or any other editor that isn't highly specialized). Even asking the emacs user to install PSGML is probably too much, esp. since that's too much to learn if you don't edit SGML a *lot*. Which of course gives rise to discussions like the current discussion of inline markup...

-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives
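To make the mapping in Fred's example concrete, here is a small Python sketch that rewrites simple, non-nested LaTeX-style macros into the DocBook-style element names he shows. It is an illustration only and not the conversion tooling used for the Python documentation: the table MACRO_TO_ELEMENT and the function latex_to_xml are hypothetical names, nested macros are not handled, and macros outside the table (including \samp, whose mapping Fred leaves open) are passed through unchanged.

```python
import re
from xml.sax.saxutils import escape

# Mapping taken from the example in the message; anything else is left alone.
MACRO_TO_ELEMENT = {"emph": "emphasis", "var": "varname", "code": "literal"}

# Matches \name{body} where body contains no braces (no nesting).
_MACRO = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def latex_to_xml(text):
    """Convert \\emph{..}, \\var{..} and \\code{..} to XML elements."""
    pieces = []
    pos = 0
    for match in _MACRO.finditer(text):
        # Plain text between macros must be escaped for XML (&, <, >).
        pieces.append(escape(text[pos:match.start()]))
        element = MACRO_TO_ELEMENT.get(match.group(1))
        if element is None:
            # Unknown macro: keep the original source text.
            pieces.append(escape(match.group(0)))
        else:
            body = escape(match.group(2))
            pieces.append("<%s>%s</%s>" % (element, body, element))
        pos = match.end()
    pieces.append(escape(text[pos:]))
    return "".join(pieces)
```

Applied to the LaTeX line in the message, this yields the XML form shown beneath it.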
participants (6)
- David Ascher
- Fred L. Drake, Jr.
- Guido van Rossum
- Jack Jansen
- Paul Prescod
- Sean Mc Grath