[Doc-SIG] XML Conversion Update

Sean Mc Grath digitome@iol.ie
Mon, 30 Aug 1999 18:00:10 +0100

>Sean Mc Grath wrote:
>> I believe we should strive for a semantic naming scheme for
>> information objects. I propose a naming scheme based
>> on what I dub "fully qualified information object identifiers".
>> The idea is to use the hierarchical location of an information
>> object in a document assembly to arrive at meaningful and unique
>> names e.g.:
>>         Library_Reference-Python_Services-UserList.xml
>>         API-Abstract_Objects_Layer-Mapping_Protocol.xml
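For concreteness, the naming scheme quoted above can be sketched in
Python. This is only an illustration: the function name fqio_name is
hypothetical, and the rule of "_" within a level and "-" between levels
is an assumption read off the two example names, not a settled convention.

```python
def fqio_name(path_components, extension=".xml"):
    """Build a fully qualified information object identifier.

    path_components lists the object's position in the document
    assembly, outermost level first. Spaces inside a level become "_"
    and levels are joined with "-" (assumed from the examples above).
    """
    parts = [component.strip().replace(" ", "_") for component in path_components]
    return "-".join(parts) + extension


print(fqio_name(["Library Reference", "Python Services", "UserList"]))
# prints Library_Reference-Python_Services-UserList.xml
```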
[Paul Prescod]
>Great but what about when UserList.xml moves -- all links break. Global
>names are more robust.

Sorry, a case of a very important detail that I did not flesh
out owing to my time crunch!

I mentioned in the first post that
this micro-document architecture supports link management.
My proposal is that when UserList.xml moves, a redirect stub
is left behind. I.e. the file  (using Guido's suggested CamelCasing)

is not deleted, but its contents are just something like:
	<redirect fqio="blah.xml"/>

Where blah.xml is the new location for the UserList material.
(Periodically, all redirects can then be expunged.)
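A rough sketch of how a tool might follow such stubs, assuming each
stub is a well-formed XML file whose root element is <redirect> with an
fqio attribute as shown above. The function name resolve, the max_hops
limit and the loop check are my own assumptions for illustration, not
part of the proposal.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def resolve(name, directory, max_hops=10):
    """Follow <redirect fqio="..."/> stubs until a real document is found."""
    seen = set()
    for _ in range(max_hops):
        if name in seen:
            raise ValueError("redirect loop at " + name)
        seen.add(name)
        root = ET.parse(os.path.join(directory, name)).getroot()
        if root.tag != "redirect":
            return name  # not a stub, so this is the current location
        name = root.attrib["fqio"]
    raise ValueError("too many redirects")


# Demonstration with throwaway files: the old name holds only a stub.
with tempfile.TemporaryDirectory() as d:
    old = "Library_Reference-Python_Services-UserList.xml"
    with open(os.path.join(d, old), "w") as f:
        f.write('<redirect fqio="blah.xml"/>')
    with open(os.path.join(d, "blah.xml"), "w") as f:
        f.write("<section>UserList material</section>")
    print(resolve(old, d))  # prints blah.xml
```

Expunging redirects would then amount to rewriting every link to the
name resolve() returns and deleting the stub files.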

>> I suggest we go with XML rather than SGML in the sense
>> that anything checked in/out of the system is XML.
>> People who know SGML will probably want to pepper
>> in some tag minimization for their emacs setup:-)
>> They can then use James Clark's SX, for example,
>> to convert to XML.

[Paul Prescod]
>This presumes that the character representation of the text is
>irrelevant. This is emphatically NOT the case for the same reasons that
>it is not the case with Python. The first problem is that I will be very
>pissed off if I write in a particular style and then check my document
>in and get it back in a very different style. The second problem is that
>"diff" will report that every line has changed. That in turn messes up

I understand your points here but I still think we should go with
plain vanilla XML as the storage notation. Even if we went with
SGML, most SGML tools put inferred tags into your documents for
you whether you like it or not!

>I prefer to operate on a hands-off basis. What you edit is what you
>check in is what is stored is what gets checked out is what you edit.

The only SGML editor I know that allows you to work on a hands-off basis
is emacs! Full-blown SGML editors like Adept, Author/Editor,
Frame etc. all canonicalize the SGML as part of the read/edit/save
round trip.
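To make the round-trip effect concrete, here is a small sketch using
Python's standard xml.dom.minidom as a stand-in for an editor that
re-serializes on save. That substitution is an assumption for
illustration only; the editors named above each normalize in their own
way. The sample document is made up.

```python
import difflib
import xml.dom.minidom

# A hand-styled document: single quotes and extra spaces inside a tag.
hand_written = """<doc>
  <para   class='note' >Some <emph>end tags</emph>
  here.</para>
</doc>
"""

# Parse and write back out, as a tool would on save.
dom = xml.dom.minidom.parseString(hand_written)
round_tripped = dom.documentElement.toxml() + "\n"

# Collect the lines that diff reports as removed or added.
changed = [
    line
    for line in difflib.unified_diff(
        hand_written.splitlines(), round_tripped.splitlines(), lineterm=""
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
for line in changed:
    print(line)
```

The content is unchanged, yet diff flags the restyled line because the
quoting and spacing inside the tag were normalized.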

>The first time some SGML user messes this up I expect everyone will be
>rightly pissed off. This means that we need to make the simplified SGML
>vs. XML choice for real. We can't presume that everyone will do what
>they like. I could live with XML but I think that the cost of allowing
>shortened <emph>end tags</> is pretty minor and can make a huge
>difference in type-ability.
>Con: this will break compatibility with some XML editors -- do we expect
>Python hackers to use sissified GUI editors?? :)

Frankly, yes. There are some cool XML editing tools beginning
to appear. As part of the Pyxie project I have developed a
serviceable XML editor with wxPython. With a bit of work, it could
be tailored to the documentation project to produce easy-to-use,
fully Python-based tools for editing/maintaining the
Python docs.

IBM have made available a Java app which, given a DTD, will
spit out a validating, Java-based XML editing app tailored to that

Henry Thompson's XED is Python/Tk based and is getting very
usable in my opinion.

Corel's WordPerfect has a ridiculously good XML editing
capability for such a cheap office suite product!

Even if we went with SGML and people used
Adept, Author/Editor, FrameMaker+SGML, whatever, the
situation would be the same - tag minimization would
be removed by the check-out/edit/check-in round trip.


<Sean URI="http://www.digitome.com/sean.html">
Developers Day Co-Chair, 9th International World Wide Web Conference
16-19, May, 2000, Amsterdam, The Netherlands http://www9.org