[XML-SIG] generating a DTD from a sample document??

Gabe Wachob gwachob@aimnet.com
Thu, 11 Feb 1999 11:02:43 -0800 (PST)

On Thu, 11 Feb 1999, Just van Rossum wrote:

> I vaguely remember someone at the Houston conference mentioning that it is
> possible to generate a DTD from a sample document. If this is true, does
> any of the Python XML packages do this? How?
> (Seems like a real quick way to learn a bit more about writing DTDs ;-)

Well, generating a DTD mechanically may not be as useful as you might think.
It's probably easier to find a common DTD (linuxdoc, perhaps) and then look
at many examples.

Also, pick up Dave Megginson's book entitled "Structuring XML Documents".

To answer your original question, generating a really useful DTD from
an XML document probably isn't possible, depending on the complexity of
the document you are starting from and your definition of useful.

You can always generate an "enabling" DTD, a DTD to which the sample XML
document conforms. I can generate one right now:
<!ELEMENT document ANY>

Of course, you could do better, even mechanically, depending on the input
document. However, a single XML document may not contain examples of all
the possible content models for certain elements. For example, a <name>
element in the sample document may contain only a <surname> subelement,
even though a <middlename> subelement would also be valid inside <name>;
a DTD generated mechanically from that document has no way of knowing
about <middlename>.

That being said, mechanical generation of DTDs is useful in the very first
steps of reverse engineering a DTD from XML documents. It doesn't get you
far, though.
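
As a rough illustration of that first reverse-engineering step, here is a
minimal sketch in Python. It assumes the standard library's
xml.etree.ElementTree module and a sample document without namespaces; the
function name infer_dtd is made up for this example and is not part of any
XML package. It only records which children, text, and attributes actually
appear, so it cannot say anything about order, cardinality, or elements the
sample happens to omit.

```python
import xml.etree.ElementTree as ET


def infer_dtd(xml_text):
    """Return a permissive DTD covering only what the sample contains."""
    root = ET.fromstring(xml_text)
    children = {}  # element name -> child element names, first-seen order
    has_text = {}  # element name -> True if any instance held non-blank text
    attrs = {}     # element name -> attribute names, first-seen order

    for elem in root.iter():
        kids = children.setdefault(elem.tag, [])
        names = attrs.setdefault(elem.tag, [])
        text_seen = bool(elem.text and elem.text.strip())
        for child in elem:
            if child.tag not in kids:
                kids.append(child.tag)
            # Text that follows a child element belongs to the parent.
            if child.tail and child.tail.strip():
                text_seen = True
        has_text[elem.tag] = has_text.get(elem.tag, False) or text_seen
        for name in elem.attrib:
            if name not in names:
                names.append(name)

    lines = []
    for tag, kids in children.items():
        if has_text[tag]:
            # Mixed content: #PCDATA must come first in the choice group.
            model = "(" + " | ".join(["#PCDATA"] + kids) + ")*"
        elif kids:
            # Any observed child, any number of times, in any order.
            model = "(" + " | ".join(kids) + ")*"
        else:
            model = "EMPTY"
        lines.append("<!ELEMENT %s %s>" % (tag, model))
        for name in attrs[tag]:
            # Every attribute is guessed to be optional character data.
            lines.append("<!ATTLIST %s %s CDATA #IMPLIED>" % (tag, name))
    return "\n".join(lines)


sample = """<document>
  <name><surname>van Rossum</surname></name>
</document>"""
print(infer_dtd(sample))
```

For this sample the sketch declares document, name, and surname, and the
content model for name is (surname)*. Nothing in the output mentions
middlename, which is exactly the limitation described above: the generated
DTD is only as complete as the document it was derived from.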

I know there is one tool out there that does this for you, but in
practice, these sorts of things don't help you that much unless you have
really large XML documents.

Gabe Wachob - http://www.findlaw.com - http://www.aimnet.com/~gwachob
As of today, the U.S. Constitution has been in force for 76,935 days
When this message was sent, there were 27,953,837 seconds before Y2K