XML DTD for Python source?

Greg Wilson gvwilson at nevex.com
Thu Mar 2 17:15:07 CET 2000

> > Greg Wilson <gvwilson at nevex.com> wrote:
> > Has anyone defined an XML DTD for storing Python source code?  If
> > so, I'd be grateful for a copy or pointer.  If not, I'd like to hear
> > from anyone who'd be interested...

> "Neel Krishnaswami" <neelk at brick.cswv.com> wrote
> Can I ask what the purpose of this utility would be?

Most of the data I deal with these days is either in XML, or headed that
way.  The two big exceptions are legacy configuration files (like .ini's,
.rc's, and makefiles) and program source.  I'm interested in exploring what
would happen if I could do with programs what I do with hypertext:

- apply a DTD to switch between Scheme-style parenthesizing, Python-style
  indentation, or C-style bracing

- embed arbitrary information (images, optimization hints for the compiler,
  etc.) in a way that third-party browsers and processors can handle
  (comments are *not* the answer)

I realize that human beings would not want to type:

    if (i &lt; 10):

instead of:

    if (i < 10):

but that's what editors are for.  Every editor I use these days does more
than just show me my flat ASCII --- even Emacs does syntax coloring, etc.
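
To illustrate the point about tools carrying the escaping burden, here is a
minimal sketch using the standard library's xml.sax.saxutils.  The variable
names are purely illustrative, and this is only one way an editor or filter
might do the conversion on the programmer's behalf:

```python
from xml.sax.saxutils import escape, unescape

# What the programmer types and sees in the editor.
line = "if (i < 10):"

# What would be written into an XML document: "<" becomes "&lt;".
stored = escape(line)
print(stored)

# Reading the file back recovers the original text unchanged.
assert unescape(stored) == line
```

The round trip is lossless, so the escaped form never has to be seen or
typed by a human.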

> If you just want to store programs in a well-structured form, then
> I'll point out that a Python script that the interpreter doesn't barf
> on is *already* in a well-structured form.

Yes, but not one that's easily accessible to (for example) XMetal, or the
next generation of XML-aware 'diff' tools.

> what would an XML representation for Scheme programs look like?

I don't know, but I don't think it would look like the token-level markup
you posted.  I think it would look more like an XML encoding of an abstract
syntax tree (AST).  Of course, there'd have to be some escape mechanism to
handle illegal syntax --- a good DTD would allow representation of programs
that were in the process of being written, as well as ones that were
complete.
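
As a rough sketch of what "an XML encoding of an AST" could mean for Python,
the following walks the tree produced by the standard ast module and emits
one element per node with xml.etree.ElementTree.  The element and attribute
layout (node class names as tags, repr() of simple values as attributes) is
an assumption made up for this illustration, not a proposed DTD, and the
helper names node_to_xml and source_to_xml are hypothetical.  It does nothing
about the escape mechanism for illegal syntax: ast.parse simply raises
SyntaxError on an unfinished program.

```python
import ast
import xml.etree.ElementTree as ET


def node_to_xml(node):
    """Recursively turn an ast.AST node into an ElementTree element."""
    elem = ET.Element(type(node).__name__)
    for name, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # A single child node goes in a wrapper named after the field.
            ET.SubElement(elem, name).append(node_to_xml(value))
        elif isinstance(value, list):
            # A list of children (e.g. a body) shares one wrapper element.
            wrapper = ET.SubElement(elem, name)
            for item in value:
                if isinstance(item, ast.AST):
                    wrapper.append(node_to_xml(item))
                else:
                    ET.SubElement(wrapper, "value").text = repr(item)
        elif value is not None:
            # Plain values (identifiers, constants) become attributes.
            elem.set(name, repr(value))
    return elem


def source_to_xml(source):
    """Parse Python source text and return its AST serialized as XML."""
    return ET.tostring(node_to_xml(ast.parse(source)), encoding="unicode")


xml_text = source_to_xml("if i < 10:\n    print(i)\n")
print(xml_text)
```

Because the comparison operator ends up as an element rather than as
character data, the "<" in the source never needs escaping in this form.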

> It's obvious that you don't get any additional structuring information
> to work with. This fact is transparent in the case of Scheme, but no
> less true for Python.

What information beyond the actual text of the program do you put in an AST?
What information do optimizing compiler passes add?  What annotations will
third-grade teachers want to add?

One of the big motivations for my interest is that I don't expect kids my
age to put up with glass typewriters as a programming environment. They're
already building web pages with images, their choices of color, etc.
Existing IDEs mostly just put lipstick on this particular toad...

> Wait -- I just thought of a use of XML in this case. Do you have tools
> that accept only XML, and you want to XML-ize Python code so that it
> can eat it? In that case, it's probably straightforward to take SPARK
> and define parser actions that emit XML tags instead of source
> text. You can define a DTD straight from the Python grammar; each
> production becomes an element declaration in the DTD.

...and then define a standard for specially-formatted comments that contain
images and other enriched information, and turn the little parser into a
browser plug-in so that it can translate .py files on the fly, and...  That
might be the only way forward, but I am interested in exploring what happens
if we finally do to our programs what we've done to everyone else's
documents :-).

