XML DTD for Python source?

Neel Krishnaswami neelk at brick.cswv.com
Fri Mar 3 20:30:13 EST 2000


Greg Wilson <gvwilson at nevex.com> wrote:
> > "Neel Krishnaswami" <neelk at brick.cswv.com> wrote
> > XML might be a useful interchange format, but IMO it doesn't
> > really help solve the core difficulties in this enterprise.
>
> I wasn't expecting XML to solve my problem; however, given the
> growing number of high-powered XML manipulation tools out there, I
> figured that using it might make my problem easier to solve.  At the
> (trivial for power users, non-trivial for beginners) level, it'd
> also allow users to format their code in the same rich ways that
> they format web pages.
 
Wouldn't it be simpler to point them at Robin Friedrich's HTMLgen? 
It's hard to get much easier-to-use than "markup(filename)" to HTMLize
a Python source file.

> > [literate programming, Ensemble, etc.]
> 
> Thanks for the Ensemble link --- I know my way around literate 
> programming fairly well, but hadn't run into the Berkeley project.

Ensemble is really cool. If you read their papers and blindly
reimplemented Ensemble, you'd have an IDE that would /blow away/
almost every existing effort, with the possible exception of some
Smalltalk and Common Lisp IDEs. 

I have no idea why it's not well-known; I only stumbled across it
while chasing down a pointer on incremental parsing algorithms. 

> > Neel:
> > ...you *can't* store arbitrary Scheme code in semantically
> > rich XML, because predicting whether you can finish reading a Scheme
> > program is equivalent to the Halting Problem.
> 
> Yes, I'm aware that you can't capture the whole of a program's semantics
> by textual analysis.  However, having written Scheme compilers in Scheme,
> and having messed around (fifteen years ago) with Scheme-to-C compilers,
> I seem to remember building AST's for the language that contained more than
> just the syntax...

I think I explained my point badly. Scheme (and Common Lisp) expand
macros into a core language, which can indeed be represented with an
AST. However, this AST does not map to the "conceptual level" of the
programmer.

For example, Common Lisp has the macro WITH-OPEN-FILE which means
"open this file, do some operations on it, and remember to close it
even if exceptions or other weirdness happens while doing those
operations". This is the conceptual chunk the human programmer works
at.

But the macro call:

(with-open-file (s p :direction :output :if-exists :supersede)
  (format s "Here are a couple~%of test data lines~%"))

expands into this:

(LET ((S (OPEN P :DIRECTION :OUTPUT :IF-EXISTS :SUPERSEDE)))
  (UNWIND-PROTECT
    (MULTIPLE-VALUE-PROG1
      (PROGN (FORMAT S "Here are a couple~%of test data lines~%"))
      (WHEN S (CLOSE S)))
    (WHEN S (CLOSE S :ABORT T))))

which is substantially less perspicuous to a human reader. If the
follow-on tools only see the second version, they won't be able to
properly help with the essential function of code comprehension.
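A rough Python analogy may make this concrete. This is only a sketch
under assumptions that are not in the thread: it uses the "with"
statement and the standard "ast" module from later Python 3 releases,
and "with" is a built-in statement rather than a user-defined macro.
The hand-written second version stands in for what a macro expander
would hand to follow-on tools.

```python
import ast

# The "conceptual chunk": open a file, use it, always close it.
HIGH_LEVEL = '''
with open(p, "w") as s:
    s.write("Here are a couple\\nof test data lines\\n")
'''

# A hand-written approximation of the same thing after "expansion".
EXPANDED = '''
s = open(p, "w")
try:
    s.write("Here are a couple\\nof test data lines\\n")
finally:
    s.close()
'''

def top_level_nodes(source):
    """Return the names of the top-level AST node types in source."""
    return [type(node).__name__ for node in ast.parse(source).body]

print(top_level_nodes(HIGH_LEVEL))  # ['With']
print(top_level_nodes(EXPANDED))    # ['Assign', 'Try']
```

A tool that only ever sees the second tree has to rediscover the
"open, use, always close" idea from an assignment and a try/finally,
which is the comprehension problem described above.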

Since macros are Turing-complete, the only general way you can make
this comprehensible to separate follow-on tools is to teach each one
what the macro means. So if there are N presentation tools, every new
macro needs N additional extensions to the presentation schemas, one
for each follow-on tool. If doing this as painlessly as possible
interests you, you should look up the papers on the "Zodiac" and
"McMicMac" tools at the PLT Scheme website at Rice.

> Greg (who is still looking for a DTD for a programming language)

XSL should count, and should also convince you it's a bad idea. ;)

Seriously, you can easily get yourself a Python DTD: convert each
nonterminal in the Python grammar into an element declaration, and
then you're done. For example, you would change the production

  flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt

to an element declaration like this:

  <!ELEMENT flow_stmt ( break_stmt | continue_stmt
                      | return_stmt | raise_stmt )>

Start off the DTD with the file_input production, maybe renamed to
something like "python_sourcefile", and proceed mechanically until
you've converted the entire grammar. There's really nothing to
it. (You may have to add some auxiliary productions to satisfy the
limitations of XML -- it's been a long time since I last wrote a DTD,
so I don't remember exactly what's legal.)
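The mechanical step can be sketched in a few lines of Python. This is
a minimal sketch, not a complete converter: the function name
production_to_element is made up for illustration, and it only handles
productions that are a plain alternation of nonterminals, like
flow_stmt. Productions containing quoted literals, [optional] parts,
or * and + repetition are rejected rather than guessed at.

```python
import re

def production_to_element(production):
    """Turn 'name: a | b | c' into '<!ELEMENT name (a | b | c)>'.

    Hypothetical helper: handles only a plain alternation of
    lowercase nonterminal names and raises ValueError otherwise.
    """
    name, _, body = production.partition(":")
    alternatives = [alt.strip() for alt in body.split("|")]
    for alt in alternatives:
        if not re.fullmatch(r"[a-z_][a-z0-9_]*", alt):
            raise ValueError("unsupported grammar fragment: %r" % alt)
    return "<!ELEMENT %s (%s)>" % (name.strip(), " | ".join(alternatives))

print(production_to_element(
    "flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt"))
```

Running it over the flow_stmt production prints an element declaration
equivalent to the one shown above; the remaining grammar constructs
are where the auxiliary productions would come in.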

Be careful about drawing too many inferences from the AST, though --
remember that when you do an import, it's perfectly legal for the
import hook to email Michael Hudson for a custom version of
bytecodehacks with which to randomly munge all the code objects in
every loaded module. :)

Incidentally, the grammar file is in the Python source distribution at
Grammar/Grammar.


Neel


