XML DTD for Python source?

Thu Mar 2 20:25:12 EST 2000

Greg Wilson <gvwilson at nevex.com> wrote:
> > "Neel Krishnaswami" <neelk at brick.cswv.com> wrote
> > Can I ask what the purpose of this utility would be?

I've rearranged your response a bit, so that I can answer the
interesting part first. :)

> Most of the data I deal with these days is either in XML, or headed
> that way. [...] I'm interested in exploring what would happen if I
> could do with programs what I do with hypertext:
>
> - apply a DTD to switch between Scheme-style parenthesizing, Python-style
>   indentation, or C-style bracing
> 
> - embed arbitrary information (images, optimization hints for the
>   compiler, etc.) in a way that third-party browsers and processors
>   can handle (specially-formatted comments are *not* the answer)
>
> One of the big motivations for my interest is that I don't expect
> kids my niece's age to put up with glass typewriters as a
> programming environment. They're already building web pages with
> images, their choices of color, etc.  Existing IDEs mostly just put
> lipstick on this particular toad...
>
> ...and then define a standard for specially-formatted comments that
> contain URLs and other enriched information, and turn the little
> parser into a browser plug-in so that it can translate .py files on
> the fly, and... That might be the only way forward, but I am
> interested in exploring what happens if we finally do to our
> programs what we've done to everyone else's documents :-).

I have looked into building a rich structuring editor, and let me tell
you, this is a MUCH bigger can of worms than you realize. It's bigger
than I realize, too, even though I think building a high-quality
programming environment could easily generate a dozen PhD theses along
the way. XML might be a useful interchange format, but IMO it doesn't
really help solve the core difficulties in this enterprise.

The two big problems in building a good development environment are:

o I have a program, and I want multiple views of and means of changing
  it.

o My program changes over time, with changes made by myself and
  others. I want to be able to understand and manage these changes. 

Simple examples of the first sort of tool are the "tags" program that
Unix programmers use to find function definitions, and Javadoc. A
moderately sophisticated example might be cweb, Knuth's literate
programming tool, or a form-painting GUI builder. Some extremely
sophisticated examples might be the Smalltalk refactoring code
browser, and PLT Scheme's static dataflow analyzer MrSpidey.
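
To make the simplest of these concrete, here is a minimal sketch of a
"tags"-style index for Python source. It assumes a modern Python with
the standard `ast` module. The names `build_tags` and `SAMPLE` are
made up for illustration, and the real tags program works differently
(it is pattern-based and handles many languages).

```python
import ast


def build_tags(source, filename="<string>"):
    """Return (name, kind, line) for every def and class in `source`."""
    tree = ast.parse(source, filename=filename)
    tags = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            tags.append((node.name, "function", node.lineno))
        elif isinstance(node, ast.ClassDef):
            tags.append((node.name, "class", node.lineno))
    # Sort by line number so the index reads top to bottom.
    return sorted(tags, key=lambda tag: tag[2])


# Hypothetical sample program, used only to exercise the sketch.
SAMPLE = '''\
class Stack:
    def push(self, item):
        pass

def main():
    pass
'''

if __name__ == "__main__":
    for name, kind, line in build_tags(SAMPLE):
        print(f"{line:4d}  {kind:8s}  {name}")
```

This is one "view" of the program, and a purely syntactic one: it
tells you where a name is defined, not what the name will be bound to
when the program actually runs.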

A simple example of the second sort of tool would be the "undo"
functionality in editors. A moderately complex version is CVS. An
extremely sophisticated example of the second is using a database to
get fine-grained version control of the objects in a Smalltalk image.

Some things you should play with/look at are:

o Allegro Common Lisp 
o Visual Age Smalltalk

  These two are worth studying, because they show the level of 
  reflectivity and self-understanding a system needs in order to
  adequately answer the programmer's questions about the system,
  and to learn where the state of the art already is.

o Berkeley's Ensemble project

  This is an attempt to build an advanced multilingual development
  environment, with support for things like flagging the programmer
  if the code s/he's writing has incorrect syntax or doesn't type
  check, showing diagrams of hyperlinked class hierarchies, and so
  on, plus using an ODB to get persistent versioning and ACIDity
  for your code. There are a lot of good papers and theses available
  on the website -- I recommend it highly.

Note: I'm not saying this isn't worth doing -- quite the contrary! I'm
just saying that a) doing the Right Thing is one hell of a lot of
work, and b) your analyzer has to understand both the program and the
programming language to say very much useful about it.

> > what would an XML representation for Scheme programs look like?
>
> I don't know, but I don't think it would look like the token-level
> representation you posted.  I think it would look more like an XML
> encoding of an abstract syntax tree (AST).  

Hah! You fell for my trick question. :)

Seriously, you *can't* store arbitrary Scheme code in semantically
rich XML, because predicting whether you can finish reading a Scheme
program is equivalent to the Halting Problem. Scheme's hygienic macro
system is Turing complete, and you can rebind the symbols bound to the
special forms. It's like a document that can add and delete element
declarations to and from its DTD.

The same is somewhat true of Python, though less forcefully. It's
probably not possible to change the syntactic structure of Python,
but you can't really infer very much semantics from that, short of
actually running the program. For example, consider the __import__
hook. Doing an import can conceivably change nearly anything in the
Python environment.
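
A rough sketch of both halves of that claim, assuming a modern Python
with the standard `ast` and `xml.etree.ElementTree` modules. The XML
encoding, and the names `ast_to_xml` and `run`, are hypothetical and
are not any standard DTD. The syntax tree converts to XML without
trouble, but the same tree yields different results depending on what
`len` happens to be bound to at run time.

```python
import ast
import xml.etree.ElementTree as ET


def ast_to_xml(node):
    """Encode an ast node as an XML element (ad hoc, illustrative encoding)."""
    elem = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # A single child node goes in an element named after the field.
            ET.SubElement(elem, field).append(ast_to_xml(value))
        elif isinstance(value, list):
            child = ET.SubElement(elem, field)
            for item in value:
                if isinstance(item, ast.AST):
                    child.append(ast_to_xml(item))
                else:
                    ET.SubElement(child, "value").text = repr(item)
        elif value is not None:
            # Plain values (identifiers, constants) become attributes.
            elem.set(field, repr(value))
    return elem


def run(source, env):
    """Execute `source` in the namespace `env` and return its `result`."""
    exec(compile(source, "<demo>", "exec"), env)
    return env["result"]


SOURCE = "result = len(data)"

# The syntactic structure is fully recoverable without running anything.
xml_text = ET.tostring(ast_to_xml(ast.parse(SOURCE)), encoding="unicode")

# The meaning is not: rebinding `len` changes what the same tree computes.
plain = run(SOURCE, {"data": [1, 2, 3]})
rebound = run(SOURCE, {"data": [1, 2, 3], "len": lambda seq: "surprise"})

if __name__ == "__main__":
    print(xml_text)
    print(plain, rebound)
```

An XML document built this way can say "this is a call to something
named len", but it cannot promise which function that is.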

It's a lot like type-safety: in order to build a language that can be
type-checked, you have to accept that there are good programs that
your type system can't assign a type to.

[I think this analogy can actually be pushed further than I thought,
because if you think of a DTD as defining a datatype, then wanting to
store some semantics of the program in the XML means that you are
trying to guarantee that the program has the "type" defined by the
DTD. You will either have to accept that the XML is sometimes wrong,
or sometimes have to reject valid programs. Neat!]

Neel