[Edu-sig] PySqueak: "Self" as the source of many ideas in Squeak

Sat May 6 09:25:49 CEST 2006

Ian Bicking wrote:
> This is all not to say that modifying Python source in a manner 
> different than ASCII is infeasible.  Only that "source" and "runtime" 
> have to be kept separate to keep this process sane.  Images -- which 
> mingle the two -- have never felt sane to me; clever, but not 
> manageable.  

Well, perhaps sanity is overrated. :-)

And Smalltalk has been managing this for thirty years. :-)

But it is true that it requires a bunch of special-purpose tools: changeset 
browsers, a change log, external source code control, image cloners, and 
so on. A Python that stored a running image as Python program files would 
probably need some specialized tools too.

And I will go further and agree that mixing running application state 
(like which windows are open or what the stack looks like) with 
application data (your email, your accounts receivable) is pretty scary too.

But again, one needs some sort of tools to help with that, and there is 
no reason some things can't be in the image and other things kept out of it.

In a Smalltalk system, there is generally at least a file with the library 
source code and a file with user additions to it, kept as a current change 
log. In the case of Squeak, though, it can decompile methods (if it saves 
comments) even if you don't have the source code. One of the people on the 
project (Alan Kay or Dan Ingalls?) quipped something to the effect that a 
Squeak image was really a source code compression algorithm with a 
development environment thrown in for free.

A Smalltalk image is generally saved as binary objects. But now that disk 
is cheap and CPU cycles plentiful, storing an image as Python program 
files seems OK to me resource-wise.

> Unfortunately we do not have very good abstractions in
> Python related to source, so they have to be invented from scratch.  The 
> AST might help, but higher-level abstractions are also called for.  For 
> instance, you might define a color interactively in some fashion, and 
> the color gets serialized to:
> 
>    import color
>    color.Color(r, b, c)
> 
> That color object might be immutable (probably should be), but the 
> *source* isn't immutable, and the source means something.  How can we 
> tell what it means?

I'm not sure anyone needs to ever tell what that means, except the Python 
parser when it reads in the code and runs it (in the context of other code 
it also loaded). Obviously when code is being written out which can 
reconstruct a current object tree, then the writing process needs to 
understand what it means by each piece of code, especially as it might 
refer to previously defined bits to shorten or make more readable the 
output code file. But since it is writing the code, there should not be 
any ambiguities of intent. The hardest issue perhaps becomes making the 
results look as close to human-written Python as possible (say, using 
"class" to define a class instead of making a plain object and turning it 
into a class by setting fields).

> I can imagine a bit of interaction between the source and the runtime to 
> do this.  For instance, we might see that "color" is bound to a specific 
> module, and "color.Color" to a specific object.  We'll disallow certain 
> tricks, like binding "color" dynamically, monkeypatching something into 
> "color.Color", etc.  Ultimately figuring out exactly what color.Color is 
> isn't *easy*, but at least it is feasible.
> 
> Using existing introspection we can figure out some parts, some of the 
> sorts of things that IDEs with auto-completion figure out.  They can 
> figure out what the arguments to Color() are and the docstring.

Perhaps you are thinking about this from the point of view of something 
like the PyDev parser, which reads Python code to syntax-highlight it and 
perhaps provide code assistance. But since the code has already been read 
and we are manipulating the object tree directly, we know what everything 
means, because we can just inspect the objects themselves. Granted, some 
of those objects represent code, and that code may be wrong, ambiguous, or 
confusing from the user's point of view, but the code writer can ignore 
that: it just writes out each snippet of code the way the user put it in.
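
As a rough illustration of inspecting a live object rather than parsing 
its source, here is a minimal sketch using the standard "inspect" module. 
The Color class and the describe() helper are hypothetical stand-ins, not 
part of HTConsole or any existing "color" module.

```python
import inspect


class Color:
    """An RGB color (hypothetical stand-in for color.Color)."""

    def __init__(self, r, g, b):
        self.r = r
        self.g = g
        self.b = b


def describe(obj):
    """Collect what a live object can tell us about itself."""
    cls = type(obj)
    return {
        "class": cls.__name__,
        "module": cls.__module__,
        "doc": inspect.getdoc(cls),
        # Constructor argument names, as an IDE's auto-completion might show.
        "init_args": list(inspect.signature(cls).parameters),
        # The instance's own fields and their current values.
        "fields": dict(vars(obj)),
    }


info = describe(Color(255, 128, 0))
```

Everything in "info" comes from the running objects; no source text is 
read or parsed along the way.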

> But, you can also imagine adding an editor or other things to that 
> object; a richer form of __repr__, or a richer editable form than the 
> ASCII source.  Maybe there would be a flag on Color that says it can be 
> called without side effect (meaning we can speculatively call it when 
> inspecting source); and then the resulting object might have something 
> that says it can be displayed in a certain way (with its real color), 
> and has certain slots you can edit and then turn into a new Color 
> invocation.

I can see the value of all this from the GUI side, with various desired 
display options defined in the class or prototype. But again, Python 
objects are (or should be :-) constructible from scratch without calling a 
class's init function. So, given that a writer can inspect an instance and 
just write out all the fields in its dictionary and their values, it seems 
like we can write out any object (though perhaps it may need to 
recursively write out parts of embedded objects first, and then perhaps 
patch up circular references at the end).
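
A minimal sketch of that idea, assuming every object of interest is an 
instance of one ordinary class whose state lives in its instance 
dictionary. Node and clone_graph() are hypothetical names; the point is 
only that __new__ creates an empty instance without running __init__, and 
that references (including circular ones) can be filled in afterwards.

```python
class Node:
    """Hypothetical example class whose __init__ has a side effect."""

    init_calls = 0

    def __init__(self, name):
        Node.init_calls += 1  # stands in for any side effect of construction
        self.name = name
        self.peer = None


def clone_graph(root):
    """Rebuild a graph of Node instances without calling __init__."""
    copies = {}  # id(original) -> (original, empty copy)
    pending = [root]
    # Pass 1: create a bare shell for every reachable Node.
    while pending:
        obj = pending.pop()
        if id(obj) in copies:
            continue
        copies[id(obj)] = (obj, type(obj).__new__(type(obj)))
        pending.extend(v for v in vars(obj).values() if isinstance(v, Node))
    # Pass 2: copy each field, pointing references at the new shells.
    for original, copy in copies.values():
        for key, value in vars(original).items():
            if isinstance(value, Node):
                value = copies[id(value)][1]
            copy.__dict__[key] = value
    return copies[id(root)][1]


a, b = Node("a"), Node("b")
a.peer, b.peer = b, a  # circular reference
a2 = clone_graph(a)
```

Here __init__ runs only for the two original nodes; the copies get their 
fields straight from the originals' dictionaries. Objects that use 
__slots__ or keep state outside their dictionary would need more care.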

> This is where the 'make' syntax starts to come into play again, or more 
> generally where declarative source comes into play.  Take a typical 
> non-declarative module, like optparse:

I like what you outline, and I can see how something like it could make 
the code the output writer creates more legible.

> This is all harder than what HTConsole is doing currently, mostly 
> because Python source introspection is much poorer than Python object 
> introspection.

Good point. And perhaps an area of Python that this project could work on?
Can you elaborate on what parts of source introspection might need the 
most work and in what ways so that either programs or people can better 
inspect running Python programs?

> Loading a Python program can already do anything, if you put the 
> commands at the top-level (not in a function).  So Python source really 
> is focused on constructing objects -- even functions and classes are 
> *built* by the source, not declared by it, and so circular imports and 
> circular class references can be challenging as a result.

True. But, the writer should be able to figure out how to handle this, 
perhaps using some sort of recursive process with a patch list and a 
cleanup routine?
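
One possible shape for such a writer, sketched under simplifying 
assumptions: the object tree is made of plain dictionaries with simple 
keys and values, and write_source() is a hypothetical name. Each 
dictionary is written as a literal once its children are written; a 
reference back to a dictionary that is still being written goes on a 
patch list, which is emitted at the end.

```python
def write_source(root):
    """Emit Python source that rebuilds a (possibly circular) dict graph."""
    names = {}    # id(dict) -> variable name; present once writing started
    done = set()  # ids of dicts whose defining line has been emitted
    lines = []    # definitions, children before parents
    patches = []  # assignments deferred until every name exists

    def emit(d):
        name = "obj%d" % len(names)
        names[id(d)] = name
        parts = []
        for key, value in d.items():
            if isinstance(value, dict):
                if id(value) not in names:
                    emit(value)
                if id(value) in done:
                    parts.append("%r: %s" % (key, names[id(value)]))
                else:
                    # Still being written (a cycle): patch it up afterwards.
                    patches.append(
                        "%s[%r] = %s" % (name, key, names[id(value)]))
            else:
                parts.append("%r: %r" % (key, value))
        lines.append("%s = {%s}" % (name, ", ".join(parts)))
        done.add(id(d))

    emit(root)
    return "\n".join(lines + patches)


a = {"name": "a"}
b = {"name": "b", "peer": a}
a["peer"] = b  # circular reference
source = write_source(a)
```

Running the emitted text with exec() recreates both dictionaries and then 
applies the single patch that closes the cycle. A writer for real class 
instances would also have to decide how to name things readably, which 
this sketch does not attempt.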

--Paul Fernhout