[Edu-sig] Shuttleworth Summit

Fri Apr 21 15:04:27 CEST 2006

Ian Bicking wrote:
> Paul D. Fernhout wrote:
>> [I 
>> personally think the Squeak approach would be more stable and 
>> maintainable though, just 2000 lines of core C to port per platform, 
>> with widgets built on that, and a dynamic loading facility for other 
>> native code.] 
> 
> I'm not clear what the advantage of this kind of strategy is over 
> CPython.  Sure, 2000 lines of C is easier to port, but CPython is 
> ported, so that's not a problem.  The graphical layer isn't portable, 
> but pygame is fairly portable and runs on a more optimized layer (SDL) 
> than what Squeak runs on (AFAIK -- though I haven't paid any attention 
> to what their graphical infrastructure is like for years).
>
> I guess I just don't understand the complaints about Python graphics. 
> Sure, there's work to do, but the core graphics capabilities provide a 
> solid foundation, in addition to some good higher level things as well 
> (like VPython).  If Squeak has some good higher-level ideas, then those 
> would be ported, I don't see any way you could leverage the Squeak code 
> directly.

I think the main issue is not graphics; it is more the cross-platform 
development environment. But there are really several issues when you look 
at Squeak:
* cross-platform graphics and other systems (e.g. sound, files, sockets).
* cross-platform development tools using those graphics, and these are good 
tools, including complete source version history, cross-referencing of 
function use, object inspection, and so on.
* cross-platform object store (saving the system state).
* most of the system (including widgets and most of the VM) is written and 
maintained in the native language of Smalltalk (though in some cases 
translated to C), and so is cross-platform.
* because the system is so self-contained, it is easier to make it run on 
bare hardware or as a browser plugin.

I've been mostly doing Python the past few years, so my Squeak knowledge 
may be a little out of date (the last two years have seen some major 
changes), but basically, there are four parts to the overall Squeak 
architecture: about 0.1% base platform specific code (C), about 1% VM code 
(C generated from Smalltalk), about 2% loadable code (any language, maybe 
some generated from Smalltalk), and about 96.9% the rest of the object 
system (pure Smalltalk). (The percentages are just my guess of approximate 
code size to give you a feel for it). Some more details on the parts follow.

One part is about 2000 lines of code which mostly support displaying a 
bitmapped window on the screen, handling mouse and keyboard events, 
talking to files, the network, and a basic sound system. [Some of these 
parts might be commented out in various situations, like running headless, 
e.g. "Embedded Squeak".] This layer could potentially be used by Python as 
is, although it probably would need to be tweaked a little to have fewer 
dependencies on the rest of the Squeak system (i.e. it may expect a 
certain object record format). But it is true that one could use SDL (or 
even wxWidgets, or OpenGL) to supply many of these services. In theory, 
one might be able to make use of libraries like the Apache Portable Runtime
   http://apr.apache.org/
to do some of this too. The Python way would probably be to use these 
prebuilt systems and accept the extra footprint and the reduced 
portability to exotic or bare hardware. Squeak as an idea has long 
included the goal of running on bare hardware, which has been demoed, but 
I'm assuming most people here would be content with running just on 
GNU/Linux and then Mac OS X and Windows, which some combination of SDL, 
OpenGL, wxWidgets, and APR covers.
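
To make the idea of a thin per-platform layer concrete, here is a minimal 
Python sketch. It is not Squeak's actual interface: the class and method 
names (PlatformLayer, blit, poll_events, read_file) are hypothetical, and a 
real layer would also cover sockets and sound. The point is only that 
everything above the layer talks to one small interface, and that a 
headless port can stub out the display parts.

```python
from abc import ABC, abstractmethod


class PlatformLayer(ABC):
    """Hypothetical minimal per-platform interface (names are illustrative)."""

    @abstractmethod
    def blit(self, pixels, width, height):
        """Copy a bitmap to the display."""

    @abstractmethod
    def poll_events(self):
        """Return a list of pending (kind, data) input events."""

    @abstractmethod
    def read_file(self, path):
        """Return the bytes stored at path."""


class HeadlessPlatform(PlatformLayer):
    """A port with the display and input parts stubbed out."""

    def __init__(self, files=None):
        self.files = dict(files or {})  # in-memory stand-in for a file system
        self.frames_drawn = 0

    def blit(self, pixels, width, height):
        # No screen: just count frames so callers can run unchanged.
        self.frames_drawn += 1

    def poll_events(self):
        return []  # no mouse or keyboard when headless

    def read_file(self, path):
        return self.files[path]


def run_one_frame(platform):
    """Code above the platform layer only ever uses the interface."""
    events = platform.poll_events()
    platform.blit(bytes(4), 2, 2)
    return len(events)
```

An SDL- or wxWidgets-backed port would be another subclass of the same 
interface, which is roughly the trade-off described above: less code to 
write, more footprint underneath.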

Another is a larger amount of C code which is generated by translating a 
subset of Smalltalk to C -- this is what defines the bulk of the bytecode 
processing VM plus related support routines.  Since this is written in a 
subset of Smalltalk, it is possible to run this VM code within a Smalltalk 
system (as a simulation, in a sense) and develop and debug it in a comfy 
environment. PyPy's RPython is somewhat similar in purpose to Squeak's 
"Slang".

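As a rough illustration of writing a VM core in a restricted subset of the 
high-level language, here is a toy stack interpreter in deliberately plain 
Python: integers, lists, and simple control flow only. It is neither Slang 
nor RPython, and the opcode set is invented for the example. Code in this 
style can be run and debugged under the ordinary interpreter, and is also 
the kind of code a translator to C could plausibly handle.

```python
# Invented opcodes for a toy stack machine (not Squeak bytecodes).
PUSH, ADD, MUL, JUMP_IF_ZERO, HALT = range(5)


def run(code):
    """Interpret a flat list of ints; return the top of stack at HALT."""
    stack = []
    pc = 0
    while True:
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1])  # operand follows the opcode
            pc += 2
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == JUMP_IF_ZERO:
            target = code[pc + 1]
            if stack.pop() == 0:
                pc = target
            else:
                pc += 2
        elif op == HALT:
            return stack[-1]
        else:
            raise ValueError("unknown opcode %d at pc=%d" % (op, pc))


# (2 + 3) * 4, simulated inside the host language
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run(program)
```
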
Then there are dynamically loadable modules which can be written in any 
language. Note that some of these modules may be handcoded C or C++, but 
others might be written in Smalltalk and translated to C using Slang for 
efficiency. There is a Squeak 3D engine that took this approach, starting 
out merged into the VM and then becoming a module. The more complex sound 
primitives written in Smalltalk also migrated out of the VM and into 
modules. One needs a common cross-platform interface for this. Again, 
perhaps APR might help? One could try using ideas from the related Squeak 
codebase.

Then there is the rest of the Squeak system (including compiler and 
development tools), which is written in Smalltalk and works on top of the 
previous three layers, running on the VM. In practice all GUI widgets are 
defined at this top level in Smalltalk (unless you did something funky 
with a loadable module for calling wxWidgets or native widgets and such). 
When a Smalltalk "image" is written out (or read in), what is written (or 
read) is the structure of objects at this fourth layer, and that layer is 
written and loadable in a completely cross-platform way, meaning that for 
all the core development tools and so on, you can move your image from one 
machine to another and just run it. (Of course, if you depend on a 
platform-specific dynamically loadable module, like one for Surround 
Sound, that part might not work.)
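
For a feel of what saving and restoring an object structure looks like in 
Python, here is a small sketch using the standard pickle module on a graph 
of plain dictionaries and lists. This is only an analogy: a Smalltalk image 
also carries the classes, compiled methods, and running processes, which 
pickle does not. The function names and the demo data are made up for the 
example.

```python
import pickle


def save_image(root):
    """Serialize the whole object graph reachable from root to bytes."""
    return pickle.dumps(root)


def load_image(data):
    """Rebuild the object graph, including shared and cyclic references."""
    return pickle.loads(data)


def build_demo_world():
    """A tiny 'world': one window holding a button that points back at it."""
    browser = {"kind": "window", "title": "Browser", "children": []}
    button = {"kind": "button", "label": "Run", "owner": browser}
    browser["children"].append(button)  # creates a reference cycle
    return {"windows": [browser], "focused": browser}


snapshot = save_image(build_demo_world())
world = load_image(snapshot)
```

After loading, world["focused"] and world["windows"][0] are the same 
object again, so identity within the graph survives the round trip, and 
the bytes in snapshot can be moved to another machine.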

I think a Python using a similar architecture would be pretty neat. But a 
Python system running on top of SWT or wxWidgets (with some of its own 
widgets) and with access to the Apache Portable Runtime library might get 
many of the benefits at little cost. It wouldn't get all the benefits 
(Squeak can supply a full GUI with development environment and compiler in 
a little over one megabyte and run on bare hardware with just a little 
more glue), but it would be a nice start.

For me, a big issue is transparency, not "graphics" by itself. I like as 
much of the system to be accessible from within itself as possible. I get 
frustrated, say, when I can't drill down into the code of wxWidgets easily 
(and see it as Python). Squeak has that kind of transparency most of the 
time. (Not always, because there is a VM, but mostly, and even the VM can 
be self-hosted and simulated). Still, there are issues with that 
transparency for beginners (who get confused by seeing too much code at 
once, or who break parts of the system they should not mess with at first, 
such as making *all* windows hang when opened, which kills the debugger). 
Even experienced users can suffer when they hang the system by using too 
many objects or changing core base classes. So I like the promising idea 
of developing and debugging across images (or VMs) -- that is, you develop 
using tools in a VM you are not also changing, but they work across a 
socket to talk to another VM where your application is running. I have one 
prototype that does that somewhat (for a custom language); I built a 
socket server into the VM. I realized later it would probably be better to 
build a client into the VM instead, and have just one redirecting server 
on the machine, which coordinates client debuggers and client 
applications; that way you just use one common port, and can debug 
multi-VM stuff as well.
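
The routing logic of such a redirecting server can be sketched in a few 
lines of Python. This is an in-memory model only, under the assumption 
that each VM connects out as a named client; the Broker class and its 
methods are hypothetical, and a real version would sit behind one 
listening socket rather than being called directly.

```python
from collections import deque


class Broker:
    """Single rendezvous point; debuggers and applications are all clients."""

    def __init__(self):
        self.inboxes = {}  # client name -> queue of (sender, payload)

    def connect(self, name):
        if name in self.inboxes:
            raise ValueError("name already in use: %r" % (name,))
        self.inboxes[name] = deque()

    def send(self, sender, target, payload):
        if target not in self.inboxes:
            raise KeyError("no such client: %r" % (target,))
        self.inboxes[target].append((sender, payload))

    def receive(self, name):
        """Return the oldest pending (sender, payload), or None if empty."""
        inbox = self.inboxes[name]
        return inbox.popleft() if inbox else None


broker = Broker()
for client in ("debugger", "app-1", "app-2"):
    broker.connect(client)

# The debugger asks one application VM for a value; the app replies.
broker.send("debugger", "app-1", "inspect counter")
sender, request = broker.receive("app-1")
broker.send("app-1", sender, "counter = 3")
reply = broker.receive("debugger")
```

Because every participant is a client of the same broker, one debugger can 
address several application VMs by name, which is what makes the multi-VM 
case fall out naturally.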

I also have a feeling about complexity and Squeak, which is related to the 
struggles the Squeak community has had managing rapid changes to core 
Squeak features. Building on the Squeak vision of the image, it is that 
one should have one image per application (which is somewhat more how 
Python acts in practice) rather than try to bundle several of them 
together into one image. That way, applications that work would stay 
working, rather than break every time somebody modifies the base system. 
Disk space and memory and bandwidth are so cheap now, but human time is so 
expensive, so why not just have lots of little applications and VMs 
running at the same time? Sure, maybe later one can do like Java 1.5 and 
have some VM sharing, but at the start, I'd rather see lots of robust 
independent applets with wildly different versions of every library, but 
see them all working and relatively bulletproof while other innovations 
were going on in the community. But that would require easier cross-image 
communications, perhaps made easier by the system I outlined above for 
debugging and remote development. This would of course require some rarely 
changing common communications protocol (or at least, one with versions).
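
One simple way to get "a protocol with versions" is to put a version 
number in every message and reject versions the receiver does not know. 
The sketch below uses JSON for the wire format; the field names ("v", 
"kind", "body") and the set of supported versions are assumptions made for 
the example, not part of any existing Squeak or Python protocol.

```python
import json

PROTOCOL_VERSION = 1
SUPPORTED_VERSIONS = {1}  # versions this particular image understands


def encode(kind, body, version=PROTOCOL_VERSION):
    """Wrap a message in a versioned envelope and return UTF-8 bytes."""
    envelope = {"v": version, "kind": kind, "body": body}
    return json.dumps(envelope).encode("utf-8")


def decode(data):
    """Return (kind, body), refusing envelopes with an unknown version."""
    envelope = json.loads(data.decode("utf-8"))
    version = envelope.get("v")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported protocol version: %r" % (version,))
    return envelope["kind"], envelope["body"]


wire = encode("eval", {"source": "3 + 4"})
kind, body = decode(wire)
```

An image built against a newer library could add 2 to its own 
SUPPORTED_VERSIONS while still talking to older images that only speak 
version 1.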

Anyway, as Alan Kay says, "burn the diskpacks", which in this case I would 
suggest means: don't just try to ape Squeak in Python, but, building on 
Python's (and Squeak's) strengths, and paying attention to the lessons 
learned from viewing them as experiments, build something better.

> As for actually integrating with Smalltalk, I suspect embedding the 
> Squeak VM in Python is feasible.

There are a couple ways to do this:
* Have two VMs side by side (but communication is a pain)
* Have one unified VM (but one language may suffer or need to change 
somewhat for ease of doing this).

The VMs can be written directly in C (or whatever: C++, Objective C, or 
OCaml :-). Or they could be written in a subset or derivative of Python 
(PyPy's RPython) or Smalltalk (see its implementation language, Slang: 
http://minnow.cc.gatech.edu/squeak/2267 ). Or one language could be 
written on top of another (though performance might be a big issue if the 
object models mismatch).

In this case, given that Python is more flexible in using dictionaries for 
objects (though slower), I would suggest it would make more sense to use a 
unified VM approach, and put a Smalltalk-like syntax on top of an existing 
Python VM and Python object model (maybe with a couple of tweaks) and just 
see how far that goes. This would also have the benefit of making it easy 
to write a "Self"-like prototype-based Smalltalk on top of Python, using 
Python's dictionary mechanisms. Squeak's newer GUI (Morphic) is prototype 
oriented, and is derived from work on Self.  I already have a variant of a 
Smalltalk parser written in Python, and they are not that hard to do 
(Smalltalk is an easy language to parse).
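
As a small taste of layering Smalltalk-style syntax over the Python object 
model, the sketch below parses a keyword message such as "at: 1 put: 42" 
and dispatches it to an ordinary Python method. It is not the parser 
mentioned above: it handles only integer arguments, and the convention of 
mapping the selector "at:put:" to a method named at_put_ is an assumption 
chosen for the example.

```python
def selector_to_name(selector):
    """Map a Smalltalk selector to a Python name: 'at:put:' -> 'at_put_'."""
    return selector.replace(":", "_")


def send(receiver, selector, *args):
    """Send a message by looking up the mapped name on a Python object."""
    return getattr(receiver, selector_to_name(selector))(*args)


def parse_keyword_message(text):
    """Parse 'at: 1 put: 42' into ('at:put:', [1, 42]). Integers only."""
    tokens = text.split()
    if not tokens or len(tokens) % 2 != 0:
        raise SyntaxError("expected keyword/argument pairs: %r" % (text,))
    selector_parts = []
    args = []
    for keyword, literal in zip(tokens[0::2], tokens[1::2]):
        if not keyword.endswith(":"):
            raise SyntaxError("not a keyword: %r" % (keyword,))
        selector_parts.append(keyword)
        args.append(int(literal))
    return "".join(selector_parts), args


class Table:
    """Plain Python class whose methods follow the mapped naming scheme."""

    def __init__(self):
        self.slots = {}

    def at_put_(self, key, value):
        self.slots[key] = value
        return value

    def at_(self, key):
        return self.slots[key]

    def size(self):
        return len(self.slots)


table = Table()
selector, arguments = parse_keyword_message("at: 1 put: 42")
stored = send(table, selector, *arguments)
fetched = send(table, "at:", 1)
```

Since dispatch goes through getattr, the receiver can be any Python 
object, including one whose behaviour lives in a per-instance dictionary.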

I've said a lot of nice things about Squeak, but I'll add here why I use 
Python instead. As a Squeak negative, a big issue for me (others will 
disagree) is the license and licensing history. Anything that Disney 
touches scares me, for example, and I don't think the stuff developed when 
the Squeak team was at Disney is clearly licensed (I kept raising the 
example of Python's licensing problems, with CNRI claiming it was never 
formally licensed, and that was just dealing with a non-profit!).
   http://www.python.org/download/releases/1.6/license_faq/
The Squeak license even as it is isn't formally "open source" or "free" 
for several reasons. I could have fixed Squeak's technical issues (and it 
has several I have not mentioned), but I could never get past the license, 
so after it seemed no one "in charge" cared much about fixing it the way 
I wanted it fixed, and there was no community interest in starting a 
from-scratch reimplementation, I moved on. On Python's plus side, it has better 
and more libraries than Squeak, has a bigger community, has a C-like 
syntax the masses find more acceptable (I still prefer Smalltalk's keyword 
syntax though, along with blocks in control structures, though I like 
indentation), it has a relatively good licensing history, and it has 
widespread commercial use (good for consulting). Python misses many of 
Smalltalk's development tools overall, but those are more easily remedied 
by a programmer than changing a license set in stone by two big 
corporations, creating a trap which could spring shut at any moment. As I 
say, people disagree with my perception of the license. Squeak's still a 
neat system, and for most people seems free enough. But Python has had 
more traction and really is free.

--Paul Fernhout