[Python-Dev] Thesis ideas list

Brett C. bac at OCF.Berkeley.EDU
Sat Nov 22 17:56:07 EST 2003

As requested, here is a annotated list of the ideas that I received for 
my masters thesis.  I tried to do a decent job of referencing where info 
is and summarizing the emails I received.  I also did **very** rough 
commenting on each of them in trying to understand them and to see if I 
thought they would make a good thesis for me.

If you have something to contribute to the list, please do so.  Don't 
bother with spelling and grammatical fixes, though, since this is just 
for my personal use and for anyone interested in the ideas; it will not 
see the light of day on python.org or anything unless someone else 
decides to put the effort into that.

I am planning to go in and talk with my thesis advisor after 
Thanksgiving break (next week) to try to narrow down the list so I can 
have a thesis topic chosen by Jan 1.



from Martin: 
http://mail.python.org/pipermail/python-dev/2003-October/039768.html and

Similar to attributes as done in .NET .  Michael's ``func()[]`` syntax 
might pull off what Martin wants.

For an overview of attributes in C#, see 
.  They appear to be a way to connect data with code.  Using reflection 
you can find out what attributes are attached to an object.  You can 
control what types of objects an attribute can be bound to along with 
specifying arguments.  It seems like the compiler has some built-in 
attributes that it uses to process the code when available.

Since it not only just attaches info to an object, but it is used by the 
language to modify the code.  It seems like the func()[] syntax along 
with Python's dynamic attribute creation covers this, just without the 
built-in syntax.

Could be interesting to come up with a variance on descriptors for 
assigning to something like __metadict__.  If it is a data descriptor it 
is just attached as info to the ojbect.  If it a non-data descriptor, 
though, it gets the code object passed to it.  The only perk of this is 
a way to have info attached in a more abstracted way than just sticking 
the info into __dict__ and making sure you don't overwrite the value (in 
other words __metadict__ would not come into play for name resolution). 
  Basically just standard place to store metadata.

Martin's suggestion was having an attribute that would automatically 
create an XML-RPC interface for a code object.  That might be doable as 
a metaclass, but that could get complicated and messy.  If you could do 
something like::

   def meth() [xml-rpcbuilder]: pass

and have 'meth' automatically get an ``annotation(meth, 'xml-rpc')`` 
ability that returns a wrapper implementing an XML-RPC interface that 
might be cool.  You could do this now with a function that takes 
something, creates a wrapper, and then stores it on the object so that 
it does not have to be recreated every time.  But that becomes an issue 
of overwriting values in the object.  Having a place for metadata would 
lesson that problem somewhat.

All in all it *might* be a good thing, but with Python dynamicism I 
don't see a use-case for it at this moment.

Work on PyPy
from `Holger 

Vague statement that one could work on PyPy.  OSCON 2003 paper at 
http://codespeak.net/pypy/index.cgi?doc/oscon2003-paper.html .  PyPy's 
EU funding proposal has interesting parts at 
http://codespeak.net/pypy/index.cgi?doc/funding/B1.0 and 
http://codespeak.net/pypy/index.cgi?doc/funding/B6.0 .

Multiple dispatch
from `Neil 

Look at Dylan_ and Goo_ for inspiration.  An explanation of how Dylan 
does multiple dispatch can be seen at 
http://www.gwydiondylan.org/gdref/tutorial/multiple-dispatch.html .

Multiple dispatch is basically a mechanism of registering a group of 
methods under one generic method that then calls the registered methods 
based on whether the parameter lists can match the arguments being 
passed.  If there is more than one match then they are ordered in terms 
of how "specific" they are; if a parameter requires a subclass or the 
actual class it is less specific.  Methods can then call a special 
method that will call the next method in the calculated order.

The issue with this in terms of Python is how to handle comparing the 
arguments given to a method when a parameter list is just so damn vague. 
  If you have the parameter lists, ``def A(arg1, arg2, *args)`` and 
``def B(*args)``, which one is more specific?  The other issue is that 
since Python has not parameter type-checking beyond argument counts you 
can't base whether a method is more specific or not on the type or 
arguments.  In order for this to be built into the language one would 
have to add type-checking first.  Otherwise one would need to have all 
of this be external to the language.

It should be doable in terms of Python code now.  Building it into the 
language might be nice, but without type checking I don't know how 
useful it would be.

.. _Dylan: http://www.gwydiondylan.org/drm/drm_1.htm
.. _Goo: http://www.ai.mit.edu/~jrb/goo/manual/goomanual.html

Static analysis of Python C code
from Neal (private email)

Look at the work done by `Dawson Engler`_.  Could check for missing 
DECREF/INCREFs, null pointer dereferences, threading issues, etc.

Appears the research was developing a system to check that basic rules 
were met for code (returned values were checked, disabled interrupts get 
re-enabled, etc.).

.. _Dawson Engler: http://www.stanford.edu/~engler/


Mark-and-sweep GC
from `Neil 

Only really worth it in terms of code complexity (does C code become 
easier?  How hard to move existing extension modules over?) and to 
measure performance difference.

Chicken GC
from `Neil 
with more ideas from `Samuele 
and `Phillip Eby 

Chicken_ has its GC covered in a paper entitled "`Cheney on the 
M.T.A.`_".  Seems to be the one Neil likes the most.

Interestingly, Chicken (which is a Scheme-to-C compiler) does all memory 
allocation on the stack.

.. _Chicken: http://www.call-with-current-continuation.org/chicken.html
.. _Cheney on the M.T.A.: http://citeseer.nj.nec.com/baker94cons.html

Boehm-Demers-Weiser collector
from `Jeremy <http://www.python.org/~jeremy/weblog/031029c.html>`__

The collector can be found at 
http://www.hpl.hp.com/personal/Hans_Boehm/gc/index.html .  It is a 
generic mark-and-sweep collector that has been designed to be portable 
and easy to use.

Analyze memory usage
from `Jeremy <http://www.python.org/~jeremy/weblog/031029c.html>`__

Apparently `some guys`_ claim that a high-performance, general memory 
allocator works better than a bunch of custom allocators (Python has a 
bunch of the latter).

.. _some guys: http://citeseer.nj.nec.com/berger01reconsidering.html


Provide free threading efficiently
from `Martin 

`In the free threading model, a client app may call any object method 
... from any thread at any time. The object must serialize access to all 
of its methods to whatever extent it requires to keep incoming calls 
from conflicting, providing the maximum performance and flexibility. 
<http://www.microsoft.com/msj/0897/free.aspx>`__.  In other words you 
shouldn't have to do any locking to do a method call.

MP threading
from `Dennis Allison 

Try to eliminate the serialization of Python code execution because of 
the GIL.  Look at research by `Maurice Herlihy`_ and `Kourosh 

.. _Maurice Herlihy: http://www.cs.brown.edu/people/mph/home.html
.. _Kourosh Gharachorloo: 


Python to C
from `Fernando Perez 

`Pat Miller`_ presented a paper on this for scientific work at SciPy 
2003.  Can look to Squeak_ for inspiration.

.. _Pat Miller: http://www.llnl.gov/CASC/people/pmiller/
.. _Squeak: http://www.squeak.org/

Finish AST branch
from `Neil 

No research left, but could lead to macros_.

from `Jeremy <http://www.python.org/~jeremy/weblog/031029c.html>`__

Once access to an AST is available, macros are doable.  Lisp's macros 
work so well because of quasiquotation_.  In order for this to work in 
Python, though, you need some other way to handle it; either through the 
AST like in Maya_ or the CST as in JSE_.  Something else to look at is 
Polyglot_ (what Jeremy wishes the compiler package had become).

.. _quasiquotation: http://citeseer.nj.nec.com/bawden99quasiquotation.html
.. _Maya: http://citeseer.nj.nec.com/baker02maya.html
.. _JSE: http://citeseer.nj.nec.com/context/1821961/0
.. _Polyglot: http://www.cs.cornell.edu/Projects/polyglot/

Refactoring code editor for AST
from `Neil 

Integrating XML and SQL into the language
from `Jeremy <http://www.python.org/~jeremy/weblog/031029c.html>`__

Seems to be to make XML and SQL first-class citizens in Python.  Based 
on the work of `Erik Meijer`_.  Paper at 
with his main research page at http://research.microsoft.com/~emeijer/ .

.. _Erik Meijer: http://blogs.gotdotnet.com/emeijer/

Optional type checking
from me, but with support from Guido (private email)

Guido thinks it is "one tough problem".  He suggested looking at the 
`types-sig archive`_ for ideas.  Guido would love to have someone 
sanctioned to tackle this problem.

Might be much easier to do if limited to only parameter lists.  Doing 
that minimal amount would allow for a better multiple dispatch 
implementation.  It would also allow for a rudimentary form of 
polymorphism based on parameter signatures.

.. _types-sig archive: http://www.python.org/pipermail/types-sig/

Type inferencing
from `Martin 

Either run-time or compile-time.  "Overlap with the specializing compilers".

Register-based VM
from Neal (private email)

Should get a nice performance improvement.  Look at Skip and Neil's 
rattler VM.  Would be a step towards hooking Python into GCC for 
assembly code generation.

Lower-level bytecode
from Neal (private email)

Supposedly Java's bytecode is fairly low-level.  Would make the 
transition to a register-based VM easier.  Also would make compiling to 
machine code or JIT compilation simpler.  An IBM developerWorks article 
on Java bytecode is available at 
http://www-106.ibm.com/developerworks/ibm/library/it-haggar_bytecode/ .

Could look at assembly languages (RISC and CISC) and other VMs for ideas 
on bytecodes.


Portable floating point
from Martin: 
http://mail.python.org/pipermail/python-dev/2003-October/039768.html and

Come up with code on a per-platform basis to make up for problems on 
that platform's FPU implementation.  Compare to how Python just provides 
the CPU's implementation while Java guarantees a specific semantic 
behavior by providing the needed code to make it the same on all platforms.

Martin suggested looking at Java's strictfp mode (which was added after 
Java 1.0).  See 
on its usage.

Save interpreter state to disk
from `Martin 

Similar to Smalltalk's images.

Would be nice since would provide a fail-safe mechanism for long-running 
processes.  Could also help with debugging by being able to pass around 
state of a program just before an error occurs.

Deterministic Finalization
from Martin: 
http://mail.python.org/pipermail/python-dev/2003-October/039768.html and

Having objects implicitly destroyed at certain points.  Example is 
threaded code (in Python)::

  def bump_counter(self):
      self.counter = self.counter+1

In C++, you do::

  void bump_counter(){
    MutexAcquistion acquire(this);

which is nice since you don't have to explicitly release the lock.

Optimize global namespace access
from `Neil 
and `Jeremy <http://www.python.org/~jeremy/weblog/031029c.html>`__

Look at `PEP 267`_ and Jeremy's `Faster Namespace`_ slides from 10th 
Python conference.  Neil pointed out that "If we can disallow 
inter-module shadowing of names the job becomes easier" (e.g., making 
``import Foo; Foo.len = 42`` illegal).

.. _PEP 267: http://www.python.org/peps/pep-0267.html
.. _Faster Namespace: 

Restricted execution
from Andrew Bennett (private email)

See the python-dev archives and Summaries for more painful details.

Tail Recursion
from Me (my brain)

Have proper tail recursion in Python.  Would require identifying where a 
direct function call is returned (could keep it simple and just do it 
where CALL_FUNCTION and RETURN bytecodes are in a row).  Also have to 
deal with exception catching since that requires the frame to stay alive 
to handle the exception.

But getting it to work well could help with memory and performance. 
Don't know if it has been done for a language that had exception handling.

More information about the Python-Dev mailing list