[Python-Dev] Thesis ideas list
Brett C.
bac at OCF.Berkeley.EDU
Sat Nov 22 17:56:07 EST 2003
As requested, here is a annotated list of the ideas that I received for
my masters thesis. I tried to do a decent job of referencing where info
is and summarizing the emails I received. I also did **very** rough
commenting on each of them in trying to understand them and to see if I
thought they would make a good thesis for me.
If you have something to contribute to the list, please do so. Don't
bother with spelling and grammatical fixes, though, since this is just
for my personal use and for anyone interested in the ideas; it will not
see the light of day on python.org or anything unless someone else
decides to put the effort into that.
I am planning to go in and talk with my thesis advisor after
Thanksgiving break (next week) to try to narrow down the list so I can
have a thesis topic chosen by Jan 1.
----------------------------------
=====
Misc.
=====
Annotations
-----------
from Martin:
http://mail.python.org/pipermail/python-dev/2003-October/039768.html and
http://mail.python.org/pipermail/python-dev/2003-October/039809.html
Similar to attributes as done in .NET . Michael's ``func()[]`` syntax
might pull off what Martin wants.
For an overview of attributes in C#, see
http://www.ondotnet.com/pub/a/dotnet/excerpt/prog_csharp_ch18/index.html?page=1
. They appear to be a way to connect data with code. Using reflection
you can find out what attributes are attached to an object. You can
control what types of objects an attribute can be bound to along with
specifying arguments. It seems like the compiler has some built-in
attributes that it uses to process the code when available.
Since it not only just attaches info to an object, but it is used by the
language to modify the code. It seems like the func()[] syntax along
with Python's dynamic attribute creation covers this, just without the
built-in syntax.
Could be interesting to come up with a variance on descriptors for
assigning to something like __metadict__. If it is a data descriptor it
is just attached as info to the ojbect. If it a non-data descriptor,
though, it gets the code object passed to it. The only perk of this is
a way to have info attached in a more abstracted way than just sticking
the info into __dict__ and making sure you don't overwrite the value (in
other words __metadict__ would not come into play for name resolution).
Basically just standard place to store metadata.
Martin's suggestion was having an attribute that would automatically
create an XML-RPC interface for a code object. That might be doable as
a metaclass, but that could get complicated and messy. If you could do
something like::
def meth() [xml-rpcbuilder]: pass
and have 'meth' automatically get an ``annotation(meth, 'xml-rpc')``
ability that returns a wrapper implementing an XML-RPC interface that
might be cool. You could do this now with a function that takes
something, creates a wrapper, and then stores it on the object so that
it does not have to be recreated every time. But that becomes an issue
of overwriting values in the object. Having a place for metadata would
lesson that problem somewhat.
All in all it *might* be a good thing, but with Python dynamicism I
don't see a use-case for it at this moment.
Work on PyPy
------------
from `Holger
<http://mail.python.org/pipermail/python-dev/2003-October/039772.html>`__
Vague statement that one could work on PyPy. OSCON 2003 paper at
http://codespeak.net/pypy/index.cgi?doc/oscon2003-paper.html . PyPy's
EU funding proposal has interesting parts at
http://codespeak.net/pypy/index.cgi?doc/funding/B1.0 and
http://codespeak.net/pypy/index.cgi?doc/funding/B6.0 .
Multiple dispatch
-----------------
from `Neil
<http://mail.python.org/pipermail/python-dev/2003-October/039778.html>`__
Look at Dylan_ and Goo_ for inspiration. An explanation of how Dylan
does multiple dispatch can be seen at
http://www.gwydiondylan.org/gdref/tutorial/multiple-dispatch.html .
Multiple dispatch is basically a mechanism of registering a group of
methods under one generic method that then calls the registered methods
based on whether the parameter lists can match the arguments being
passed. If there is more than one match then they are ordered in terms
of how "specific" they are; if a parameter requires a subclass or the
actual class it is less specific. Methods can then call a special
method that will call the next method in the calculated order.
The issue with this in terms of Python is how to handle comparing the
arguments given to a method when a parameter list is just so damn vague.
If you have the parameter lists, ``def A(arg1, arg2, *args)`` and
``def B(*args)``, which one is more specific? The other issue is that
since Python has not parameter type-checking beyond argument counts you
can't base whether a method is more specific or not on the type or
arguments. In order for this to be built into the language one would
have to add type-checking first. Otherwise one would need to have all
of this be external to the language.
It should be doable in terms of Python code now. Building it into the
language might be nice, but without type checking I don't know how
useful it would be.
.. _Dylan: http://www.gwydiondylan.org/drm/drm_1.htm
.. _Goo: http://www.ai.mit.edu/~jrb/goo/manual/goomanual.html
Static analysis of Python C code
--------------------------------
from Neal (private email)
Look at the work done by `Dawson Engler`_. Could check for missing
DECREF/INCREFs, null pointer dereferences, threading issues, etc.
Appears the research was developing a system to check that basic rules
were met for code (returned values were checked, disabled interrupts get
re-enabled, etc.).
.. _Dawson Engler: http://www.stanford.edu/~engler/
======
Memory
======
Mark-and-sweep GC
-----------------
from `Neil
<http://mail.python.org/pipermail/python-dev/2003-October/039778.html>`__
Only really worth it in terms of code complexity (does C code become
easier? How hard to move existing extension modules over?) and to
measure performance difference.
Chicken GC
----------
from `Neil
<http://mail.python.org/pipermail/python-dev/2003-October/039778.html>`__
with more ideas from `Samuele
<http://mail.python.org/pipermail/python-dev/2003-October/039818.html>`__
and `Phillip Eby
<http://mail.python.org/pipermail/python-dev/2003-October/039832.html>`__
Chicken_ has its GC covered in a paper entitled "`Cheney on the
M.T.A.`_". Seems to be the one Neil likes the most.
Interestingly, Chicken (which is a Scheme-to-C compiler) does all memory
allocation on the stack.
.. _Chicken: http://www.call-with-current-continuation.org/chicken.html
.. _Cheney on the M.T.A.: http://citeseer.nj.nec.com/baker94cons.html
Boehm-Demers-Weiser collector
-----------------------------
from `Jeremy <http://www.python.org/~jeremy/weblog/031029c.html>`__
The collector can be found at
http://www.hpl.hp.com/personal/Hans_Boehm/gc/index.html . It is a
generic mark-and-sweep collector that has been designed to be portable
and easy to use.
Analyze memory usage
--------------------
from `Jeremy <http://www.python.org/~jeremy/weblog/031029c.html>`__
Apparently `some guys`_ claim that a high-performance, general memory
allocator works better than a bunch of custom allocators (Python has a
bunch of the latter).
.. _some guys: http://citeseer.nj.nec.com/berger01reconsidering.html
=========
Threading
=========
Provide free threading efficiently
----------------------------------
from `Martin
<http://mail.python.org/pipermail/python-dev/2003-October/039768.html>`__
`In the free threading model, a client app may call any object method
... from any thread at any time. The object must serialize access to all
of its methods to whatever extent it requires to keep incoming calls
from conflicting, providing the maximum performance and flexibility.
<http://www.microsoft.com/msj/0897/free.aspx>`__. In other words you
shouldn't have to do any locking to do a method call.
MP threading
------------
from `Dennis Allison
<http://mail.python.org/pipermail/python-dev/2003-October/039779.html>`__
Try to eliminate the serialization of Python code execution because of
the GIL. Look at research by `Maurice Herlihy`_ and `Kourosh
Gharachorloo`_.
.. _Maurice Herlihy: http://www.cs.brown.edu/people/mph/home.html
.. _Kourosh Gharachorloo:
http://research.compaq.com/wrl/people/kourosh/bio.html
=========
Compiling
=========
Python to C
-----------
from `Fernando Perez
<http://mail.python.org/pipermail/python-dev/2003-October/039780.html>`__
`Pat Miller`_ presented a paper on this for scientific work at SciPy
2003. Can look to Squeak_ for inspiration.
.. _Pat Miller: http://www.llnl.gov/CASC/people/pmiller/
.. _Squeak: http://www.squeak.org/
Finish AST branch
-----------------
from `Neil
<http://mail.python.org/pipermail/python-dev/2003-October/039778.html>`__
No research left, but could lead to macros_.
Macros
------
from `Jeremy <http://www.python.org/~jeremy/weblog/031029c.html>`__
Once access to an AST is available, macros are doable. Lisp's macros
work so well because of quasiquotation_. In order for this to work in
Python, though, you need some other way to handle it; either through the
AST like in Maya_ or the CST as in JSE_. Something else to look at is
Polyglot_ (what Jeremy wishes the compiler package had become).
.. _quasiquotation: http://citeseer.nj.nec.com/bawden99quasiquotation.html
.. _Maya: http://citeseer.nj.nec.com/baker02maya.html
.. _JSE: http://citeseer.nj.nec.com/context/1821961/0
.. _Polyglot: http://www.cs.cornell.edu/Projects/polyglot/
Refactoring code editor for AST
-------------------------------
from `Neil
<http://mail.python.org/pipermail/python-dev/2003-October/039778.html>`__
Integrating XML and SQL into the language
-----------------------------------------
from `Jeremy <http://www.python.org/~jeremy/weblog/031029c.html>`__
Seems to be to make XML and SQL first-class citizens in Python. Based
on the work of `Erik Meijer`_. Paper at
http://www.research.microsoft.com/~emeijer/Papers/XML2003/xml2003.html
with his main research page at http://research.microsoft.com/~emeijer/ .
.. _Erik Meijer: http://blogs.gotdotnet.com/emeijer/
Optional type checking
----------------------
from me, but with support from Guido (private email)
Guido thinks it is "one tough problem". He suggested looking at the
`types-sig archive`_ for ideas. Guido would love to have someone
sanctioned to tackle this problem.
Might be much easier to do if limited to only parameter lists. Doing
that minimal amount would allow for a better multiple dispatch
implementation. It would also allow for a rudimentary form of
polymorphism based on parameter signatures.
.. _types-sig archive: http://www.python.org/pipermail/types-sig/
Type inferencing
----------------
from `Martin
<http://mail.python.org/pipermail/python-dev/2003-October/039768.html>`__
Either run-time or compile-time. "Overlap with the specializing compilers".
Register-based VM
-----------------
from Neal (private email)
Should get a nice performance improvement. Look at Skip and Neil's
rattler VM. Would be a step towards hooking Python into GCC for
assembly code generation.
Lower-level bytecode
--------------------
from Neal (private email)
Supposedly Java's bytecode is fairly low-level. Would make the
transition to a register-based VM easier. Also would make compiling to
machine code or JIT compilation simpler. An IBM developerWorks article
on Java bytecode is available at
http://www-106.ibm.com/developerworks/ibm/library/it-haggar_bytecode/ .
Could look at assembly languages (RISC and CISC) and other VMs for ideas
on bytecodes.
=========
Execution
=========
Portable floating point
-----------------------
from Martin:
http://mail.python.org/pipermail/python-dev/2003-October/039768.html and
http://mail.python.org/pipermail/python-dev/2003-October/039809.html
Come up with code on a per-platform basis to make up for problems on
that platform's FPU implementation. Compare to how Python just provides
the CPU's implementation while Java guarantees a specific semantic
behavior by providing the needed code to make it the same on all platforms.
Martin suggested looking at Java's strictfp mode (which was added after
Java 1.0). See
http://developer.java.sun.com/developer/JDCTechTips/2001/tt0410.html#using
on its usage.
Save interpreter state to disk
------------------------------
from `Martin
<http://mail.python.org/pipermail/python-dev/2003-October/039768.html>`__
Similar to Smalltalk's images.
Would be nice since would provide a fail-safe mechanism for long-running
processes. Could also help with debugging by being able to pass around
state of a program just before an error occurs.
Deterministic Finalization
--------------------------
from Martin:
http://mail.python.org/pipermail/python-dev/2003-October/039768.html and
http://mail.python.org/pipermail/python-dev/2003-October/039809.html
Having objects implicitly destroyed at certain points. Example is
threaded code (in Python)::
def bump_counter(self):
self.mutex.acquire()
try:
self.counter = self.counter+1
more_actions()
finally:
self.mutex.release()
In C++, you do::
void bump_counter(){
MutexAcquistion acquire(this);
this->counter+=1;
more_actions();
which is nice since you don't have to explicitly release the lock.
Optimize global namespace access
--------------------------------
from `Neil
<http://mail.python.org/pipermail/python-dev/2003-October/039778.html>`__
and `Jeremy <http://www.python.org/~jeremy/weblog/031029c.html>`__
Look at `PEP 267`_ and Jeremy's `Faster Namespace`_ slides from 10th
Python conference. Neil pointed out that "If we can disallow
inter-module shadowing of names the job becomes easier" (e.g., making
``import Foo; Foo.len = 42`` illegal).
.. _PEP 267: http://www.python.org/peps/pep-0267.html
.. _Faster Namespace:
http://www.python.org/~jeremy/talks/spam10/PEP-267-1.html
Restricted execution
--------------------
from Andrew Bennett (private email)
See the python-dev archives and Summaries for more painful details.
Tail Recursion
--------------
from Me (my brain)
Have proper tail recursion in Python. Would require identifying where a
direct function call is returned (could keep it simple and just do it
where CALL_FUNCTION and RETURN bytecodes are in a row). Also have to
deal with exception catching since that requires the frame to stay alive
to handle the exception.
But getting it to work well could help with memory and performance.
Don't know if it has been done for a language that had exception handling.
More information about the Python-Dev
mailing list