[Python-Dev] PEP: __source__ proposal

Stelios Xanthakis sxanth at ceid.upatras.gr
Fri Dec 3 10:54:25 CET 2004


Hi all.

Now that 2.4 is out, maybe it's about time to
start discussing the "use the __source__ Luke"
feature, which IMO will really boost Python into
a new domain of exciting possibilities.

I've prepared a pre-PEP which is not very polished,
but it is a starting point.

In short, the feature is good and it enables
editing of python code at runtime instead of
the runfile-exit-edit-run-exit-edit-run cycle.

We have the following possibilities as to whether
__source__ data is marshalled and whether the
feature is always enabled:

[1] Command line switch and not marshalled
[2] Always on and not marshalled
[3] Always on and marshalled

There is also [4] (command line switch and
marshalled), which doesn't make much sense.

If I were BDFL I'd go for [1], so whoever wants it
can enable it, whoever doesn't can't complain,
and they'll all leave me alone.
Phillip J. Eby expressed some concern that
modules depending on __source__ will eventually
take over and it will become a standard.

Anyway, the PEP is attached.
You can mail me your votes on the feature and, if
you want, on your preferred option (1, 2 or 3).
If I get votes, I'll post the results later.

If this is accepted, I'll try to come up with a good
patch against 2.4.


Thanks,

St.


-------------------ATTACHED PYTHON ENHANCEMENT PROPOSAL---
PEP: XXX
Title: The __source__ attribute
Version: $Revision: 1.10 $
Last-Modified: $Date: 2003/09/22 04:51:49 $
Author: Stelios Xanthakis
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 19-Nov-2004
Python-Version: 2.4.1
Post-History:


Abstract

     This PEP proposes a __source__ attribute for functions and
     classes.  The attribute is a read-only string which is generated
     by the parser and is a copy of the original source code of the
     function/class (including comments, indentation and whitespace).


Motivation

     It is generally a tempting idea to use Python as an interface to
     a program.  The developers can implement all the functionality
     and, instead of designing a user interface, provide a Python
     interpreter to their users.  Take, for example, existing web
     browsers: they have everything that would be needed to write a
     script which downloads pages automatically or permutes the
     letters of web pages before they are displayed, but the user
     cannot do these things because the interfaces of these
     applications are static.

     A much more powerful approach would be an interface which is
     dynamically constructed by the user to meet the user's needs.
     The most common development cycle of Python programs is:
     write .py file - execute .py file - exit - enhance .py file -
     execute .py file - etc.  With the __source__ attribute, however,
     Python code can be developed and modified at run time.
     Functions and classes can be defined, modified or enhanced while
     the Python shell is running, and all the changes can be saved
     by saving the __source__ attribute of the global functions and
     classes before termination.  Moreover, in such a system it is
     possible to modify the "code modification routines" themselves,
     so that eventually we have a self-modifying interface.  Using a
     program then also means improving its usability.
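     A minimal sketch of the "save before termination" step, in
     Python 3 syntax.  It assumes the proposed __source__ attribute,
     which does not exist in any released Python; the helper names
     collect_sources and save_session are hypothetical.  Objects whose
     __source__ is absent or None are skipped.

```python
import inspect


def collect_sources(namespace):
    """Return the __source__ text of each function and class in namespace.

    __source__ is the attribute proposed by this PEP (hypothetical here);
    objects where it is absent or None are skipped.
    """
    chunks = []
    for obj in namespace.values():
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        src = getattr(obj, "__source__", None)
        if src is not None:
            chunks.append(src)
    return chunks


def save_session(namespace, path):
    """Write every available __source__ to path, one block per definition."""
    with open(path, "w") as f:
        for src in collect_sources(namespace):
            # Separate definitions by a blank line so the file stays readable.
            f.write(src.rstrip("\n") + "\n\n")
```

     A shell built on the proposal could call save_session(globals(),
     "session.py") before exiting; executing that file later would
     re-create the saved functions and classes, though not plain
     global variables, which have no __source__.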

     The current solution of using 'inspect' to get the source
     code of functions is not adequate, because it doesn't work
     for code defined with "exec" and it cannot retrieve the source
     of functions/classes defined in interactive mode.  Generally,
     a "file" is too abstract a notion.  What is more real is the
     data received by the Python parser, and that is what is stored
     in __source__.
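     The "exec" limitation can be observed with the standard inspect
     module.  The following Python 3 snippet (the name f is only for
     illustration) defines a function from a string and shows that
     inspect.getsource() raises OSError for it, because no file holds
     its source lines.

```python
import inspect

namespace = {}
# Define a function from a string: there is no file behind it.
exec("def f():\n    return 42\n", namespace)

try:
    inspect.getsource(namespace["f"])
    found = True
except OSError:
    found = False

print(found)  # False: inspect has no file to read the lines from
```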


Specification

     The __source__ attribute is a read-only attribute of functions
     and classes.  Its value is either a string or None; None means
     that the source was not available.

     The code block keeps its original indentation, including the
     indentation due to nested definitions.  For example:

         >>> class A:
         ...     def foo (self):
         ...         print """Santa Claus
         ... is coming to town"""
         ...
         >>> def spam ():
         ...     def closure ():
         ...         pass
         ...     return closure
         ...
         >>> print A.foo.__source__
             def foo (self):
                 print """Santa Claus
         is coming to town"""
         >>> print spam().__source__
             def closure ():
                 pass
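     Because nested definitions keep their original indentation, a tool
     that re-executes an edited __source__ cannot pass it to exec()
     unchanged, and textwrap.dedent() does not help either: the last
     line of A.foo.__source__ above starts at column 0 inside the
     string literal, so there is no common indentation to remove.  A
     minimal sketch (Python 3; the helper name redefine is hypothetical)
     wraps indented source in an "if 1:" block instead, which leaves
     string literals untouched.

```python
def redefine(source, namespace):
    """Execute a __source__-style string in namespace and return namespace.

    Indented source (a nested definition) is wrapped in "if 1:" so that
    its first line becomes the body of a block; the contents of
    multi-line string literals are left exactly as they were.
    """
    if source[:1] in (" ", "\t"):
        source = "if 1:\n" + source
    exec(source, namespace)
    return namespace
```

     A shell could then rebind the new function, e.g. with
     setattr(A, "foo", ns["foo"]), to replace a method by its edited
     version while the program keeps running.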

     The attribute is not marshaled and therefore not stored in
     ".pyc" files.  As a consequence, functions and classes of
     imported modules have __source__==None.

     We propose that the generation of __source__ be controlled by
     a command line option.  When this option is not given, the
     attribute is absent.


Rationale

     Generally, "import" refers to modules that either have a source
     file in a standard location or are distributed only in ".pyc"
     form.  Therefore, for modules, getting the source with "inspect"
     is adequate.  Moreover, it does not make sense to save __source__
     in ".pyc" files, because the point would be to save modifications
     to the original ".py" file (if available).
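     Code written against this proposal has to cope with three cases:
     the attribute is absent (the command line option is off), it is
     None (no source, e.g. a function loaded from a ".pyc" file), or it
     holds the text.  A minimal sketch in Python 3, with the
     hypothetical helper name best_source, prefers __source__ and falls
     back to inspect, which, as argued above, is adequate for modules
     with a ".py" file.

```python
import inspect


def best_source(obj):
    """Return the source text of a function or class, or None.

    Prefers the proposed __source__ attribute (hypothetical here) and
    falls back to inspect.getsource(), which reads the defining file.
    """
    src = getattr(obj, "__source__", None)  # absent and None both mean "no"
    if src is not None:
        return src
    try:
        return inspect.getsource(obj)
    except (OSError, TypeError):
        # OSError: no file to read from (e.g. defined via exec);
        # TypeError: a built-in object without Python source.
        return None
```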

     On the issue of the command-line option controlling the generation
     of __source__, please refer to the section about the overhead
     of this feature.  The rationale is that applications that do not
     wish to use this feature can avoid its cost (for example, CGI
     scripts in Python being benchmarked against another language).


Overhead

     Python's parser is not exactly well-suited for such a feature.
     Execution of Python code goes through the stages of tokenization
     (lexical analysis), generation of the AST, compilation to
     bytecode and execution of the bytecode.  In order to implement
     __source__, the tokenizer has to be modified to store the lines
     of the current translation unit.  Those lines are then attached
     to the root node of the AST.  While the AST is compiled, we have
     to keep a reference to the current node so that we can find the
     node that follows the one for which we wish to generate
     __source__.  This gives us the first and the last line of the
     block; we then refer to the root node to extract these lines and
     join them into a string.  All these actions add a minor overhead
     to some heavily optimized parts of Python.  However, once
     compilation to bytecode is done, this feature no longer affects
     the performance of the execution of the bytecode.
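     The line bookkeeping described above can be approximated at the
     Python level.  The sketch below is not the proposed C patch: it
     relies on the end_lineno attribute that the ast module provides
     since Python 3.8, whereas the text above finds the end of a block
     from the node that follows it.  The helper name block_sources is
     hypothetical.  Whole lines are sliced, so the first line keeps its
     indentation as in the Specification; comments after the last
     statement of a block are not included.

```python
import ast


def block_sources(text):
    """Return (name, source) pairs for every def/class in text.

    The source of a block is the run of original lines from its first
    line (node.lineno) to its last line (node.end_lineno), joined into
    one string, so indentation and inner comments are preserved.
    """
    lines = text.splitlines(keepends=True)  # the "stored lines" of the unit
    pairs = []
    for node in ast.walk(ast.parse(text)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            first, last = node.lineno, node.end_lineno  # 1-based, inclusive
            pairs.append((node.name, "".join(lines[first - 1:last])))
    return pairs
```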

     There is also the issue of the memory needed to store __source__.
     In our opinion, this cost is worth it for those who are willing
     to take advantage of the feature.


Implementation

     A sample implementation [2] consists of a patch against Python
     2.3.4.  The patch still has to be improved to avoid generating
     __source__ when modules are imported for the first time (that
     is, compiled from ".py" rather than loaded from ".pyc").  The
     sample implementation also includes a sample shell that takes
     advantage of __source__ and demonstrates some of the aspects
     that motivated us to patch Python and submit this PEP.


References

     [1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
         http://www.python.org/peps/pep-0001.html

     [2] Sample implementation
         http://students.ceid.upatras.gr/~sxanth/ISYSTEM/python-PIESS.tar.gz


Copyright

     This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:

