[Python-checkins] r50710 - python/branches/bcannon-sandboxing/sandboxing_design_doc.txt python/branches/bcannon-sandboxing/securing_python.txt

brett.cannon python-checkins at python.org
Wed Jul 19 03:52:11 CEST 2006


Author: brett.cannon
Date: Wed Jul 19 03:52:10 2006
New Revision: 50710

Added:
   python/branches/bcannon-sandboxing/securing_python.txt
      - copied, changed from r50656, python/branches/bcannon-sandboxing/sandboxing_design_doc.txt
Removed:
   python/branches/bcannon-sandboxing/sandboxing_design_doc.txt
Log:
Redesign the security model to use object-capabilities.  Aim was to explain
more design plans than API.  This allows more flexibility in the future and
hopefully makes it easier for people to see if the design is sound and makes
sense.

Also renamed the file securing_python.txt (original name was rather long and
this one just "feels" better).

Have two (known) sections left to write before this new draft is finished.


Deleted: /python/branches/bcannon-sandboxing/sandboxing_design_doc.txt
==============================================================================
--- /python/branches/bcannon-sandboxing/sandboxing_design_doc.txt	Wed Jul 19 03:52:10 2006
+++ (empty file)
@@ -1,1195 +0,0 @@
-Restricted Execution for Python
-#######################################
-
-About This Document
-=============================
-
-This document is meant to lay out the general design for re-introducing
-a sandboxing model for Python.  This document should provide one with
-enough information to understand the goals for sandboxing, what
-considerations were made for the design, and the actual design itself.
-Design decisions should be clear and explain not only why they were
-chosen but possible drawbacks from taking a specific approach.
-
-If any of the above is found not to be true, please email me at
-brett at python.org and let me know what problems you are having with the
-document.
-
-
-XXX TO DO
-=============================
-
-Design
---------------
-
-* threading needs protection?
-* python-dev convince me that hiding 'file' possible?
-    + based on that, handle code objects
-    + also decide how to handle sockets
-    + perhaps go with crippling but try best effort on hiding reference and if
-      best effort holds up eventually shift over to capabilities system
-* resolve to IP at call time to prevent DNS man-in-the-middle attacks when
-  allowing a specific host name?
-* what network info functions are allowed by default?
-* does the object.__subclasses__() trick work across interpreters, or is it
-  unique per interpreter?
-* figure out default whitelist of extension modules
-* check default accessible objects for file path exposure
-* helper functions to get at StringIO instances for stdin, stdout, and friends?
-* decide on what type of objects (e.g., PyStringObject or const char *) are to
-  be passed in
-* all built-ins properly protected?
-* exactly how to tell whether argument to open() is a path, IP, or host name
-  (third argument, 'n' prefix for networking, format of path, ...)
-* API at the Python level
-* for extension module protection, allow for wildcard allowance
-  (e.g., ``xml.*``)
-
-
-Implementation
---------------
-
-* add __sandbox__
-* merge from HEAD
-    + last merge on rev. 47248
-* remove bare malloc()/realloc()/free() uses
-    + also watch out for PyObject_Malloc()/PyObject_MALLOC() calls
-* note in SpecialBuilds.txt
-
-
-Goal
-=============================
-
-A good sandboxing model provides enough protection to prevent malicious
-harm from coming to the system, and no more.  Barriers should be
-minimized so as to allow most code that does not do anything that would
-be regarded as harmful to run unmodified.  But the protections need to
-be thorough enough to prevent any unintended changes to the system or
-disclosure of information about it.
-
-An important point to take into consideration when reading this
-document is to realize it is part of my (Brett Cannon's) Ph.D.
-dissertation.  This means it is heavily geared toward sandboxing when
-the interpreter is working with Python code embedded in a web page as
-viewed in Firefox.  While great strides have been taken to keep the
-design general enough so as to allow all previous uses of the 'rexec'
-module [#rexec]_ to be able to use the new design, it is not the
-focused goal.  This means if a design decision must be made for the
-embedded use case compared to sandboxing Python code in a pure Python
-application, the former will win out over the latter.
-
-Throughout this document, the term "resource" is used to represent
-anything that deserves possible protection.  This includes things that
-have a physical representation (e.g., memory) to things that are more
-abstract and specific to the interpreter (e.g., sys.path).
-
-When referring to the state of an interpreter, it is either
-"unprotected" or "sandboxed".  An unprotected interpreter has no
-restrictions imposed upon any resource.  A sandboxed interpreter has at
-least one, possibly more, resource with restrictions placed upon it to
-prevent unsafe code that is running within the interpreter from causing
-harm to the system.
-
-
-.. contents::
-
-
-Use Cases
-/////////////////////////////
-
-All use cases are based on how many sandboxed interpreters are running
-in a single process and whether an unprotected interpreter is also
-running.  The use cases can be broken down into two categories: when
-the interpreter is embedded and only using sandboxed interpreters, and
-when pure Python code is running in an unprotected interpreter and uses
-sandboxed interpreters.
-
-
-When the Interpreter Is Embedded
-================================
-
-Single Sandboxed Interpreter
-----------------------------
-
-This use case is when an application embeds the interpreter and never
-has more than one interpreter running, and that one is sandboxed.
-
-
-Multiple Sandboxed Interpreters
--------------------------------
-
-This use case is when multiple interpreters, all sandboxed at varying
-levels, need to run within a single application.  This is the key use
-case that this proposed design targets.
-
-
-Stand-Alone Python
-=============================
-
-This use case is when someone has written a Python program that wants
-to execute Python code in one or more sandboxed interpreters.  This is
-the use case that 'rexec' attempted to fulfill.
-
-
-Issues to Consider
-=============================
-
-Common to all use cases, resources that the interpreter requires to
-function at a level below user code cannot be exposed to a sandboxed
-interpreter.  For instance, the interpreter might need to stat a file
-to see if it is possible to import.  If the ability to stat a file is
-not allowed to a sandboxed interpreter, it should not be allowed to
-perform that action, regardless of whether the interpreter at a level
-below user code needs that ability.
-
-When multiple interpreters are involved (sandboxed or not), an
-interpreter must not be able to gain access to resources available in
-other interpreters without explicit permission.
-
-
-Resources to Protect
-/////////////////////////////
-
-It is important to make sure that the proper resources are protected
-from a sandboxed interpreter.  If they are not, there is no point to sandboxing.
-
-Filesystem
-===================
-
-All facets of the filesystem must be protected.  This means restricting
-reading and writing to the filesystem (e.g., files, directories, etc.).
-It should be allowed in controlled situations where allowing access to
-the filesystem is desirable, but that should be an explicit allowance.
-
-There must also be protection to prevent revealing any information
-about the filesystem.  Disclosing information on the filesystem could
-allow one to infer what OS the interpreter is running on, for instance.
-
-
-Memory
-===================
-
-Memory should be protected.  It is a limited resource on the system
-that can have an impact on other running programs if it is exhausted.
-Being able to restrict the use of memory would help alleviate issues
-from denial-of-service (DoS) attacks on the system.
-
-
-Networking
-===================
-
-Networking is somewhat like the filesystem in terms of wanting similar
-protections.  You do not want to let unsafe code make or accept socket
-connections unhindered to do possibly nefarious things.  You also want
-to prevent it from finding out information about the network you are
-connected to.
-
-
-Interpreter
-===================
-
-One must make sure that the interpreter is not harmed in any way from
-sandboxed code.  This usually takes the form of crashing the program
-that the interpreter is embedded in or the unprotected interpreter that
-started the sandbox interpreter.  Executing hostile bytecode that might
-lead to undesirable effects is another possible issue.
-
-There is also the issue of taking it over.  One should not be able to
-gain escalated privileges in any way without explicit permission.
-
-
-Types of Security
-///////////////////////////////////////
-
-As with most things, there are multiple approaches one can take to
-tackle a problem.  Security is no exception.  In general there seem to
-be two approaches to protecting resources.
-
-
-Resource Hiding
-=============================
-
-By never giving code a chance to access a resource, you prevent it from
-being (ab)used.  This is the idea behind resource hiding; you can't
-misuse something you don't have in the first place.
-
-The most common implementation of resource hiding is capabilities.  In
-this type of system a resource's reference acts as a ticket that
-represents the right to use the resource.  Once code has a reference it
-is considered to have full use of the resource that reference represents
-and no further security checks are directly performed (using delegates
-and other structured ways one can actually have a security check for
-each access of a resource, but this is not a default behaviour).
-
-As an example, consider the 'file' type as a resource we want to
-protect.  That would mean that we did not want a reference to the
-'file' type to ever be accessible without explicit permission.  If one
-wanted to provide read-only access to a temp file, you could have
-open() perform a check on the permissions of the current interpreter,
-and if it is allowed to, return a proxy object for the file that only
-allows reading from it.  The 'file' instance for the proxy would need
-to be properly hidden so that the reference was not reachable from
-outside so that 'file' access could still be controlled.
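The read-only proxy idea above can be sketched at the Python level. This is only an illustration in modern Python, not the design's implementation: ``make_read_only_proxy`` and ``ReadOnlyProxy`` are hypothetical names, and a ``StringIO`` stands in for the temp file. The wrapped object is kept in a closure rather than an attribute, but Python's introspection can still reach closure cells, so real hiding would have to happen at the C level.

```python
import io


def make_read_only_proxy(fileobj):
    """Return an object exposing only read operations of ``fileobj``.

    Hypothetical helper: the wrapped object lives in a closure, so the
    proxy carries no attribute that names it directly.
    """

    class ReadOnlyProxy(object):
        __slots__ = ()  # no per-instance attributes to stash a reference in

        def read(self, size=-1):
            return fileobj.read(size)

        def readline(self):
            return fileobj.readline()

        def close(self):
            fileobj.close()

    return ReadOnlyProxy()


backing = io.StringIO("first line\nsecond line\n")  # stands in for a temp file
proxy = make_read_only_proxy(backing)
first = proxy.readline()
can_write = hasattr(proxy, "write")  # the proxy simply has no write method
```

The proxy grants reading and nothing else; whoever holds only ``proxy`` has no ``write`` method to call.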
-
-Python, as it stands now, unfortunately does not work well for a pure
-capabilities system.  Capabilities require the prohibition of certain
-abilities, such as "direct access to another's private state"
-[#paradigm regained]_.  This obviously is not possible in Python since,
-at least at the Python level, there is no such thing as private state
-that is persistent (one could argue that local variables that are not
-cell variables for lexical scopes are private, but since they do not
-survive after a function call they are not usable for keeping
-persistent state).  One can hide references at the C level by storing
-them in the struct for the instance of a type and not providing a
-function to access that attribute.
-
-Python's introspection abilities also do not help make implementing
-capabilities that much easier.  Consider how one could access 'file'
-even when it is deleted from __builtin__.  You can still get to the
-reference for 'file' through the sequence returned by
-``object.__subclasses__()``.
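A small sketch of the ``object.__subclasses__()`` problem, written for modern Python where the 'file' type no longer exists. ``HiddenResource`` is a hypothetical stand-in for a type whose name has been removed, and ``find_class`` is an illustrative helper rather than anything from the design.

```python
def find_class(name, root=object):
    """Depth-first search of the subclass tree below ``root`` for ``name``."""
    seen = set()
    stack = [root]
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        if cls.__name__ == name:
            return cls
        # type.__subclasses__(cls) also works when cls is ``type`` itself.
        stack.extend(type.__subclasses__(cls))
    return None


class HiddenResource(object):  # hypothetical stand-in for 'file'
    pass


instance = HiddenResource()  # keeps the class alive
del HiddenResource           # the name is gone from this namespace...
found = find_class("HiddenResource")  # ...but the type is still reachable
```

Deleting the name does not remove the type from the subclass tree, so any code that can reach ``object`` can recover it.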
-
-
-Resource Crippling
-=============================
-
-Another approach to security is to not worry about controlling access
-to the reference of a resource.  One can have a resource perform a
-security check every time someone tries to use a method on that
-resource.  This pushes the security check to a lower level; from a
-reference level to the method level.
-
-By performing the security check every time a resource's method is
-called the worry of a specific resource's reference leaking out to
-insecure code is alleviated.  This does add extra overhead, though, by
-having to do so many security checks.  It also does not handle the
-situation where an unexpected exposure of a type occurs that has not
-been properly crippled.
-
-FreeBSD's jail system provides a protection scheme similar to this.
-Various system calls allow for basic usage, but knowing or having
-access to the system call is not enough to grant usage.  Every call to
-a system call requires checking that the proper rights have been
-granted to the user in order to allow for the system call to perform
-its action.
-
-An even better example in FreeBSD's jail system is its protection of
-sockets.  One can only bind a single IP address to a jail.  Any attempt
-to do more or perform uses with the one IP address that is granted is
-prevented.  The check is performed at every call involving the one
-granted IP address.
-
-Using 'file' as the example again, one could cripple the type so that
-instantiation is not possible for the type in Python.  One could also
-provide a permission check on each call to an unsafe method and
-thus allow the type to be used in normal situations (such as type
-checking), but still feel safe that illegal operations are not
-performed.  Regardless of which approach you take, you do not need to
-worry about a reference to the type being exposed unexpectedly since
-the reference is not the security check but the actual method calls.
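As a rough Python-level model of crippling, the check moves into every method call. In the design this would live inside the C implementation of the type itself; the wrapper class ``CrippledFile`` and its ``allowed`` set are hypothetical names used only to show where the checks sit.

```python
import io


class SandboxError(Exception):
    """Raised when a sandboxed operation is not permitted."""


class CrippledFile(object):
    """File-like object that checks a permission set on every call."""

    def __init__(self, fileobj, allowed):
        self._file = fileobj              # the reference is not hidden
        self._allowed = frozenset(allowed)

    def _check(self, right):
        if right not in self._allowed:
            raise SandboxError("operation not allowed: " + right)

    def read(self, size=-1):
        self._check("read")               # checked on every call
        return self._file.read(size)

    def write(self, data):
        self._check("write")              # checked on every call
        return self._file.write(data)


f = CrippledFile(io.StringIO("data"), allowed={"read"})
content = f.read()
try:
    f.write("more")
    write_blocked = False
except SandboxError:
    write_blocked = True
```

Holding a reference to ``f`` grants nothing by itself; each method decides again whether the caller may proceed, which is where the extra overhead comes from.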
-
-
-Comparison of the Two Approaches
-================================
-
-From the perspective of Python, the two approaches differ on what would
-be the most difficult thing to analyze from a security standpoint: all
-of the ways to gain access to various types from a sandboxed
-interpreter with no imports, or finding all of the types that can lead
-to possibly dangerous actions and thus need to be crippled.
-
-Some Python developers, such as Armin Rigo, feel that truly hiding
-objects in Python is "quite hard" [#armin-hiding]_.  This sentiment
-means that making a pure capabilities system in Python that is secure
-is not possible as people would continue to find new ways to get a hold
-of the reference to a protected resource.
-
-Others feel that by not going the capabilities route we will be
-constantly chasing down new types that require crippling.  The thinking
-is that if we cannot control the references for 'file', how are we to
-know what other types might become exposed later on and thus require
-more crippling?
-
-It essentially comes down to what is harder to do: find all the ways to
-access the types in Python in a sandboxed interpreter with no imported
-modules, or to go through the Python code base and find all types that
-should be crippled?
-
-
-The 'rexec' Module
-///////////////////////////////////////
-
-The 'rexec' module [#rexec]_ was the original attempt at providing a
-sandbox environment for Python code to run in.  Its design was based
-on Safe-Tcl which was essentially a capabilities system [#safe-tcl]_.
-Safe-Tcl allowed you to launch a separate interpreter where its global
-functions were specified at creation time.  This prevented one from
-having any abilities that were not explicitly provided.
-
-For 'rexec', the Safe-Tcl model was tweaked to better match Python's
-situation.  An RExec object represented a sandboxed environment.
-Imports were checked against a whitelist of modules.  You could also
-restrict the type of modules to import based on whether they were
-Python source, bytecode, or C extensions.  Built-ins were allowed
-except for a blacklist of built-ins to not provide.  One could restrict
-whether stdin, stdout, and stderr were provided or not on a per-RExec
-basis.  Several other protections were provided; see documentation for
-the complete list.
-
-The ultimate undoing of the 'rexec' module was how it handled access to
-objects that, in normal Python, require no imports to reach.
-Importing modules requires a direct action, and thus can be protected
-against directly in the import machinery.  But for built-ins, they are
-accessible by default and require no direct action to access in normal
-Python; you just use their name since they are provided in all
-namespaces.
-
-For instance, in a sandboxed interpreter, one only had to 
-``del __builtins__`` to gain access to the full set of built-ins.
-Another way is through using the gc module:
-``gc.get_referrers(''.__class__.__bases__[0])[6]['file']``.  While both
-of these could be fixed (the former was a bug in 'rexec' that was fixed
-and the latter could be handled by not allowing 'gc' to be imported),
-they are examples of things that do not require proactive actions on
-the part of the programmer in normal Python to gain access to a
-resource.  This was an unfortunate side-effect of having all of that
-wonderful reflection in Python.
-
-There is also the issue that 'rexec' was written in Python which
-provides its own problems based on reflection and the ability to modify
-the code at run-time without security protection.
-
-Much has been learned since 'rexec' was written about how Python tends
-to be used and where security issues tend to appear.  Essentially
-Python's dynamic nature does not lend itself very well to a security
-implementation that does not require a constant checking of
-permissions.
-
-
-Threat Model
-///////////////////////////////////////
-
-Below is a list of what the security implementation assumes, along with
-the section of this document that addresses that part of the security
-model (if not already true in Python by default).  The term "bare", when
-used in regard to an interpreter, means an interpreter that has not
-performed a single import of a module.  Also, all comments refer to a
-sandboxed interpreter unless otherwise explicitly stated.
-
-This list does not address specifics such as how 'file' will be
-protected or whether memory should be protected.  This list is meant to
-make clear at a more basic level what the security model is assuming is
-true.
-
-* The Python interpreter itself is always trusted.
-    + Implemented by code that runs at the process level performing any
-      necessary security checks.
-* The Python interpreter cannot be crashed by valid Python source code
-  in a bare interpreter.
-* Python source code is always considered safe.
-* Python bytecode is always considered dangerous [`Hostile Bytecode`_].
-* C extension modules are inherently considered dangerous.
-  [`Extension Module Importation`_].
-    + Explicit trust of a C extension module is possible.
-* Built-in modules are considered dangerous.
-    + Explicit trust of a built-in module is possible.
-* Sandboxed interpreters running in the same process inherently cannot
-  communicate with each other.
-    + Communication through C extension modules is possible because of
-      the technical need to share extension module instances between
-      interpreters.
-* Sandboxed interpreters running in the same process inherently cannot
-  share objects.
-    + Sharing objects through C extension modules is possible because
-      of the technical need to share extension module instances between
-      interpreters.
-* When starting a sandboxed interpreter, it starts with a fresh
-  built-in and global namespace that is not shared with the interpreter
-  that started it.
-* Objects in the default built-in namespace should be safe to use
-  [`Reading/Writing Files`_, `Stdin, Stdout, and Stderr`_].
-    + Either hide the dangerous ones or cripple them so they can cause
-      no harm.
-
-There are also some features that might be desirable, but are not being
-addressed by this security model.
-
-* Communication in any direction between an unprotected interpreter and
-  a sandboxed interpreter it created.
-
-
-The Proposed Approach
-///////////////////////////////////////
-
-In light of where 'rexec' succeeded and failed along with what is known
-about the two main approaches to security and how Python tends to
-operate, the following is a proposal on how to secure Python for
-sandboxing.
-
-
-Implementation Details
-===============================
-
-Support for sandboxed interpreters will require a compilation flag.
-This allows the more common case of people not caring about protections
-to not take a performance hit.  And even when Python is compiled for
-sandboxed interpreter restrictions, when the running interpreter *is*
-unprotected, there will be no accidental triggers of protections.  This
-means that developers should be liberal with the security protections
-without worrying about there being issues for interpreters that do not
-need/want the protection.
-
-At the Python level, the __sandboxed__ built-in will be set based on
-whether the interpreter is sandboxed or not.  This will be set for
-*all* interpreters, regardless of whether sandboxed interpreter support
-was compiled in or not.
-
-For setting what is to be protected, the PyThreadState for the
-sandboxed interpreter must be passed in.  This makes the protection
-very explicit and helps make sure you set protections for the exact
-interpreter you mean to.  All functions that set protections begin with
-the prefix ``PySandbox_Set*()``.  These functions are meant to only
-work with sandboxed interpreters that have not been used yet to execute
-any Python code.  The calls must be made by the code creating and
-handling the sandboxed interpreter *before* the sandboxed interpreter
-is used to execute any Python code.
-
-The functions for checking for permissions are actually macros that
-take in at least an error return value for the function calling the
-macro.  This allows the macro to return on behalf of the caller if the
-check fails and cause the SandboxError exception to be propagated
-automatically.  This helps eliminate any coding errors from incorrectly
-checking a return value on a rights-checking function call.  For the
-rare case where this functionality is disliked, just make the check in
-a utility function and check that function's return value (but this is
-strongly discouraged!).
-
-Functions that check that an operation is allowed implicitly operate on
-the currently running interpreter as returned by
-``PyInterpreter_Get()`` and are to be used by any code (the
-interpreter, extension modules, etc.) that needs to check for
-permission to execute.  They have the common prefix of
-``PySandbox_Allowed*()``.
-
-
-API
---------------
-
-* PyThreadState* PySandbox_NewInterpreter()
-    Return a new interpreter that is considered sandboxed.  There is no
-    corresponding ``PySandbox_EndInterpreter()`` as
-    ``Py_EndInterpreter()`` will be taught how to handle sandboxed
-    interpreters.  ``NULL`` is returned on error.
-
-* PySandbox_Allowed(error_return)
-    Macro that has the caller return with 'error_return' if the
-    interpreter is sandboxed, otherwise do nothing.
-
-
-Memory
-=============================
-
-Protection
---------------
-
-A memory cap will be allowed.
-
-Modification to pymalloc will be needed to properly keep track of the
-allocation and freeing of memory.  Same goes for the macros around the
-system malloc()/free() functions.  This provides a platform-independent
-system for protection of memory instead of relying on the operating
-system to provide a service for capping memory usage of a process.  It
-also allows the protection to be at the interpreter level instead of at
-the process level.
-
-Existing APIs to protect:
-- _PyObject_New()
-    protected directly
-- _PyObject_NewVar()
-    protected directly
-- _PyObject_Del()
-    remove macro that uses PyObject_Free() and protect directly
-- PyObject_New()
-    implicitly by macro using _PyObject_New()
-- PyObject_NewVar()
-    implicitly by macro using _PyObject_NewVar()
-- PyObject_Del()
-    redefine macro to use _PyObject_Del() instead of PyObject_Free()
-- PyMem_Malloc()
-    protected directly
-- PyMem_Realloc()
-    protected directly
-- PyMem_Free()
-    protected directly
-- PyMem_New()
-    implicitly protected by macro using PyMem_Malloc()
-- PyMem_Resize()
-    implicitly protected by macro using PyMem_Realloc()
-- PyMem_Del()
-    implicitly protected by macro using PyMem_Free()
-- PyMem_MALLOC()
-    redefine macro to use PyMem_Malloc()
-- PyMem_REALLOC()
-    redefine macro to use PyMem_Realloc()
-- PyMem_FREE()
-    redefine macro to use PyMem_Free()
-- PyMem_NEW()
-    implicitly protected by macro using PyMem_MALLOC()
-- PyMem_RESIZE()
-    implicitly protected by macro using PyMem_REALLOC()
-- PyMem_DEL()
-    implicitly protected by macro using PyMem_FREE()
-- PyObject_Malloc()
-    XXX
-- PyObject_Realloc()
-    XXX
-- PyObject_Free()
-    XXX
-- PyObject_MALLOC()
-    XXX
-- PyObject_REALLOC()
-    XXX
-- PyObject_FREE()
-    XXX
-
-
-Why
---------------
-
-Protecting against excessive memory usage allows one to make sure that
-a DoS attack against the system's memory is prevented.
-
-
-Possible Security Flaws
------------------------
-
-If code makes direct calls to malloc/free instead of using the proper
-``PyMem_*()`` macros then the security check will be circumvented.  But
-C code is *supposed* to use the proper macros or pymalloc and thus this
-issue is not with the security model but with code not following Python
-coding standards.
-
-
-API
---------------
-
-* int PySandbox_SetMemoryCap(PyThreadState *, integer)
-    Set the memory cap for a sandboxed interpreter.  If the
-    interpreter is not a sandboxed interpreter, return a false
-    value.
-
-* PySandbox_AllowedMemoryAlloc(integer, error_return)
-    Macro to increase the amount of memory that is reported that the
-    running sandboxed interpreter is using.  If the increase puts the
-    total count past the set limit or leads to integer overflow in
-    the allocation count, raise a SandboxError exception
-    and cause the calling function to return with the value of
-    'error_return', otherwise do nothing.
-
-* void PySandbox_AllowedMemoryFree(integer)
-    Decrease the current running interpreter's allocated
-    memory.  If this puts the memory used to below 0, re-set it to 0.
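The accounting rules in this API list can be modelled in a few lines of Python. This is a sketch of the bookkeeping only, not the C macros: ``MemoryAccount`` and its method names are hypothetical, and the integer-overflow case has no counterpart here because Python integers do not overflow.

```python
class SandboxError(Exception):
    """Raised when a sandboxed operation is not permitted."""


class MemoryAccount(object):
    """Per-interpreter allocation counter with a cap."""

    def __init__(self, cap):
        self.cap = cap
        self.used = 0

    def allowed_alloc(self, nbytes):
        """Count an allocation, refusing it if the cap would be passed."""
        new_total = self.used + nbytes
        if new_total > self.cap:
            raise SandboxError("memory cap exceeded")
        self.used = new_total

    def allowed_free(self, nbytes):
        """Count a free; the counter is reset to 0 rather than going negative."""
        self.used = max(0, self.used - nbytes)


account = MemoryAccount(cap=1024)
account.allowed_alloc(1000)
try:
    account.allowed_alloc(100)   # would bring the total to 1100
    denied = False
except SandboxError:
    denied = True
used_after_denial = account.used
account.allowed_free(2000)       # frees more than was counted
used_after_free = account.used
```

A refused allocation leaves the counter unchanged, so later, smaller allocations can still succeed.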
-
-
-Reading/Writing Files
-=============================
-
-Protection
---------------
-
-XXX
-
-To open a file, one will have to use open().  This will make open() a
-factory function that controls reference access to the 'file' type in
-terms of creating new instances.  When an attempted file opening fails
-(either because the path does not exist or for security reasons),
-SandboxError will be raised.  The same exception must be raised to
-prevent filesystem information being gleaned from the type of exception
-returned (i.e., returning IOError if a path does not exist tells the
-user something about that file path).
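A minimal sketch of the single-exception rule, in modern Python. ``sandboxed_open`` and the ``allowed`` set of ``(path, mode)`` pairs are hypothetical names; the point is only that a missing file and a forbidden file fail identically.

```python
import os
import tempfile


class SandboxError(Exception):
    """Raised when a sandboxed operation is not permitted."""


def sandboxed_open(path, mode, allowed):
    """Open ``path`` only if (path, mode) was explicitly allowed."""
    if (path, mode) not in allowed:
        raise SandboxError("cannot open file")
    try:
        return open(path, mode)
    except OSError:
        # Same exception and message as a denial, with the cause hidden.
        raise SandboxError("cannot open file") from None


def failure_message(path, allowed):
    """Return the error text seen by the caller, or None on success."""
    try:
        sandboxed_open(path, "r", allowed).close()
    except SandboxError as exc:
        return str(exc)
    return None


scratch = tempfile.mkdtemp()
missing = os.path.join(scratch, "absent.txt")   # never created
msg_missing = failure_message(missing, {(missing, "r")})  # allowed, not there
msg_denied = failure_message(missing, set())              # not allowed
os.rmdir(scratch)
```

Sandboxed code sees the same result in both cases and so cannot probe which paths exist.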
-
-What open() returns may not be an instance of 'file' but a proxy that
-provides the security measures needed.  While this might break code
-that uses type checking to make sure a 'file' object is used, taking a
-duck typing approach would be better.  This is not only more Pythonic
-but would also allow the code to use a StringIO instance.
-
-It has been suggested to allow for a passed-in callback to be called
-when a specific path is to be opened.  While this provides good
-flexibility in terms of allowing custom proxies with more fine-grained
-security (e.g., capping the amount of disk write), this has been deemed
-unneeded in the initial security model and thus is not being considered
-at this time.  
-
-Why
---------------
-
-Allowing anyone to be able to arbitrarily read, write, or learn about
-the layout of your filesystem is extremely dangerous.  It can lead to
-loss of data or data being exposed to people who should not have
-access.
-
-
-Possible Security Flaws
------------------------
-
-XXX
-
-
-API
---------------
-
-* int PySandbox_SetAllowedFile(PyThreadState *, string path,
-                                string mode)
-    Add a file that is allowed to be opened in 'mode' by the 'file'
-    object.  If the interpreter is not sandboxed then return a false
-    value.
-
-* PySandbox_AllowedPath(string path, string mode, error_return)
-    Macro that causes the caller to return with 'error_return' and
-    raise SandboxError as the exception if the specified path with
-    'mode' is not allowed, otherwise do nothing.
-
-
-Extension Module Importation
-============================
-
-Protection
---------------
-
-A whitelist of extension modules that may be imported must be provided.
-A default set is given for stdlib modules known to be safe.
-
-A check in the import machinery will check that a specified module name
-is allowed based on the type of module (Python source, Python bytecode,
-or extension module).  Python bytecode files are never directly
-imported because of the possibility of hostile bytecode being present.
-Python source is always considered safe based on the assumption that
-all resource harm is eventually done at the C level, thus Python source
-code directly cannot cause harm without help of C extension modules.
-Thus only C extension modules need to be checked against the whitelist.
-
-The requested extension module name is checked in order to make sure
-that it is on the whitelist if it is a C extension module.  If the name
-is not on the whitelist a SandboxError exception is raised.  Otherwise the
-import is allowed.  
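The import rule described above can be sketched as a single check. The function name ``check_import``, the module-kind constants, and the sample whitelist are illustrative assumptions; refusing bytecode outright is one reading of "never directly imported".

```python
class SandboxError(Exception):
    """Raised when a sandboxed operation is not permitted."""


SOURCE, BYTECODE, EXTENSION = "source", "bytecode", "extension"


def check_import(name, kind, whitelist):
    """Raise SandboxError unless importing ``name`` of ``kind`` is allowed."""
    if kind == BYTECODE:
        # Bytecode may be hostile, so it is never imported directly.
        raise SandboxError("bytecode import refused: " + name)
    if kind == EXTENSION and name not in whitelist:
        raise SandboxError("extension module not whitelisted: " + name)
    # Python source is always considered safe; nothing to check.


def is_allowed(name, kind, whitelist):
    """Boolean convenience wrapper around check_import."""
    try:
        check_import(name, kind, whitelist)
    except SandboxError:
        return False
    return True


whitelist = {"math"}  # hypothetical whitelist for the example
results = {
    "source": is_allowed("mymodule", SOURCE, whitelist),
    "listed_ext": is_allowed("math", EXTENSION, whitelist),
    "unlisted_ext": is_allowed("socket", EXTENSION, whitelist),
    "bytecode": is_allowed("mymodule", BYTECODE, whitelist),
}
```

Only the extension-module branch consults the whitelist, matching the statement that only C extension modules need to be checked against it.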
-
-Even if a Python source code module imports a C extension module in an
-unprotected interpreter it is not a problem since the Python source
-code module is reloaded in the sandboxed interpreter.  When that Python
-source module is freshly imported the normal import check will be
-triggered to prevent the C extension module from becoming available to
-the sandboxed interpreter.
-
-For the 'os' module, a special sandboxed version will be used if the
-proper C extension module providing the correct abilities is not
-allowed.  This will default to '/' as the path separator and provide as
-many reasonable abilities as possible from a pure Python module.
-
-The 'sys' module is specially addressed in
-`Changing the Behaviour of the Interpreter`_.
-
-By default, the whitelisted modules are:
-
-* XXX
-
-
-Why
---------------
-
-Because C code is considered unsafe, its use should be regulated.  By
-using a whitelist it allows one to explicitly decide that a C extension
-module is considered safe.  
-
-
-Possible Security Flaws
------------------------
-
-If a whitelisted C extension module imports a non-whitelisted C
-extension module and makes it an attribute of the whitelisted module
-there will be a breach in security.  Luckily this is a rarity in
-extension modules.  
-
-There is also the issue of a C extension module calling the C API of a
-non-whitelisted C extension module.
-
-Lastly, if a whitelisted C extension module is loaded in an unprotected
-interpreter and then loaded into a sandboxed interpreter then there are
-no checks during module initialization for possible security issues in
-the sandboxed interpreter that would have occurred had the sandboxed
-interpreter done the initial import.
-
-All of these issues can be handled by never blindly whitelisting a C
-extension module.  Added support for dealing with C extension modules
-comes in the form of `Extension Module Crippling`_.  
-
-
-API
---------------
-
-* int PySandbox_SetModule(PyThreadState *, string module_name)
-    Allow the sandboxed interpreter to import 'module_name'.  If the
-    interpreter is not sandboxed, return a false value.  Absolute
-    import paths must be specified.
-
-* int PySandbox_BlockModule(PyThreadState *, string module_name)
-    Remove the specified module from the whitelist.  Used to remove
-    modules that are allowed by default.  Return a false value if
-    called on an unprotected interpreter.
-
-* PySandbox_AllowedModule(string module_name, error_return)
-    Macro that causes the caller to return with 'error_return' and sets
-    the exception SandboxError if the specified module cannot be
-    imported, otherwise does nothing.
-
-
-Extension Module Crippling
-==========================
-
-Protection
---------------
-
-By providing a C API for checking for allowed abilities, modules that
-have some useful functionality can do proper security checks for those
-functions that could provide insecure abilities while allowing safe
-code to be used (and thus not fully deny importation).
-
-
-Why
---------------
-
-Consider a module that provides a string processing ability.  If that
-module provides a single convenience function that reads its input
-string from a file (with a specified path), the whole module should not
-be blocked from being used, just that convenience function.  By
-whitelisting the module but having a security check on the one problem
-function, the user can still gain access to the safe functions.  Even
-better, the unsafe function can be allowed if the security checks pass.
-
-
-Possible Security Flaws
------------------------
-
-If a C extension module developer incorrectly implements the security
-checks for the unsafe functions it could lead to undesired abilities.
-
-
-API
---------------
-
-Use PySandbox_Allowed() to protect unsafe code from being executed.
-
-
-Hostile Bytecode
-=============================
-
-Protection
---------------
-
-XXX
-
-
-Why
---------------
-
-Without implementing a bytecode verification tool, there is no way of
-making sure that bytecode does not jump outside its bounds, thus
-possibly executing malicious code.  It also presents the possibility of
-crashing the interpreter.
-
-
-Possible Security Flaws
------------------------
-
-None known.
-
-
-API
---------------
-
-N/A
-
-
-Changing the Behaviour of the Interpreter
-=========================================
-
-Protection
---------------
-
-Only a subset of the 'sys' module will be made available to sandboxed
-interpreters.  Things to allow from the sys module:
-
-* byteorder (?)
-* copyright 
-* displayhook
-* excepthook
-* __displayhook__
-* __excepthook__
-* exc_info
-* exc_clear
-* exit
-* getdefaultencoding
-* _getframe (?)
-* hexversion
-* last_type
-* last_value
-* last_traceback
-* maxint (?)
-* maxunicode (?)
-* modules
-* stdin  # See `Stdin, Stdout, and Stderr`_.
-* stdout
-* stderr
-* version
-
-
-Why
---------------
-
-Filesystem information must be removed.  Any settings that could
-possibly lead to a DoS attack (e.g., sys.setrecursionlimit()) or risk
-crashing the interpreter must also be removed.
-
-
-Possible Security Flaws
------------------------
-
-Exposing something that could lead to future security problems (e.g., a
-way to crash the interpreter).
-
-
-API
---------------
-
-None.
-
-
-Socket Usage
-=============================
-
-Protection
---------------
-
-Allow sending and receiving data to/from specific IP addresses on
-specific ports.
-
-open() is to be used as a factory function to open a network
-connection.  If the connection is not possible (either because of an
-invalid address or security reasons), SandboxError is raised.
-
-A socket object may not be returned by the call.  A proxy to handle
-security might be returned instead.
-
-XXX
-
-
-Why
---------------
-
-Allowing arbitrary sending of data over sockets can lead to DoS attacks
-on the network and other machines.  Limiting accepting data prevents
-your machine from being attacked by accepting malicious network
-connections.  It also allows you to know exactly where communication is
-going to and coming from.
-
-
-Possible Security Flaws
------------------------
-
-Someone could manage to influence the DNS server in use and thereby
-change which IP addresses are returned by a DNS lookup.
-
-
-API
---------------
-
-* int PySandbox_SetIPAddress(PyThreadState *, string IP, integer port)
-    Allow the sandboxed interpreter to send/receive to the specified
-    'IP' address on the specified 'port'.  If the interpreter is not
-    sandboxed, return a false value.
-
-* PySandbox_AllowedIPAddress(string IP, integer port, error_return)
-    Macro to verify that the specified 'IP' address on the specified
-    'port' is allowed to be communicated with.  If not, cause the
-    caller to return with 'error_return' and SandboxError exception
-    set, otherwise do nothing.
-
-* int PySandbox_SetHost(PyThreadState *, string host, integer port)
-    Allow the sandboxed interpreter to send/receive to the specified
-    'host' on the specified 'port'.  If the interpreter is not
-    sandboxed, return a false value.
-
-* PySandbox_AllowedHost(string host, integer port, error_return)
-    Check that the specified 'host' on the specified 'port' is allowed
-    to be communicated with.  If not, set a SandboxError exception and
-    cause the caller to return 'error_return', otherwise do nothing.
-
-
-Network Information
-=============================
-
-Protection
---------------
-
-Limit what information can be gleaned about the network the system is
-running on.  This does not include restricting information on IP
-addresses and hosts that have been explicitly allowed for the
-sandboxed interpreter to communicate with.
-
-XXX
-
-
-Why
---------------
-
-With enough information from the network several things could occur.
-One is that someone could possibly figure out where your machine is on
-the Internet.  Another is that enough information about the network you
-are connected to could be used against it in an attack.
-
-
-Possible Security Flaws
------------------------
-
-As long as usage is restricted to only what is needed to work with
-allowed addresses, there are no security issues to speak of.
-
-
-API
---------------
-
-* int PySandbox_SetNetworkInfo(PyThreadState *)
-    Allow the sandboxed interpreter to get network information
-    regardless of whether the IP or host address is explicitly allowed.
-    If the interpreter is not sandboxed, return a false value.
-
-* PySandbox_AllowedNetworkInfo(error_return)
-    Macro that will return 'error_return' for the caller and set a
-    SandboxError exception if the sandboxed interpreter does not allow
-    checking for arbitrary network information, otherwise do nothing.
-
-
-Filesystem Information
-=============================
-
-Protection
---------------
-
-Do not allow information about the filesystem layout from various parts
-of Python to be exposed.  This means blocking exposure at the Python
-level to:
-
-* __file__ attribute on modules
-* __path__ attribute on packages
-* co_filename attribute on code objects
-* XXX
-
-
-Why
---------------
-
-Exposing information about the filesystem is not allowed.  You can
-figure out what operating system one is on which can lead to
-vulnerabilities specific to that operating system being exploited.
-
-
-Possible Security Flaws
------------------------
-
-Not finding every single place where a file path is exposed.
-
-
-API
---------------
-
-* int PySandbox_SetFilesystemInfo(PyThreadState *)
-    Allow the sandboxed interpreter to expose filesystem information.
-    If the passed-in interpreter is not sandboxed, return a false value.
-
-* PySandbox_AllowedFilesystemInfo(error_return)
-    Macro that checks if exposing filesystem information is allowed.
-    If it is not, cause the caller to return with the value of
-    'error_return' and raise SandboxError, otherwise do nothing.
-
-
-Stdin, Stdout, and Stderr
-=============================
-
-Protection
---------------
-
-By default, sys.__stdin__, sys.__stdout__, and sys.__stderr__ will be
-set to instances of StringIO.  Explicit allowance of the process'
-stdin, stdout, and stderr is possible.
-
-This will protect the 'print' statement, and the built-ins input() and
-raw_input().
-
-
-Why
---------------
-
-Interference with stdin, stdout, or stderr should not be allowed unless
-desired.  No one wants uncontrolled output sent to their screen.
-
-
-Possible Security Flaws
------------------------
-
-Unless StringIO instances can be used maliciously, none to speak of.
-
-
-API
---------------
-
-* int PySandbox_SetTrueStdin(PyThreadState *)
-  int PySandbox_SetTrueStdout(PyThreadState *)
-  int PySandbox_SetTrueStderr(PyThreadState *)
-    Set the specific stream for the interpreter to the true version of
-    the stream and not to the default instance of StringIO.  If the
-    interpreter is not sandboxed, return a false value.
-
-
-Adding New Protections
-=============================
-
-.. note:: This feature has the lowest priority and thus will be the
-          last feature implemented (if ever).
-
-Protection
---------------
-
-Allow for extensibility in the security model by being able to add new
-types of checks.  This allows not only for Python to add new security
-protections in a backwards-compatible fashion, but to also have
-extension modules add their own as well.
-
-An extension module can introduce a group for its various values to
-check, with a type being a specific value within a group.  The "Python"
-group is specifically reserved for use by the Python core itself.
-
-
-Why
---------------
-
-We are all human.  There is the possibility that a need for a new type
-of protection for the interpreter will present itself and thus need
-support.  By providing an extensible way to add new protections it
-helps to future-proof the system.
-
-It also allows extension modules to present their own set of security
-protections.  That way one extension module can use the protection
-scheme presented by another that it is dependent upon.
-
-
-Possible Security Flaws
-------------------------
-
-Poor definitions by extension module users of how their protections
-should be used would allow for possible exploitation.
-
-
-API
---------------
-
-+ Bool
-    * int PySandbox_SetExtendedFlag(PyThreadState *, string group,
-                                    string type)
-        Set a group-type to be true.  Expected use is for when a binary
-        possibility of something is needed and that the default is to
-        not allow use of the resource (e.g., network information).
-        Returns a false value if used on an unprotected interpreter.
-
-    * PySandbox_AllowedExtendedFlag(string group, string type,
-                                    error_return)
-        Macro that if the group-type is not set to true, cause the
-        caller to return with 'error_return' with SandboxError
-        exception raised.  For unprotected interpreters the check does
-        nothing.
-
-+ Numeric Range
-    * int PySandbox_SetExtendedCap(PyThreadState *, string group,
-                                    string type, integer cap)
-        Set a group-type to a capped value, 'cap', with the initial
-        allocated value set to 0.  Expected use is when a resource has
-        a capped amount of use (e.g., memory).  Returns a false value
-        if the interpreter is not sandboxed.
-
-    * PySandbox_AllowedExtendedAlloc(integer increase, error_return)
-        Macro to raise the amount a resource is used by 'increase'.
-        If the increase pushes the resource allocation past the set
-        cap, then return 'error_return' and set SandboxError as the
-        exception, otherwise do nothing.
-
-    * PySandbox_AllowedExtendedFree(integer decrease, error_return)
-        Macro to lower the amount a resource is used by 'decrease'.  If
-        the decrease pushes the allotment to below 0 then have the
-        caller return 'error_return' and set SandboxError as the
-        exception, otherwise do nothing.
-
-
-+ Membership
-    * int PySandbox_SetExtendedMembership(PyThreadState *,
-                                            string group, string type,
-                                            string member)
-        Add a string, 'member',  to be considered a member of a
-        group-type (e.g., allowed file paths).  If the interpreter is not
-        a sandboxed interpreter, return a false value.
-
-    * PySandbox_AllowedExtendedMembership(string group, string type,
-                                            string member,
-                                            error_return)
-        Macro that checks 'member' is a member of the values set for
-        the group-type.  If it is not, then have the caller return
-        'error_return' and set an exception for SandboxError, otherwise
-        does nothing.
-
-+ Specific Value
-    * int PySandbox_SetExtendedValue(PyThreadState *, string group,
-                                        string type, string value)
-        Set a group-type to 'value'.  If the interpreter is not
-        sandboxed, return a false value.
-
-    * PySandbox_AllowedExtendedValue(string group, string type,
-                                        string value, error_return)
-        Macro to check that the group-type is set to 'value'.  If it is
-        not, then have the caller return 'error_return' and set an
-        exception for SandboxError, otherwise do nothing.
-
-
-Python API
-=============================
-
-__sandboxed__
---------------
-
-A built-in that flags whether the interpreter currently running is
-sandboxed or not.  Set to a 'bool' value that is read-only.  To mimic
-working of __debug__.
-
-
-sandbox module
---------------
-
-XXX
-
-
-References
-///////////////////////////////////////
-
-.. [#rexec] The 'rexec' module
-   (http://docs.python.org/lib/module-rexec.html)
-
-.. [#safe-tcl] The Safe-Tcl Security Model
-   (http://research.sun.com/technical-reports/1997/abstract-60.html)
-
-.. [#ctypes] 'ctypes' module
-   (http://docs.python.org/dev/lib/module-ctypes.html)
-
-.. [#paradigm regained] "Paradigm Regained:
-                         Abstraction Mechanisms for Access Control"
-   (http://erights.org/talks/asian03/paradigm-revised.pdf)
-
-.. [#armin-hiding] [Python-Dev] what can we do to hide the 'file' type?
-   (http://mail.python.org/pipermail/python-dev/2006-July/067076.html)

Copied: python/branches/bcannon-sandboxing/securing_python.txt (from r50656, python/branches/bcannon-sandboxing/sandboxing_design_doc.txt)
==============================================================================
--- python/branches/bcannon-sandboxing/sandboxing_design_doc.txt	(original)
+++ python/branches/bcannon-sandboxing/securing_python.txt	Wed Jul 19 03:52:10 2006
@@ -1,1195 +1,467 @@
-Restricted Execution for Python
-#######################################
+Securing Python
+#####################################################################
 
-About This Document
-=============================
-
-This document is meant to lay out the general design for re-introducing
-a sandboxing model for Python.  This document should provide one with
-enough information to understand the goals for sandboxing, what
-considerations were made for the design, and the actual design itself.
-Design decisions should be clear and explain not only why they were
-chosen but possible drawbacks from taking a specific approach.
-
-If any of the above is found not to be true, please email me at
-brett at python.org and let me know what problems you are having with the
-document.
-
-
-XXX TO DO
-=============================
-
-Design
---------------
-
-* threading needs protection?
-* python-dev convince me that hiding 'file' possible?
-    + based on that, handle code objects
-    + also decide how to handle sockets
-    + perhaps go with crippling but try best effort on hiding reference and if
-      best effort holds up eventually shift over to capabilities system
-* resolve to IP at call time to prevent DNS man-in-the-middle attacks when
-  allowing a specific host name?
-* what network info functions are allowed by default?
-* does the object.__subclasses__() trick work across interpreters, or is it
-  unique per interpreter?
-* figure out default whitelist of extension modules
-* check default accessible objects for file path exposure
-* helper functions to get at StringIO instances for stdin, stdout, and friends?
-* decide on what type of objects (e.g., PyStringObject or const char *) are to
-  be passed in
-* all built-ins properly protected?
-* exactly how to tell whether argument to open() is a path, IP, or host name
-  (third argument, 'n' prefix for networking, format of path, ...)
-* API at the Python level
-* for extension module protection, allow for wildcard allowance
-  (e.g., ``xml.*``)
-
-
-Implementation
---------------
-
-* add __sandbox__
-* merge from HEAD
-    + last merge on rev. 47248
-* remove bare malloc()/realloc()/free() uses
-    + also watch out for PyObject_Malloc()/PyObject_MALLOC() calls
-* note in SpecialBuilds.txt
-
-
-Goal
-=============================
-
-A good sandboxing model provides enough protection to prevent malicious
-harm to come to the system, and no more.  Barriers should be minimized
-so as to allow most code that does not do anything that would be
-regarded as harmful to run unmodified.  But the protections need to be
-thorough enough to prevent any unintended changes or information of the
-system to come about.
-
-An important point to take into consideration when reading this
-document is to realize it is part of my (Brett Cannon's) Ph.D.
-dissertation.  This means it is heavily geared toward sandboxing when
-the interpreter is working with Python code embedded in a web page as
-viewed in Firefox.  While great strides have been taken to keep the
-design general enough so as to allow all previous uses of the 'rexec'
-module [#rexec]_ to be able to use the new design, it is not the
-focused goal.  This means if a design decision must be made for the
-embedded use case compared to sandboxing Python code in a pure Python
-application, the former will win out over the latter.
-
-Throughout this document, the term "resource" is used to represent
-anything that deserves possible protection.  This includes things that
-have a physical representation (e.g., memory) to things that are more
-abstract and specific to the interpreter (e.g., sys.path).
-
-When referring to the state of an interpreter, it is either
-"unprotected" or "sandboxed".  An unprotected interpreter has no
-restrictions imposed upon any resource.  A sandboxed interpreter has at
-least one, possibly more, resource with restrictions placed upon it to
-prevent unsafe code that is running within the interpreter from causing
-harm to the system.
-
-
-.. contents::
-
-
-Use Cases
-/////////////////////////////
-
-All use cases are based on how many sandboxed interpreters are running
-in a single process and whether an unprotected interpreter is also
-running.  The use cases can be broken down into two categories: when
-the interpreter is embedded and only using sandboxed interpreters, and
-when pure Python code is running in an unprotected interpreter and uses
-sandboxed interpreters.
-
-
-When the Interpreter Is Embedded
-================================
-
-Single Sandboxed Interpreter
-----------------------------
-
-This use case is when an application embeds the interpreter and never
-has more than one interpreter running which happens to be sandboxed.
-
-
-Multiple Sandboxed Interpreters
--------------------------------
-
-When multiple interpreters, all sandboxed at varying levels, need to be
-running within a single application.  This is the key use case that
-this proposed design is targeted for.
-
-
-Stand-Alone Python
-=============================
-
-When someone has written a Python program that wants to execute Python
-code in a sandboxed interpreter(s).  This is the use case that 'rexec'
-attempted to fulfill.
-
-
-Issues to Consider
-=============================
-
-Common to all use cases, resources that the interpreter requires to
-function at a level below user code cannot be exposed to a sandboxed
-interpreter.  For instance, the interpreter might need to stat a file
-to see if it is possible to import.  If the ability to stat a file is
-not allowed to a sandboxed interpreter, it should not be allowed to
-perform that action, regardless of whether the interpreter at a level
-below user code needs that ability.
-
-When multiple interpreters are involved (sandboxed or not), not
-allowing an interpreter to gain access to resources available in other
-interpreters without explicit permission must be enforced.
-
-
-Resources to Protect
-/////////////////////////////
-
-It is important to make sure that the proper resources are protected
-from a sandboxed interpreter.  If you don't there is no point to sandboxing.
-
-Filesystem
-===================
-
-All facets of the filesystem must be protected.  This means restricting
-reading and writing to the filesystem (e.g., files, directories, etc.).
-It should be allowed in controlled situations where allowing access to
-the filesystem is desirable, but that should be an explicit allowance.
-
-There must also be protection to prevent revealing any information
-about the filesystem.  Disclosing information on the filesystem could
-allow one to infer what OS the interpreter is running on, for instance.
-
-
-Memory
-===================
-
-Memory should be protected.  It is a limited resource on the system
-that can have an impact on other running programs if it is exhausted.
-Being able to restrict the use of memory would help alleviate issues
-from denial-of-service (DoS) attacks on the system.
-
-
-Networking
-===================
-
-Networking is somewhat like the filesystem in terms of wanting similar
-protections.  You do not want to let unsafe code make socket
-connections unhindered or accept them to do possibly nefarious things.
-You also want to prevent finding out information about the network you
-are connected to.
-
-
-Interpreter
-===================
-
-One must make sure that the interpreter is not harmed in any way from
-sandboxed code.  This usually takes the form of crashing the program
-that the interpreter is embedded in or the unprotected interpreter that
-started the sandbox interpreter.  Executing hostile bytecode that might
-lead to undesirable effects is another possible issue.
-
-There is also the issue of taking it over.  One should not be able to gain
-escalated privileges in any way without explicit permission.
-
-
-Types of Security
+Introduction
 ///////////////////////////////////////
 
-As with most things, there are multiple approaches one can take to
-tackle a problem.  Security is no exception.  In general there seem to
-be two approaches to protecting resources.
-
-
-Resource Hiding
-=============================
-
-By never giving code a chance to access a resource, you prevent it from
-being (ab)used.  This is the idea behind resource hiding; you can't
-misuse something you don't have in the first place.
-
-The most common implementation of resource hiding is capabilities.  In
-this type of system a resource's reference acts as a ticket that
-represents the right to use the resource.  Once code has a reference it
-is considered to have full use of the resource that reference represents
-and no further security checks are directly performed (using delegates
-and other structured ways one can actually have a security check for
-each access of a resource, but this is not a default behaviour).
-
-As an example, consider the 'file' type as a resource we want to
-protect.  That would mean that we did not want a reference to the
-'file' type to ever be accessible without explicit permission.  If one
-wanted to provide read-only access to a temp file, you could have
-open() perform a check on the permissions of the current interpreter,
-and if it is allowed to, return a proxy object for the file that only
-allows reading from it.  The 'file' instance for the proxy would need
-to be properly hidden so that the reference was not reachable from
-outside so that 'file' access could still be controlled.
-
-Python, as it stands now, unfortunately does not work well for a pure
-capabilities system.  Capabilities require the prohibition of certain
-abilities, such as "direct access to another's private state"
-[#paradigm regained]_.  This obviously is not possible in Python since,
-at least at the Python level, there is no such thing as private state
-that is persistent (one could argue that local variables that are not
-cell variables for lexical scopes are private, but since they do not
-survive after a function call they are not usable for keeping
-persistent state).  One can hide references at the C level by storing
-it in the struct for the instance of a type and not providing a
-function to access that attribute.
-
-Python's introspection abilities also do not help make implementing
-capabilities that much easier.  Consider how one could access 'file'
-even when it is deleted from __builtin__.  You can still get to the
-reference for 'file' through the sequence returned by
-``object.__subclasses__()``.
-
-
-Resource Crippling
-=============================
-
-Another approach to security is to not worry about controlling access
-to the reference of a resource.  One can have a resource perform a
-security check every time someone tries to use a method on that
-resource.  This pushes the security check to a lower level; from a
-reference level to the method level.
-
-By performing the security check every time a resource's method is
-called the worry of a specific resource's reference leaking out to
-insecure code is alleviated.  This does add extra overhead, though, by
-having to do so many security checks.  It also does not handle the
-situation where an unexpected exposure of a type occurs that has not
-been properly crippled.
-
-FreeBSD's jail system provides a protection scheme similar to this.
-Various system calls allow for basic usage, but knowing or having
-access to the system call is not enough to grant usage.  Every call to
-a system call requires checking that the proper rights have been
-granted to the user in order to allow for the system call to perform
-its action.
-
-An even better example in FreeBSD's jail system is its protection of
-sockets.  One can only bind a single IP address to a jail.  Any attempt
-to do more or perform uses with the one IP address that is granted is
-prevented.  The check is performed at every call involving the one
-granted IP address.
-
-Using 'file' as the example again, one could cripple the type so that
-instantiation is not possible for the type in Python.  One could also
-provide a permission check on each call to an unsafe method and
-thus allow the type to be used in normal situations (such as type
-checking), but still feel safe that illegal operations are not
-performed.  Regardless of which approach you take, you do not need to
-worry about a reference to the type being exposed unexpectedly since
-the reference is not the security check but the actual method calls.
-
-
-Comparison of the Two Approaches
-================================
-
-From the perspective of Python, the two approaches differ on what would
-be the most difficult thing to analyze from a security standpoint: all
-of the ways to gain access to various types from a sandboxed
-interpreter with no imports, or finding all of the types that can lead
-to possibly dangerous actions and thus need to be crippled.
-
-Some Python developers, such as Armin Rigo, feel that truly hiding
-objects in Python is "quite hard" [#armin-hiding]_.  This sentiment
-means that making a pure capabilities system in Python that is secure
-is not possible as people would continue to find new ways to get a hold
-of the reference to a protected resource.
-
-Others feel that by not going the capabilities route we will be
-constantly chasing down new types that require crippling.  The thinking
-is that if we cannot control the references for 'file', how are we to
-know what other types might become exposed later on and thus require
-more crippling?
-
-It essentially comes down to what is harder to do: find all the ways to
-access the types in Python in a sandboxed interpreter with no imported
-modules, or to go through the Python code base and find all types that
-should be crippled?
+As of Python 2.5, Python does not support any form of security
+model for
+executing arbitrary Python code in some form of protected interpreter.
+While one can use such things as ``exec`` and ``eval`` to garner a
+very weak form of sandboxing, it does not provide any thorough
+protections from malicious code.
+
+This should be rectified.  This document attempts to lay out what
+would be needed to secure Python in such a way as to allow arbitrary
+Python code to execute in a sandboxed interpreter without worries of
+that interpreter providing access to any resource of the operating
+system without being given explicit authority to do so.
+
+Throughout this document several terms are going to be used.  A
+"sandboxed interpreter" is one where the built-in namespace is not the
+same as that of an interpreter whose built-ins were unaltered, which
+is called an "unprotected interpreter".
+
+A "bare interpreter" is one where the built-in namespace has been
+stripped down to the bare minimum needed to run any form of basic Python
+program.  This means that all atomic types (i.e., syntactically
+supported types), ``object``, and the exceptions provided by the
+``exceptions`` module are considered in the built-in namespace.  There
+have also been no imports executed in the interpreter.
 
 
-The 'rexec' Module
+Rationale
 ///////////////////////////////////////
 
-The 'rexec' module [#rexec]_ was the original attempt at providing a
-sandbox environment for Python code to run in.  Its design was based
-on Safe-Tcl which was essentially a capabilities system [#safe-tcl]_.
-Safe-Tcl allowed you to launch a separate interpreter where its global
-functions were specified at creation time.  This prevented one from
-having any abilities that were not explicitly provided.
-
-For 'rexec', the Safe-Tcl model was tweaked to better match Python's
-situation.  An RExec object represented a sandboxed environment.
-Imports were checked against a whitelist of modules.  You could also
-restrict the type of modules to import based on whether they were
-Python source, bytecode, or C extensions.  Built-ins were allowed
-except for a blacklist of built-ins to not provide.  One could restrict
-whether stdin, stdout, and stderr were provided or not on a per-RExec
-basis.  Several other protections were provided; see documentation for
-the complete list.
-
-The ultimate undoing of the 'rexec' module was how access to objects
-that in normal Python require no imports to reach was handled.
-Importing modules requires a direct action, and thus can be protected
-against directly in the import machinery.  But for built-ins, they are
-accessible by default and require no direct action to access in normal
-Python; you just use their name since they are provided in all
-namespaces.
-
-For instance, in a sandboxed interpreter, one only had to 
-``del __builtins__`` to gain access to the full set of built-ins.
-Another way is through using the gc module:
-``gc.get_referrers(''.__class__.__bases__[0])[6]['file']``.  While both
-of these could be fixed (the former was a bug in 'rexec' that was fixed
-and the latter could be handled by not allowing 'gc' to be imported),
-they are examples of things that do not require proactive actions on
-the part of the programmer in normal Python to gain access to a
-resource.  This was an unfortunate side-effect of having all of that
-wonderful reflection in Python.
-
-There is also the issue that 'rexec' was written in Python which
-provides its own problems based on reflection and the ability to modify
-the code at run-time without security protection.
-
-Much has been learned since 'rexec' was written about how Python tends
-to be used and where security issues tend to appear.  Essentially
-Python's dynamic nature does not lend itself very well to a security
-implementation that does not require a constant checking of
-permissions.
+Python is used extensively as an embedded language within existing
+programs.  These applications often need to let users run Python code
+written by someone else while trusting that no unintentional harm will
+come to their system, regardless of how much they trust the code they
+are executing.
+
+For instance, think of an application that supports a plug-in system
+with Python as the language used for writing plug-ins.  You do not
+want to have to examine every plug-in you download to make sure that
+it does not alter your filesystem if you can help it.  With a proper
+security model and implementation in place this hindrance of having
+to examine all code you execute should be alleviated.
 
 
-Threat Model
+Approaches to Security
 ///////////////////////////////////////
 
-Below is a list of what the security implementation assumes, along with
-what section of this document that addresses that part of the security
-model (if not already true in Python by default).  The term "bare" when
-in regards to an interpreter means an interpreter that has not
-performed a single import of a module.  Also, all comments refer to a
-sandboxed interpreter unless otherwise explicitly stated.
-
-This list does not address specifics such as how 'file' will be
-protected or whether memory should be protected.  This list is meant to
-make clear at a more basic level what the security model is assuming is
-true.
-
-* The Python interpreter itself is always trusted.
-    + Implemented by code that runs at the process level performing any
-      necessary security checks.
-* The Python interpreter cannot be crashed by valid Python source code
-  in a bare interpreter.
-* Python source code is always considered safe.
-* Python bytecode is always considered dangerous [`Hostile Bytecode`_].
-* C extension modules are inherently considered dangerous.
-  [`Extension Module Importation`_].
-    + Explicit trust of a C extension module is possible.
-* Built-in modules are considered dangerous.
-    + Explicit trust of a built-in module is possible.
-* Sandboxed interpreters running in the same process inherently cannot
-  communicate with each other.
-    + Communication through C extension modules is possible because of
-      the technical need to share extension module instances between
-      interpreters.
-* Sandboxed interpreters running in the same process inherently cannot
-  share objects.
-    + Sharing objects through C extension modules is possible because
-      of the technical need to share extension module instances between
-      interpreters.
-* When starting a sandboxed interpreter, it starts with a fresh
-  built-in and global namespace that is not shared with the interpreter
-  that started it.
-* Objects in the default built-in namespace should be safe to use
-  [`Reading/Writing Files`_, `Stdin, Stdout, and Stderr`_].
-    + Either hide the dangerous ones or cripple them so they can cause
-      no harm.
+There are essentially two types of security: who-I-am
+(permissions-based) security and what-I-have (authority-based)
+security.
+
+Who-I-Am Security
+========================
+
+With who-I-am security (a.k.a., permissions-based security), the
+ability to use a resource requires providing who you are, validating
+you are allowed to access the resource you are requesting, and then
+performing the requested action on the resource.
+
+The ACL security system on most UNIX filesystems is who-I-am security.
+When you want to open a file, say ``/etc/passwd``, you make the
+function call to open the file.  Within that function, it fetches
+the ACL for the file, finds out who the caller is, checks to see if
+the caller is on the ACL for opening the file, and then proceeds to
+either deny access or return an open file object.
+
+
+What-I-Have Security
+========================
+
+In contrast to who-I-am security, what-I-have security never requires
+knowing who is requesting a resource.  By never providing a function
+to access a resource or by creating a proxy that wraps the function to
+access a resource with argument checking, you can skip the need to
+know who is making a call.
+
+Using our file example, the program trying to open a file is given a
+proxy that checks whether paths passed into the function match the
+paths allowed at the creation time of the proxy before using the
+full-featured open function to open the file.
+
+This illustrates a subtle, but key difference between who-I-am and
+what-I-have security.  For who-I-am, you must know who the caller is
+and check that the arguments are valid for the person calling.  For
+what-I-have security, you only have to validate the arguments.
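+
+The file proxy described above can be sketched as follows.  This is
+only an illustration under stated assumptions: it is written for
+modern Python 3, the names ``make_open_proxy`` and ``restricted_open``
+are hypothetical, and a pure Python closure like this one can be
+introspected, so it shows the shape of the check rather than a secure
+implementation.
+

```python
import os


def make_open_proxy(allowed_paths, real_open=open):
    """Return a function that opens only the paths fixed at creation time.

    Hypothetical helper for illustration; not an API from this design.
    """
    # Resolve the allowed paths once, when the proxy is created.
    allowed = frozenset(os.path.realpath(p) for p in allowed_paths)

    def restricted_open(path, mode="r"):
        # Only the arguments are validated; the caller's identity is
        # never looked up.
        if os.path.realpath(path) not in allowed:
            raise PermissionError("path not allowed: %r" % (path,))
        return real_open(path, mode)

    return restricted_open
```

+
+Code that is handed ``restricted_open`` instead of ``open`` can reach
+only the listed paths, no matter who calls it.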
 
-There are also some features that might be desirable, but are not being
-addressed by this security model.
 
-* Communication in any direction between an unprotected interpreter and
-  a sandboxed interpreter it created.
-
-
-The Proposed Approach
+Object-Capabilities
 ///////////////////////////////////////
 
-In light of where 'rexec' succeeded and failed along with what is known
-about the two main approaches to security and how Python tends to
-operate, the following is a proposal on how to secure Python for
-sandboxing.
+What-I-have security is more often called the object-capabilities
+security model.  The belief here is in POLA (Principle Of Least
+Authority): you give a program exactly what it needs, and no more.  By
+providing a function that can open any file that relies on identity to
+decide whether to open something, you are still providing a fully capable
+function that just requires faking one's identity to circumvent
+security.  It also means that if you accidentally run code that
+performs actions that you did not expect (e.g., deleting all your
+files), there is no way to stop it since it operates with *your*
+permissions.
 
+Using POLA and object-capabilities, you only give access to resources
+to the extent that someone needs.  This means if a program only needs
+access to a single file, you only give them a function that can open
+that single file.  If you accidentally run code that tries to delete
+all of your files, it can only delete the one file you authorized the
+program to open.
+
+Object-capabilities use the reference graph of objects to provide the
+security of accessing resources.  If you do not have a reference to a
+resource (or a reference to an object that can reference a resource),
+you cannot access it, period.  You can provide conditional access by
+using a proxy between code and a resource, but that still requires a
+reference to the resource by the proxy.
+
+This leads to a much cleaner implementation of security.  By not
+having to change internal code in the interpreter to perform identity
+checks, you can instead shift the burden of security to proxies
+which are much more flexible and have less of an adverse effect on the
+interpreter directly (assuming you have the basic requirements for
+object-capabilities met).
+
+
+Difficulties in Python for Object-Capabilities
+//////////////////////////////////////////////
+
+In order to provide the proper protection of references that
+object-capabilities require, you must set up a secure perimeter
+defense around your security domain.  The domain can be anything:
+objects, interpreters, processes, etc.  The point is that the domain
+is where you draw the line for allowing arbitrary access to resources.
+This means that if the interpreter is the security domain, then
+anything within an interpreter can be expected to be freely shared,
+but beyond that, reference access is strictly controlled.
+
+Three key requirements for providing a proper perimeter defence are
+private namespaces, immutable shared state across domains, and
+unforgeable references.  Unfortunately Python only has one of the
+three requirements by default (you cannot forge a reference in Python
+code).
 
-Implementation Details
+
+Problem of No Private Namespace
 ===============================
 
-Support for sandboxed interpreters will require a compilation flag.
-This allows the more common case of people not caring about protections
-to not take a performance hit.  And even when Python is compiled for
-sandboxed interpreter restrictions, when the running interpreter *is*
-unprotected, there will be no accidental triggers of protections.  This
-means that developers should be liberal with the security protections
-without worrying about there being issues for interpreters that do not
-need/want the protection.
-
-At the Python level, the __sandboxed__ built-in will be set based on
-whether the interpreter is sandboxed or not.  This will be set for
-*all* interpreters, regardless of whether sandboxed interpreter support
-was compiled in or not.
-
-For setting what is to be protected, the PyThreadState for the
-sandboxed interpreter must be passed in.  This makes the protection
-very explicit and helps make sure you set protections for the exact
-interpreter you mean to.  All functions that set protections begin with
-the prefix ``PySandbox_Set*()``.  These functions are meant to only
-work with sandboxed interpreters that have not been used yet to execute
-any Python code.  The calls must be made by the code creating and
-handling the sandboxed interpreter *before* the sandboxed interpreter
-is used to execute any Python code.
-
-The functions for checking for permissions are actually macros that
-take in at least an error return value for the function calling the
-macro.  This allows the macro to return on behalf of the caller if the
-check fails and cause the SandboxError exception to be propagated
-automatically.  This helps eliminate any coding errors from incorrectly
-checking a return value on a rights-checking function call.  For the
-rare case where this functionality is disliked, just make the check in
-a utility function and check that function's return value (but this is
-strongly discouraged!).
-
-Functions that check that an operation is allowed implicitly operate on
-the currently running interpreter as returned by
-``PyInterpreter_Get()`` and are to be used by any code (the
-interpreter, extension modules, etc.) that needs to check for
-permission to execute.  They have the common prefix of
-``PySandbox_Allowed*()``.
-
-
-API
---------------
-
-* PyThreadState* PySandbox_NewInterpreter()
-    Return a new interpreter that is considered sandboxed.  There is no
-    corresponding ``PySandbox_EndInterpreter()`` as
-    ``Py_EndInterpreter()`` will be taught how to handle sandboxed
-    interpreters.  ``NULL`` is returned on error.
-
-* PySandbox_Allowed(error_return)
-    Macro that has the caller return with 'error_return' if the
-    interpreter is unprotected, otherwise do nothing.
-
-
-Memory
-=============================
-
-Protection
---------------
-
-A memory cap will be allowed.
-
-Modification to pymalloc will be needed to properly keep track of the
-allocation and freeing of memory.  Same goes for the macros around the
-system malloc/free system calls.  This provides a platform-independent
-system for protection of memory instead of relying on the operating
-system to provide a service for capping memory usage of a process.  It
-also allows the protection to be at the interpreter level instead of at
-the process level.
-
-Existing APIs to protect:
-- _PyObject_New()
-    protected directly
-- _PyObject_NewVar()
-    protected directly
-- _PyObject_Del()
-    remove macro that uses PyObject_Free() and protect directly
-- PyObject_New()
-    implicitly by macro using _PyObject_New()
-- PyObject_NewVar()
-    implicitly by macro using _PyObject_NewVar()
-- PyObject_Del()
-    redefine macro to use _PyObject_Del() instead of PyObject_Free()
-- PyMem_Malloc()
-    protected directly
-- PyMem_Realloc()
-    protected directly
-- PyMem_Free()
-    protected directly
-- PyMem_New()
-    implicitly protected by macro using PyMem_Malloc()
-- PyMem_Resize()
-    implicitly protected by macro using PyMem_Realloc()
-- PyMem_Del()
-    implicitly protected by macro using PyMem_Free()
-- PyMem_MALLOC()
-    redefine macro to use PyMem_Malloc()
-- PyMem_REALLOC()
-    redefine macro to use PyMem_Realloc()
-- PyMem_FREE()
-    redefine macro to use PyMem_Free()
-- PyMem_NEW()
-    implicitly protected by macro using PyMem_MALLOC()
-- PyMem_RESIZE()
-    implicitly protected by macro using PyMem_REALLOC()
-- PyMem_DEL()
-    implicitly protected by macro using PyMem_FREE()
-- PyObject_Malloc()
-    XXX
-- PyObject_Realloc()
-    XXX
-- PyObject_Free()
-    XXX
-- PyObject_MALLOC()
-    XXX
-- PyObject_REALLOC()
-    XXX
-- PyObject_FREE()
-    XXX
-
-
-Why
---------------
-
-Protecting excessive memory usage allows one to make sure that a DoS
-attack against the system's memory is prevented.
-
-
-Possible Security Flaws
------------------------
-
-If code makes direct calls to malloc/free instead of using the proper
-``PyMem_*()``
-macros then the security check will be circumvented.  But C code is
-*supposed* to use the proper macros or pymalloc and thus this issue is
-not with the security model but with code not following Python coding
-standards.
-
-
-API
---------------
-
-* int PySandbox_SetMemoryCap(PyThreadState *, integer)
-    Set the memory cap for a sandboxed interpreter.  If the
-    interpreter is not running a sandboxed interpreter, return a false
-    value.
-
-* PySandbox_AllowedMemoryAlloc(integer, error_return)
-    Macro to increase the amount of memory that is reported that the
-    running sandboxed interpreter is using.  If the increase puts the
-    total count past the set limit or leads to integer overflow in
-    the allocation count, raise a SandboxError exception
-    and cause the calling function to return with the value of
-    'error_return', otherwise do nothing.
-
-* void PySandbox_AllowedMemoryFree(integer)
-    Decrease the current running interpreter's allocated
-    memory.  If this puts the memory used to below 0, re-set it to 0.
+Typically, in languages that are statically typed (like C++), you have
+public and private attributes on objects.  Those private attributes
+provide a private namespace for the class and instances that are not
+accessible by other objects.
+
+The Python language has no such thing as a private namespace.  The
+language has the philosophy that if exposing something to the
+programmer could provide some use, then it is exposed.  This has led
+to Python having a wonderful amount of introspection abilities.
+Unfortunately this makes the possibility of a private namespace
+non-existent.  This poses an issue for providing proxies for resources
+since there is no way in Python code to hide the reference to a
+resource.
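+
+To make the lack of a private namespace concrete, here is a small
+sketch for modern Python 3 (the names ``make_reader`` and
+``FileProxy`` are hypothetical).  It shows two common attempts at
+hiding a reference in pure Python, a closure and a double-underscore
+attribute, and how ordinary introspection recovers the hidden object
+in both cases.
+

```python
def make_reader(resource):
    """Hide 'resource' inside a closure (hypothetical example)."""
    def read_length():
        return len(resource)
    return read_length


class FileProxy:
    """Hide 'resource' behind a double-underscore attribute."""
    def __init__(self, resource):
        # Name mangling renames this to _FileProxy__resource; it is
        # still an ordinary, reachable attribute.
        self.__resource = resource


secret = "the wrapped resource"

# The closure cell holding the reference is reachable from the function.
reader = make_reader(secret)
leaked_from_closure = reader.__closure__[0].cell_contents

# The mangled attribute name can simply be spelled out.
proxy = FileProxy(secret)
leaked_from_attribute = proxy._FileProxy__resource
```

+
+In both cases the "hidden" object is handed back unchanged, which is
+why a proxy written in pure Python cannot protect what it wraps.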
+
+Luckily, the Python virtual machine *does* provide a private namespace,
+albeit not for pure Python source code.  If you use the Python/C
+language barrier in extension modules, you can provide a private
+namespace by using the struct allocated for each instance of an
+object.  This provides a way to create proxies, written in C, that can
+protect resources properly.  Throughout this document, when mentioning
+proxies, it is assumed they have been implemented in C.
 
 
-Reading/Writing Files
-=============================
+Problem of Mutable Shared State
+===============================
 
-Protection
---------------
+Another problem that Python's introspection abilities cause is that of
+mutable shared state.  At the interpreter level, there has never been
+a concerted effort to isolate state shared between all interpreters
+running in the same Python process.  Sometimes this is for performance
+reasons, sometimes because it is just easier to implement this way.
+Regardless, sharing of state that can be influenced by another
+interpreter is not safe for object-capabilities.
+
+To rectify the situation, some changes will be needed to some built-in
+objects in Python.  It should mostly consist of abstracting or
+refactoring certain abilities out to an extension module so that
+access can be protected using import guards.
 
-XXX
 
-To open a file, one will have to use open().  This will make open() a
-factory function that controls reference access to the 'file' type in
-terms of creating new instances.  When an attempted file opening fails
-(either because the path does not exist or because of security reasons),
-SandboxError will be raised.  The same exception must be raised to
-prevent filesystem information being gleaned from the type of exception
-returned (i.e., returning IOError if a path does not exist tells the
-user something about that file path).
-
-What open() returns may not be an instance of 'file' but a proxy that
-provides the security measures needed.  While this might break code
-that uses type checking to make sure a 'file' object is used, taking a
-duck typing approach would be better.  This is not only more Pythonic
-but would also allow the code to use a StringIO instance.
-
-It has been suggested to allow for a passed-in callback to be called
-when a specific path is to be opened.  While this provides good
-flexibility in terms of allowing custom proxies with more fine-grained
-security (e.g., capping the amount of disk write), this has been deemed
-unneeded in the initial security model and thus is not being considered
-at this time.  
-
-Why
---------------
-
-Allowing anyone to be able to arbitrarily read, write, or learn about
-the layout of your filesystem is extremely dangerous.  It can lead to
-loss of data or data being exposed to people whom should not have
-access.
+Threat Model
+///////////////////////////////////////
 
+The threat that this security model is attempting to handle is the
+execution of arbitrary Python code in a sandboxed interpreter such
+that the code in that interpreter is not able to harm anything outside
+of itself.  This means that:
+
+* An interpreter cannot influence another interpreter directly at the
+  Python level without explicitly allowing it.
+    + This includes preventing communicating with another interpreter.
+    + Mutable objects cannot be shared between interpreters without
+      explicit allowance for it.
+    + "Explicit allowance" includes the importation of C extension
+      modules because a technical detail requires that these modules
+      not be re-initialized per interpreter, meaning that all
+      interpreters in a single Python process share the same C
+      extension modules.
+* An interpreter cannot use operating system resources without being
+  explicitly given those resources.
+    + This includes importing modules since that requires the ability
+      to use the resource of the filesystem.
+
+In order to accomplish these goals, certain things must be made true.
+
+* The Python process is the "powerbox".
+    + It controls the initial granting of abilities to interpreters.
+* A bare Python interpreter is always trusted.
+    + Python source code that can be created in a bare interpreter is
+      always trusted.
+    + Python source code created within a bare interpreter cannot
+      crash the interpreter.
+* Python bytecode is always distrusted.
+    + Malicious bytecode can bring down an interpreter.
+* Pure Python source code is always safe on its own.
+    + Malicious abilities are derived from C extension modules,
+      built-in modules, and unsafe types implemented in C, not from
+      pure Python source.
+* A sub-interpreter started by another interpreter does not inherit
+  any state.
+    + The sub-interpreter starts out with a fresh global namespace and
+      whatever built-ins it was initially given.
 
-Possible Security Flaws
------------------------
-
-XXX
 
+Implementation
+///////////////////////////////////////
 
-API
---------------
+Guiding Principles
+========================
 
-* int PySandbox_SetAllowedFile(PyThreadState *, string path,
-                                string mode)
-    Add a file that is allowed to be opened in 'mode' by the 'file'
-    object.  If the interpreter is not sandboxed then return a false
-    value.
-
-* PySandbox_AllowedPath(string path, string mode, error_return)
-    Macro that causes the caller to return with 'error_return' and
-    raise SandboxError as the exception if the specified path with
-    'mode' is not allowed, otherwise do nothing.
-
-
-Extension Module Importation
-============================
-
-Protection
---------------
-
-A whitelist of extension modules that may be imported must be provided.
-A default set is given for stdlib modules known to be safe.
-
-A check in the import machinery will check that a specified module name
-is allowed based on the type of module (Python source, Python bytecode,
-or extension module).  Python bytecode files are never directly
-imported because of the possibility of hostile bytecode being present.
-Python source is always considered safe based on the assumption that
-all resource harm is eventually done at the C level, thus Python source
-code directly cannot cause harm without help of C extension modules.
-Thus only C extension modules need to be checked against the whitelist.
-
-The requested extension module name is checked in order to make sure
-that it is on the whitelist if it is a C extension module.  If the name
-is not correct a SandboxError exception is raised.  Otherwise the
-import is allowed.  
-
-Even if a Python source code module imports a C extension module in an
-unprotected interpreter it is not a problem since the Python source
-code module is reloaded in the sandboxed interpreter.  When that Python
-source module is freshly imported the normal import check will be
-triggered to prevent the C extension module from becoming available to
-the sandboxed interpreter.
-
-For the 'os' module, a special sandboxed version will be used if the
-proper C extension module providing the correct abilities is not
-allowed.  This will default to '/' as the path separator and provide as
-much reasonable abilities as possible from a pure Python module.
+To begin, the Python process garners all power as the powerbox.  It is
+up to the process to initially hand out access to resources and
+abilities to interpreters.  This might take the form of an interpreter
+with all abilities granted (i.e., a standard interpreter as launched
+when you execute Python), which then creates sub-interpreters with
+sandboxed abilities.  Another alternative is only creating
+interpreters with sandboxed abilities (i.e., Python being embedded in
+an application that only uses sandboxed interpreters).
+
+All security measures should never have to ask who an interpreter is.
+This means that what abilities an interpreter has should not be stored
+at the interpreter level when the security can use a proxy to protect
+a resource.  This means that while supporting a memory cap can
+have a per-interpreter setting that is checked (because access to the
+operating system's memory allocator is not supported at the program
+level), protecting files and imports should not have such a per-interpreter
+protection at such a low level (because those can have extension
+module proxies to provide the security).
+
+For common case security measures, the Python standard library
+(stdlib) should provide a simple way to provide those measures.  Most
+commonly this will take the form of providing factory functions that
+create instances of proxies for providing protection of key resources.
+
+Backwards-compatibility will not be a hindrance upon the design or
+implementation of the security model.  Because the security model will
+inherently remove resources and abilities that existing code expects,
+it is not reasonable to expect existing code to work in a sandboxed
+interpreter.
+
+Keeping Python "pythonic" is required for all design decisions.  If
+removing an ability leads to something being unpythonic, it will not
+be done.  This does not mean existing pythonic code must continue to
+work, but the spirit of being pythonic will not be compromised in the
+name of the security model.  While this might lead to a weaker
+security model, this is a price that must be paid in order for Python
+to continue to be the language that it is.
+
+Restricting what is in the built-in namespace and safe-guarding
+the interpreter (which includes safe-guarding the built-in types) is
+where security will come from.  Imports and the ``file`` type are
+both part of the standard namespace and must be restricted in order
+for any security implementation to be effective.
+The built-in types which are needed for basic Python usage (e.g.,
+``object``, code objects, etc.) must be made safe to use in a sandboxed
+interpreter since they are easily accessible and yet required for
+Python to function.
+
+
+Abilities of a Standard Sandboxed Interpreter
+=============================================
+
+In the end, a standard sandboxed interpreter should (not)
+allow certain things to be doable by code running within itself.
+Below is a list of abilities that will (not) be allowed in the default
+instance of a sandboxed interpreter comparative to an unprotected
+interpreter that has not imported any modules.  These protections can
+be tweaked by using proxies to allow for certain extended abilities to
+be accessible.
+
+* You cannot open any files directly.
+* Importation
+    + You can import any pure Python module.
+    + You cannot import any Python bytecode module.
+    + You cannot import any C extension module.
+    + You cannot import any built-in module.
+* You cannot find out any information about the operating system you
+  are running on.
+* Only safe built-ins are provided.
 
-The 'sys' module is specially addressed in
-`Changing the Behaviour of the Interpreter`_.
 
-By default, the whitelisted modules are:
+Implementation Details
+========================
 
+An important point to keep in mind when reading about the
+implementation details for the security model is that these are
+general changes and are not special to any type of interpreter,
+sandboxed or otherwise.  That means if a change to a built-in type is
+suggested and it does not involve a proxy, that change is meant
+Python-wide for *all* interpreters.
+
+
+Imports
+-------
+
+A proxy for protecting imports will be provided.  This is done by
+setting the ``__import__()`` function in the built-in namespace of the
+sandboxed interpreter to a proxied version of the function.
+
+The planned proxy will take in a passed-in function to use for the
+import and a whitelist of C extension modules and built-in modules to
+allow importation of.  If an import would lead to loading an extension
+or built-in module, it is checked against the whitelist and allowed
+to be imported based on that list.  No .pyc or .pyo files will be
+imported.  All .py files will be imported.
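+
+The checks the import proxy would perform can be sketched in pure
+Python.  This is an approximation under stated assumptions, not the
+planned C proxy: it targets modern Python 3 and uses ``importlib`` to
+classify modules, the names ``classify`` and ``make_import_proxy`` are
+hypothetical, relative imports are refused for simplicity, and
+``importlib.util.find_spec()`` may import parent packages as a side
+effect.
+

```python
import importlib.machinery
import importlib.util


def classify(name):
    """Classify a module by how it would be loaded (hypothetical helper)."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return "missing"
    if spec is None:
        return "missing"
    if spec.origin == "built-in":
        return "builtin"
    if isinstance(spec.loader, importlib.machinery.ExtensionFileLoader):
        return "extension"
    if isinstance(spec.loader, importlib.machinery.SourcelessFileLoader):
        return "bytecode"
    # Everything else is treated as source for this sketch.
    return "source"


def make_import_proxy(real_import, whitelist):
    """Wrap 'real_import' so C-level modules must be on 'whitelist'."""
    allowed = frozenset(whitelist)

    def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
        if level != 0:
            # Assumption of this sketch: only absolute imports are handled.
            raise ImportError("relative imports not supported: " + name)
        kind = classify(name)
        if kind == "bytecode":
            raise ImportError("bytecode-only module refused: " + name)
        if kind in ("builtin", "extension") and name not in allowed:
            raise ImportError("module not on whitelist: " + name)
        # Source modules (and whitelisted C modules) go to the real import.
        return real_import(name, globals, locals, fromlist, level)

    return guarded_import
```

+
+A sandbox would install the returned function as ``__import__`` in the
+built-in namespace it hands to untrusted code; modules that cannot be
+found are passed through so the real import raises its usual error.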
+
+XXX perhaps augment 'sys' so that you list the extension of files that
+can be used for importing?  Thought this was controlled somewhere
+already but can't find it.
+
+It must be warned that importing any C extension module is dangerous.
+Not only are they able to circumvent security measures by executing C
+code, but they share state across interpreters.  Because an extension
+module's init function is only called once for the Python *process*,
+its initial state is set only once.  This means that if some mutable
+object is exposed at the module level, a sandboxed interpreter could
+mutate that object, return, and then if the creating interpreter
+accesses that mutated object it is essentially communicating and/or
+acting on behalf of the sandboxed interpreter.  This violates the
+perimeter defence.  No one should import extension modules blindly.
+
+
+Sanitizing Built-In Types
+-------------------------
+
+Python contains a wealth of built-in types.  These are used at a basic
+level so that they are easily accessible to any Python code.  They are
+also shared amongst all interpreters in a Python process.  This means
+all built-in types need to be made safe (e.g., immutable shared
+state) so that they can be used by any and all interpreters in a
+single Python process.  Several aspects of built-in types need to be
+examined.
+
+
+Constructors
+++++++++++++
+
+Almost all of Python's built-in types
+contain a constructor that allows code to create a new instance of a
+type as long as you have the type itself.  Unfortunately this does not
+work in an object-capabilities system without either providing a proxy
+to the constructor or just turning it off.
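+
+The following sketch shows why a reachable constructor is a problem.
+It assumes modern Python 3, where an open text file wraps an
+``io.FileIO`` object (with the ``file`` type discussed here, the
+equivalent would be calling ``type(f)`` on a path directly).  The
+function name ``open_via_type`` is hypothetical.
+

```python
def open_via_type(file_obj, path):
    """Open an arbitrary path using only a reference to an open file.

    Assumes 'file_obj' came from the built-in open() in text mode, so
    that file_obj.buffer.raw is an io.FileIO instance.
    """
    # Holding an instance is enough to reach its type ...
    raw_type = type(file_obj.buffer.raw)
    # ... and the type's constructor will open any path it is given.
    return raw_type(path, "r")
```

+
+Code that was handed a single open file can therefore open any other
+file without ever being given ``open()``, which is the hole that
+turning off or proxying the constructor is meant to close.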
+
+The plan is to turn off the constructors that are currently supplied
+directly by the types that are dangerous.  Their constructors will
+then either be shifted over to factory functions that will be stored
+in a C extension module or to built-ins that will be
+provided to use to create instances.  The former approach will allow
+for protections to be enforced by import proxy; just don't allow the
+extension module to be imported.  The latter approach would allow
+either a unique constructor per type, or more generic built-in(s) for
+construction (e.g., introducing a ``construct()`` function that takes
+in a type and any arguments desired to be passed in for constructing
+an instance of the type) and allowing using proxies to provide
+security.
+
+Some might consider this unpythonic.  Python very rarely separates the
+constructor of an object from the class/type and requires that you go
+through a function.  But there is some precedent for not using a
+type's constructor to get an instance of a type.  The ``file`` type,
+for instance, typically has its instances created through the
+``open()`` function.  This slight shift for certain types to have their
+(dangerous) constructor not on the type but in a function is
+considered an acceptable compromise.
+
+Types whose constructors are considered dangerous are:
+
+* ``file``
+    + Will definitely use the ``open()`` built-in.
+* code objects
+* XXX sockets?
+* XXX type?
 * XXX
 
 
-Why
---------------
-
-Because C code is considered unsafe, its use should be regulated.  By
-using a whitelist it allows one to explicitly decide that a C extension
-module is considered safe.  
-
-
-Possible Security Flaws
------------------------
-
-If a whitelisted C extension module imports a non-whitelisted C
-extension module and makes it an attribute of the whitelisted module
-there will be a breach in security.  Luckily this is a rarity in
-extension modules.  
-
-There is also the issue of a C extension module calling the C API of a
-non-whitelisted C extension module.
-
-Lastly, if a whitelisted C extension module is loaded in an unprotected
-interpreter and then loaded into a sandboxed interpreter then there are
-no checks during module initialization for possible security issues in
-the sandboxed interpreter that would have occurred had the sandboxed
-interpreter done the initial import.
-
-All of these issues can be handled by never blindly whitelisting a C
-extension module.  Added support for dealing with C extension modules
-comes in the form of `Extension Module Crippling`_.  
-
-
-API
---------------
-
-* int PySandbox_SetModule(PyThreadState *, string module_name)
-    Allow the sandboxed interpreter to import 'module_name'.  If the
-    interpreter is not sandboxed, return a false value.  Absolute
-    import paths must be specified.
-
-* int PySandbox_BlockModule(PyThreadState *, string module_name)
-    Remove the specified module from the whitelist.  Used to remove
-    modules that are allowed by default.  Return a false value if
-    called on an unprotected interpreter.
-
-* PySandbox_AllowedModule(string module_name, error_return)
-    Macro that causes the caller to return with 'error_return' and sets
-    the exception SandboxError if the specified module cannot be
-    imported, otherwise does nothing.
-
-
-Extension Module Crippling
-==========================
-
-Protection
---------------
-
-By providing a C API for checking for allowed abilities, modules that
-have some useful functionality can do proper security checks for those
-functions that could provide insecure abilities while allowing safe
-code to be used (and thus not fully deny importation).
-
-
-Why
---------------
-
-Consider a module that provides a string processing ability.  If that
-module provides a single convenience function that reads its input
-string from a file (with a specified path), the whole module should not
-be blocked from being used, just that convenience function.  By
-whitelisting the module but having a security check on the one problem
-function, the user can still gain access to the safe functions.  Even
-better, the unsafe function can be allowed if the security checks pass.
-
-
-Possible Security Flaws
------------------------
-
-If a C extension module developer incorrectly implements the security
-checks for the unsafe functions it could lead to undesired abilities.
-
-
-API
---------------
-
-Use PySandbox_Allowed() to protect unsafe code from being executed.
-
-
-Hostile Bytecode
-=============================
-
-Protection
---------------
-
-XXX
-
-
-Why
---------------
-
-Without implementing a bytecode verification tool, there is no way of
-making sure that bytecode does not jump outside its bounds, thus
-possibly executing malicious code.  It also presents the possibility of
-crashing the interpreter.
-
-
-Possible Security Flaws
------------------------
-
-None known.
-
-
-API
---------------
-
-N/A
-
-
-Changing the Behaviour of the Interpreter
-=========================================
-
-Protection
---------------
-
-Only a subset of the 'sys' module will be made available to sandboxed
-interpreters.  Things to allow from the sys module:
-
-* byteorder (?)
-* copyright 
-* displayhook
-* excepthook
-* __displayhook__
-* __excepthook__
-* exc_info
-* exc_clear
-* exit
-* getdefaultencoding
-* _getframe (?)
-* hexversion
-* last_type
-* last_value
-* last_traceback
-* maxint (?)
-* maxunicode (?)
-* modules
-* stdin  # See `Stdin, Stdout, and Stderr`_.
-* stdout
-* stderr
-* version
-
-
-Why
---------------
-
-Filesystem information must be removed.  Any settings that could
-possibly lead to a DoS attack (e.g., sys.setrecursionlimit()) or risk
-crashing the interpreter must also be removed.
-
-
-Possible Security Flaws
------------------------
-
-Exposing something that could lead to future security problems (e.g., a
-way to crash the interpreter).
-
-
-API
---------------
-
-None.
-
-
-Socket Usage
-=============================
-
-Protection
---------------
-
-Allow sending and receiving data to/from specific IP addresses on
-specific ports.
-
-open() is to be used as a factory function to open a network
-connection.  If the connection is not possible (either because of an
-invalid address or security reasons), SandboxError is raised.
-
-A socket object might not be returned by the call.  A proxy to handle
-security might be returned instead.
-
-XXX
-
-
-Why
---------------
-
-Allowing arbitrary sending of data over sockets can lead to DoS attacks
-on the network and other machines.  Limiting accepting data prevents
-your machine from being attacked by accepting malicious network
-connections.  It also allows you to know exactly where communication is
-going to and coming from.
-
-
-Possible Security Flaws
------------------------
-
-If someone managed to tamper with the DNS server in use, they could
-influence what IP addresses are returned by a DNS lookup.
-
-
-API
---------------
-
-* int PySandbox_SetIPAddress(PyThreadState *, string IP, integer port)
-    Allow the sandboxed interpreter to send/receive to the specified
-    'IP' address on the specified 'port'.  If the interpreter is not
-    sandboxed, return a false value.
-
-* PySandbox_AllowedIPAddress(string IP, integer port, error_return)
-    Macro to verify that the specified 'IP' address on the specified
-    'port' is allowed to be communicated with.  If not, cause the
-    caller to return with 'error_return' and SandboxError exception
-    set, otherwise do nothing.
-
-* int PySandbox_SetHost(PyThreadState *, string host, integer port)
-    Allow the sandboxed interpreter to send/receive to the specified
-    'host' on the specified 'port'.  If the interpreter is not
-    sandboxed, return a false value.
-
-* PySandbox_AllowedHost(string host, integer port, error_return)
-    Check that the specified 'host' on the specified 'port' is allowed
-    to be communicated with.  If not, set a SandboxError exception and
-    cause the caller to return 'error_return', otherwise do nothing.
-
-
-Network Information
-=============================
-
-Protection
---------------
-
-Limit what information can be gleaned about the network the system is
-running on.  This does not include restricting information on IP
-addresses and hosts that have been explicitly allowed for the
-sandboxed interpreter to communicate with.
-
-XXX
-
-
-Why
---------------
-
-With enough information from the network several things could occur.
-One is that someone could possibly figure out where your machine is on
-the Internet.  Another is that enough information about the network you
-are connected to could be used against it in an attack.
-
-
-Possible Security Flaws
------------------------
-
-As long as usage is restricted to only what is needed to work with
-allowed addresses, there are no security issues to speak of.
-
-
-API
---------------
-
-* int PySandbox_SetNetworkInfo(PyThreadState *)
-    Allow the sandboxed interpreter to get network information
-    regardless of whether the IP or host address is explicitly allowed.
-    If the interpreter is not sandboxed, return a false value.
-
-* PySandbox_AllowedNetworkInfo(error_return)
-    Macro that will return 'error_return' for the caller and set a
-    SandboxError exception if the sandboxed interpreter does not allow
-    checking for arbitrary network information, otherwise do nothing.
-
-
 Filesystem Information
-=============================
-
-Protection
---------------
+++++++++++++++++++++++
 
-Do not allow information about the filesystem layout from various parts
-of Python to be exposed.  This means blocking exposure at the Python
-level to:
-
-* __file__ attribute on modules
-* __path__ attribute on packages
-* co_filename attribute on code objects
+When running code in a sandboxed interpreter, POLA suggests that you
+do not want to expose information about your environment on top of
+protecting its use.  This means that filesystem paths typically should
+not be exposed.  Unfortunately, Python exposes file paths all over the
+place:
+
+* Modules
+    + ``__file__`` attribute
+* Code objects
+    + ``co_filename`` attribute
+* Packages
+    + ``__path__`` attribute
 * XXX
 
+XXX how to expose safely?
 
-Why
---------------
-
-Exposing information about the filesystem is not allowed.  You can
-figure out what operating system one is on which can lead to
-vulnerabilities specific to that operating system being exploited.
 
+Mutable Shared State
+++++++++++++++++++++
 
-Possible Security Flaws
------------------------
-
-Not finding every single place where a file path is exposed.
-
-
-API
---------------
-
-* int PySandbox_SetFilesystemInfo(PyThreadState *)
-    Allow the sandboxed interpreter to expose filesystem information.
-    If the passed-in interpreter is not sandboxed, return a false value.
-
-* PySandbox_AllowedFilesystemInfo(error_return)
-    Macro that checks if exposing filesystem information is allowed.
-    If it is not, cause the caller to return with the value of
-    'error_return' and raise SandboxError, otherwise do nothing.
-
-
-Stdin, Stdout, and Stderr
-=============================
-
-Protection
---------------
-
-By default, sys.__stdin__, sys.__stdout__, and sys.__stderr__ will be
-set to instances of StringIO.  Explicit allowance of the process'
-stdin, stdout, and stderr is possible.
-
-This will protect the 'print' statement, and the built-ins input() and
-raw_input().
-
-
-Why
---------------
-
-Interference with stdin, stdout, or stderr should not be allowed unless
-desired.  No one wants uncontrolled output sent to their screen.
-
-
-Possible Security Flaws
------------------------
-
-Unless StringIO instances can be used maliciously, none to speak of.
-
-
-API
---------------
-
-* int PySandbox_SetTrueStdin(PyThreadState *)
-  int PySandbox_SetTrueStdout(PyThreadState *)
-  int PySandbox_SetTrueStderr(PyThreadState *)
-    Set the specific stream for the interpreter to the true version of
-    the stream and not to the default instance of StringIO.  If the
-    interpreter is not sandboxed, return a false value.
-
-
-Adding New Protections
-=============================
-
-.. note:: This feature has the lowest priority and thus will be the
-          last feature implemented (if ever).
-
-Protection
---------------
-
-Allow for extensibility in the security model by being able to add new
-types of checks.  This allows not only for Python to add new security
-protections in a backwards-compatible fashion, but to also have
-extension modules add their own as well.
-
-An extension module can introduce a group for its various values to
-check, with a type being a specific value within a group.  The "Python"
-group is specifically reserved for use by the Python core itself.
-
-
-Why
---------------
-
-We are all human.  There is the possibility that a need for a new type
-of protection for the interpreter will present itself and thus need
-support.  By providing an extensible way to add new protections it
-helps to future-proof the system.
-
-It also allows extension modules to present their own set of security
-protections.  That way one extension module can use the protection
-scheme presented by another that it is dependent upon.
-
-
-Possible Security Flaws
-------------------------
-
-Poor definitions by extension module users of how their protections
-should be used would allow for possible exploitation.
-
-
-API
---------------
-
-+ Bool
-    * int PySandbox_SetExtendedFlag(PyThreadState *, string group,
-                                    string type)
-        Set a group-type to be true.  Expected use is for when a binary
-        possibility of something is needed and that the default is to
-        not allow use of the resource (e.g., network information).
-        Returns a false value if used on an unprotected interpreter.
-
-    * PySandbox_AllowedExtendedFlag(string group, string type,
-                                    error_return)
-        Macro that if the group-type is not set to true, cause the
-        caller to return with 'error_return' with SandboxError
-        exception raised.  For unprotected interpreters the check does
-        nothing.
-
-+ Numeric Range
-    * int PySandbox_SetExtendedCap(PyThreadState *, string group,
-                                    string type, integer cap)
-        Set a group-type to a capped value, 'cap', with the initial
-        allocated value set to 0.  Expected use is when a resource has
-        a capped amount of use (e.g., memory).  Returns a false value
-        if the interpreter is not sandboxed.
-
-    * PySandbox_AllowedExtendedAlloc(integer increase, error_return)
-        Macro to raise the amount a resource is used by 'increase'.
-        If the increase pushes the resource allocation past the set
-        cap, then return 'error_return' and set SandboxError as the
-        exception, otherwise do nothing.
-
-    * PySandbox_AllowedExtendedFree(integer decrease, error_return)
-        Macro to lower the amount a resource is used by 'decrease'.  If
-        the decrease pushes the allotment to below 0 then have the
-        caller return 'error_return' and set SandboxError as the
-        exception, otherwise do nothing.
-
-
-+ Membership
-    * int PySandbox_SetExtendedMembership(PyThreadState *,
-                                            string group, string type,
-                                            string member)
-        Add a string, 'member',  to be considered a member of a
-        group-type (e.g., allowed file paths).  If the interpreter is not
-        a sandboxed interpreter, return a false value.
-
-    * PySandbox_AllowedExtendedMembership(string group, string type,
-                                            string member,
-                                            error_return)
-        Macro that checks 'member' is a member of the values set for
-        the group-type.  If it is not, then have the caller return
-        'error_return' and set an exception for SandboxError, otherwise
-        does nothing.
-
-+ Specific Value
-    * int PySandbox_SetExtendedValue(PyThreadState *, string group,
-                                        string type, string value)
-        Set a group-type to 'value'.  If the interpreter is not
-        sandboxed, return a false value.
-
-    * PySandbox_AllowedExtendedValue(string group, string type,
-                                        string value, error_return)
-        Macro to check that the group-type is set to 'value'.  If it is
-        not, then have the caller return 'error_return' and set an
-        exception for SandboxError, otherwise do nothing.
+Because built-in types are shared between interpreters, they cannot
+expose any mutable shared state.  Unfortunately, as it stands, some
+do.  Below is a list of types that share some form of dangerous state,
+how they share it, and how to fix the problem:
 
+* ``object``
+    + ``__subclasses__()`` function
+        - Remove the function; never seen used in real-world code.
+* XXX
 
-Python API
-=============================
 
-__sandboxed__
---------------
+Perimeter Defences Between a Created Interpreter and Its Creator
+----------------------------------------------------------------
 
-A built-in that flags whether the interpreter currently running is
-sandboxed or not.  Set to a 'bool' value that is read-only.  To mimic
-working of __debug__.
+The plan is to allow interpreters to instantiate sandboxed
+interpreters safely.  By using the creating interpreter's abilities to
+provide abilities to the created interpreter, you make sure there is
+no escalation in abilities.
+
+But by creating a sandboxed interpreter and passing any code into it,
+you open up possible ways of getting back to the creating
+interpreter or of escalating privileges.  Those ways are:
+
+* ``__del__`` defined in the sandboxed interpreter but the object is
+  cleaned up in the unprotected interpreter.
+* Using frames to walk the frame stack back to another interpreter.
+* XXX
 
 
-sandbox module
---------------
+Making the ``sys`` Module Safe
+------------------------------
 
 XXX
 
 
-References
-///////////////////////////////////////
-
-.. [#rexec] The 'rexec' module
-   (http://docs.python.org/lib/module-rexec.html)
-
-.. [#safe-tcl] The Safe-Tcl Security Model
-   (http://research.sun.com/technical-reports/1997/abstract-60.html)
+Safe Networking
+---------------
 
-.. [#ctypes] 'ctypes' module
-   (http://docs.python.org/dev/lib/module-ctypes.html)
-
-.. [#paradigm regained] "Paradigm Regained:
-                         Abstraction Mechanisms for Access Control"
-   (http://erights.org/talks/asian03/paradigm-revised.pdf)
-
-.. [#armin-hiding] [Python-Dev] what can we do to hide the 'file' type?
-   (http://mail.python.org/pipermail/python-dev/2006-July/067076.html)
+XXX

