[Python-Dev] second draft of sandboxing design doc

Sat Jul 8 00:51:11 CEST 2006

OK, lots of revisions.  The approach to handling 'file' I have left up in
the air.  Biggest change is switching to "unprotected" and "sandboxed" for
terms when referring to the interpreter.  Also added a Threat Model section
to explain assumptions about the basics of the interpreter.  Hopefully it is
also more clear about the competing approaches for dealing with 'file'.

I am planning on starting work next week on implementation, but I will start
with the least controversial and work my way up.

One thing that is open that I would like some feedback on immediately is
whether people would rather pass in PyObjects or C level types to the
API.  The former makes implementing the Python wrapper much easier, but
makes embedding more heavy-handed.  Do people have a preference, or think
that the Python API will be used more often than the C one?  I am leaning
towards making the C API simpler by using C level types (const char *,
etc.) and just dealing with the Python wrappings requiring more
rejiggering between types.

Once again, I have a branch going (bcannon-sandboxing) where the work is
going to be done and where this doc lives.  I don't plan on doing another
post of this doc until another major revision.

---------------------------------------------------------------------------------------------------------------

      Restricted Execution for Python
#######################################

About This Document
=============================

This document is meant to lay out the general design for re-introducing
a sandboxing model for Python.  This document should provide one with
enough information to understand the goals for sandboxing, what
considerations were made for the design, and the actual design itself.
Design decisions should be clear and explain not only why they were
chosen but also the possible drawbacks of taking a specific approach.

If any of the above is found not to be true, please email me at
brett at python.org and let me know what problems you are having with the
document.

XXX TO DO
=============================

* threading needs protection?
* python-dev convince me that hiding 'file' possible?
    + based on that, handle code objects
    + also decide how to handle sockets
    + perhaps go with crippling but try best effort on hiding reference and
      if best effort holds up eventually shift over to capabilities system
* resolve to IP at call time to prevent DNS man-in-the-middle attacks when
  allowing a specific host name?
* what network info functions are allowed by default?
* does the object.__subclasses__() trick work across interpreters, or is it
  unique per interpreter?
* figure out default whitelist of extension modules
* check default accessible objects for file path exposure
* helper functions to get at StringIO instances for stdin, stdout, and
  friends?
* decide on what type of objects (e.g., PyStringObject or const char *) are
  to be passed in
* all built-ins properly protected?
* exactly how to tell whether argument to open() is a path, IP, or host name
  (third argument, 'n' prefix for networking, format of path, ...)
* API at the Python level
* for extension module protection, allow for wildcard allowance
  (e.g., ``xml.*``)

Goal
=============================

A good sandboxing model provides enough protection to prevent malicious
harm from coming to the system, and no more.  Barriers should be minimized
so as to allow most code that does not do anything that would be
regarded as harmful to run unmodified.  But the protections need to be
thorough enough to prevent any unintended changes to the system or
disclosure of information about it.

An important point to take into consideration when reading this
document is to realize it is part of my (Brett Cannon's) Ph.D.
dissertation.  This means it is heavily geared toward sandboxing when
the interpreter is working with Python code embedded in a web page as
viewed in Firefox.  While great strides have been taken to keep the
design general enough so as to allow all previous uses of the 'rexec'
module [#rexec]_ to be able to use the new design, that is not the
primary goal.  This means that if a design decision pits the embedded use
case against sandboxing Python code in a pure Python application, the
former will win out over the latter.

Throughout this document, the term "resource" is used to represent
anything that deserves possible protection.  This includes things that
have a physical representation (e.g., memory) to things that are more
abstract and specific to the interpreter (e.g., sys.path).

When referring to the state of an interpreter, it is either
"unprotected" or "sandboxed".  An unprotected interpreter has no
restrictions imposed upon any resource.  A sandboxed interpreter has at
least one, possibly more, resource with restrictions placed upon it to
prevent unsafe code that is running within the interpreter from causing
harm to the system.

.. contents::

Use Cases
/////////////////////////////

All use cases are based on how many sandboxed interpreters are running
in a single process and whether an unprotected interpreter is also
running.  The use cases can be broken down into two categories: when
the interpreter is embedded and only using sandboxed interpreters, and
when pure Python code is running in an unprotected interpreter and uses
sandboxed interpreters.

When the Interpreter Is Embedded
================================

Single Sandboxed Interpreter
----------------------------

This use case is when an application embeds the interpreter and never
has more than one interpreter running which happens to be sandboxed.

Multiple Sandboxed Interpreters
-------------------------------

This use case is when multiple interpreters, all sandboxed at varying
levels, need to be running within a single application.  This is the key
use case that this proposed design targets.

Stand-Alone Python
=============================

This use case is when someone has written a Python program that wants to
execute Python code in one or more sandboxed interpreters.  This is the
use case that 'rexec' attempted to fulfill.

Issues to Consider
=============================

Common to all use cases, resources that the interpreter requires to
function at a level below user code cannot be exposed to a sandboxed
interpreter.  For instance, the interpreter might need to stat a file
to see if it is possible to import.  If the ability to stat a file is
not allowed to a sandboxed interpreter, it should not be allowed to
perform that action, regardless of whether the interpreter at a level
below user code needs that ability.

When multiple interpreters are involved (sandboxed or not), an
interpreter must be prevented from gaining access to resources available
in other interpreters without explicit permission.

Resources to Protect
/////////////////////////////

It is important to make sure that the proper resources are protected
from a sandboxed interpreter.  If they are not, there is no point to
sandboxing.

Filesystem
===================

All facets of the filesystem must be protected.  This means restricting
reading and writing to the filesystem (e.g., files, directories, etc.).
It should be allowed in controlled situations where allowing access to
the filesystem is desirable, but that should be an explicit allowance.

There must also be protection to prevent revealing any information
about the filesystem.  Disclosing information on the filesystem could
allow one to infer what OS the interpreter is running on, for instance.

Memory
===================

Memory should be protected.  It is a limited resource on the system
that can have an impact on other running programs if it is exhausted.
Being able to restrict the use of memory would help alleviate issues
from denial-of-service (DoS) attacks on the system.

Networking
===================

Networking is somewhat like the filesystem in terms of wanting similar
protections.  You do not want to let unsafe code make or accept socket
connections unhindered to do possibly nefarious things.
You also want to prevent finding out information about the network you
are connected to.

Interpreter
===================

One must make sure that the interpreter is not harmed in any way by
sandboxed code.  Such harm usually takes the form of crashing the program
that the interpreter is embedded in or the unprotected interpreter that
started the sandboxed interpreter.  Executing hostile bytecode that might
lead to undesirable effects is another possible issue.

There is also the issue of taking it over.  One should not be able to gain
escalated privileges in any way without explicit permission.

Types of Security
///////////////////////////////////////

As with most things, there are multiple approaches one can take to
tackle a problem.  Security is no exception.  In general there seem to
be two approaches to protecting resources.

Resource Hiding
=============================

By never giving code a chance to access a resource, you prevent it from
being (ab)used.  This is the idea behind resource hiding; you can't
misuse something you don't have in the first place.

The most common implementation of resource hiding is capabilities.  In
this type of system a resource's reference acts as a ticket that
represents the right to use the resource.  Once code has a reference it
is considered to have full use of the resource that reference represents
and no further security checks are directly performed (using delegates
and other structured ways one can actually have a security check for
each access of a resource, but this is not a default behaviour).

As an example, consider the 'file' type as a resource we want to
protect.  That would mean that we did not want a reference to the
'file' type to ever be accessible without explicit permission.  If one
wanted to provide read-only access to a temp file, you could have
open() perform a check on the permissions of the current interpreter,
and if it is allowed to, return a proxy object for the file that only
allows reading from it.  The 'file' instance for the proxy would need
to be properly hidden so that the reference was not reachable from
outside so that 'file' access could still be controlled.

Python, as it stands now, unfortunately does not work well for a pure
capabilities system.  Capabilities require the prohibition of certain
abilities, such as "direct access to another's private state"
[#paradigm regained]_.  This obviously is not possible in Python since,
at least at the Python level, there is no such thing as private state
that is persistent (one could argue that local variables that are not
cell variables for lexical scopes are private, but since they do not
survive after a function call they are not usable for keeping
persistent state).  One can hide a reference at the C level by storing
it in the struct for the instance of a type and not providing a
function to access that attribute.

Python's introspection abilities also do not help make implementing
capabilities that much easier.  Consider how one could access 'file'
even when it is deleted from __builtin__.  You can still get to the
reference for 'file' through the sequence returned by
``object.__subclasses__()``.

Resource Crippling
=============================

Another approach to security is to not worry about controlling access
to the reference of a resource.  One can have a resource perform a
security check every time someone tries to use a method on that
resource.  This pushes the security check to a lower level; from a
reference level to the method level.

By performing the security check every time a resource's method is
called the worry of a specific resource's reference leaking out to
insecure code is alleviated.  This does add extra overhead, though, by
having to do so many security checks.  It also does not handle the
situation where an unexpected exposure of a type occurs that has not
been properly crippled.

FreeBSD's jail system provides a protection scheme similar to this.
Various system calls allow for basic usage, but knowing or having
access to the system call is not enough to grant usage.  Every call to
a system call requires checking that the proper rights have been
granted to the user in order to allow for the system call to perform
its action.

An even better example in FreeBSD's jail system is its protection of
sockets.  One can only bind a single IP address to a jail.  Any attempt
to bind more addresses, or to use an address other than the one that was
granted, is prevented.  The check is performed at every call involving
the one granted IP address.

Using 'file' as the example again, one could cripple the type so that
instantiation is not possible for the type in Python.  One could also
provide a permission check on each call to an unsafe method and
thus allow the type to be used in normal situations (such as type
checking), but still feel safe that illegal operations are not
performed.  Regardless of which approach you take, you do not need to
worry about a reference to the type being exposed unexpectedly since
the reference is not the security check but the actual method calls.
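
A rough Python-level model of crippling is sketched here.  The names
``CrippledFile`` and ``CURRENT_RIGHTS`` are hypothetical; in the actual
design the check would live in C and consult the running interpreter.  The
point is only that holding a reference grants nothing, since every method
call is checked:

```python
class SandboxError(Exception):
    """Raised when the running interpreter lacks a needed right."""


# Hypothetical stand-in for the rights granted to the running interpreter.
CURRENT_RIGHTS = {"read"}


class CrippledFile(object):
    """File-like object that re-checks permissions on every method call."""

    def __init__(self, text=""):
        self._text = text

    def _require(self, right):
        if right not in CURRENT_RIGHTS:
            raise SandboxError("operation not permitted: %s" % right)

    def read(self):
        self._require("read")
        return self._text

    def write(self, text):
        self._require("write")
        self._text += text


f = CrippledFile("hello")
contents = f.read()          # reading is permitted
try:
    f.write(" world")        # writing is not, despite holding a reference
    write_blocked = False
except SandboxError:
    write_blocked = True
```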

Comparison of the Two Approaches
================================

From the perspective of Python, the two approaches differ on what would
be the most difficult thing to analyze from a security standpoint: all
of the ways to gain access to various types from a sandboxed
interpreter with no imports, or finding all of the types that can lead
to possibly dangerous actions and thus need to be crippled.

Some Python developers, such as Armin Rigo, feel that truly hiding
objects in Python is "quite hard" [#armin-hiding]_.  This sentiment
means that making a pure capabilities system in Python that is secure
is not possible as people would continue to find new ways to get a hold
of the reference to a protected resource.

Others feel that by not going the capabilities route we will be
constantly chasing down new types that require crippling.  The thinking
is that if we cannot control the references for 'file', how are we to
know what other types might become exposed later on and thus require
more crippling?

It essentially comes down to what is harder to do: find all the ways to
access the types in Python in a sandboxed interpreter with no imported
modules, or to go through the Python code base and find all types that
should be crippled?

The 'rexec' Module
///////////////////////////////////////

The 'rexec' module [#rexec]_ was the original attempt at providing a
sandbox environment for Python code to run in.  Its design was based
on Safe-Tcl, which was essentially a capabilities system [#safe-tcl]_.
Safe-Tcl allowed you to launch a separate interpreter where its global
functions were specified at creation time.  This prevented one from
having any abilities that were not explicitly provided.

For 'rexec', the Safe-Tcl model was tweaked to better match Python's
situation.  An RExec object represented a sandboxed environment.
Imports were checked against a whitelist of modules.  You could also
restrict the type of modules to import based on whether they were
Python source, bytecode, or C extensions.  Built-ins were allowed
except for a blacklist of built-ins to not provide.  One could restrict
whether stdin, stdout, and stderr were provided or not on a per-RExec
basis.  Several other protections were provided; see documentation for
the complete list.

The ultimate undoing of the 'rexec' module was how access to objects
that in normal Python require no imports to reach was handled.
Importing modules requires a direct action, and thus can be protected
against directly in the import machinery.  But for built-ins, they are
accessible by default and require no direct action to access in normal
Python; you just use their name since they are provided in all
namespaces.

For instance, in a sandboxed interpreter, one only had to
``del __builtins__`` to gain access to the full set of built-ins.
Another way is through using the gc module:
``gc.get_referrers(''.__class__.__bases__[0])[6]['file']``.  While both
of these could be fixed (the former was a bug in 'rexec' that was fixed
and the latter could be handled by not allowing 'gc' to be imported),
they are examples of things that do not require proactive actions on
the part of the programmer in normal Python to gain access to a
resource.  This was an unfortunate side-effect of having all of that
wonderful reflection in Python.

There is also the issue that 'rexec' was written in Python which
provides its own problems based on reflection and the ability to modify
the code at run-time without security protection.

Much has been learned since 'rexec' was written about how Python tends
to be used and where security issues tend to appear.  Essentially
Python's dynamic nature does not lend itself very well to a security
implementation that does not require a constant checking of
permissions.

Threat Model
///////////////////////////////////////

Below is a list of what the security implementation assumes, along with
the section of this document that addresses that part of the security
model (if not already true in Python by default).  The term "bare", when
used in regards to an interpreter, means an interpreter that has not
performed a single import of a module.  Also, all comments refer to a
sandboxed interpreter unless otherwise explicitly stated.

This list does not address specifics such as how 'file' will be
protected or whether memory should be protected.  This list is meant to
make clear at a more basic level what the security model is assuming is
true.

* The Python interpreter itself is always trusted.
* The Python interpreter cannot be crashed by valid Python source code
  in a bare interpreter.
* Python source code is always considered safe.
* Python bytecode is always considered dangerous [`Hostile Bytecode`_].
* C extension modules are inherently considered dangerous
  [`Extension Module Importation`_].
    + Explicit trust of a C extension module is possible.
* Sandboxed interpreters running in the same process inherently cannot
  communicate with each other.
    + Communication through C extension modules is possible because of
      the technical need to share extension module instances between
      interpreters.
* Sandboxed interpreters running in the same process inherently cannot
  share objects.
    + Sharing objects through C extension modules is possible because
      of the technical need to share extension module instances between
      interpreters.
* When starting a sandboxed interpreter, it starts with a fresh
  built-in and global namespace that is not shared with the interpreter
  that started it.
* Objects in the default built-in namespace should be safe to use
  [`Reading/Writing Files`_, `Stdin, Stdout, and Stderr`_].
    + Either hide the dangerous ones or cripple them so they can cause
      no harm.

There are also some features that might be desirable, but are not being
addressed by this security model.

* Communication in any direction between an unprotected interpreter and
  a sandboxed interpreter it created.

The Proposed Approach
///////////////////////////////////////

In light of where 'rexec' succeeded and failed along with what is known
about the two main approaches to security and how Python tends to
operate, the following is a proposal on how to secure Python for
sandboxing.

Implementation Details
===============================

Support for sandboxed interpreters will require a compilation flag.
This allows the more common case of people not caring about protections
to not take a performance hit.  And even when Python is compiled for
sandboxed interpreter restrictions, when the running interpreter *is*
unprotected, there will be no accidental triggers of protections.  This
means that developers should be liberal with the security protections
without worrying about there being issues for interpreters that do not
need/want the protection.

At the Python level, the __sandboxed__ built-in will be set based on
whether the interpreter is sandboxed or not.  This will be set for
*all* interpreters, regardless of whether sandboxed interpreter support
was compiled in or not.

For setting what is to be protected, the PyThreadState for the
sandboxed interpreter must be passed in.  This makes the protection
very explicit and helps make sure you set protections for the exact
interpreter you mean to.  All functions that set protections begin with
the prefix ``PySandbox_Set*()``.  These functions are meant to only
work with sandboxed interpreters that have not been used yet to execute
any Python code.  The calls must be made by the code creating and
handling the sandboxed interpreter *before* the sandboxed interpreter
is used to execute any Python code.

The functions for checking for permissions are actually macros that
take in at least an error return value for the function calling the
macro.  This allows the macro to return on behalf of the caller if the
check fails and cause the SandboxError exception to be propagated
automatically.  This helps eliminate any coding errors from incorrectly
checking a return value on a rights-checking function call.  For the
rare case where this functionality is disliked, just make the check in
a utility function and check that function's return value (but this is
strongly discouraged!).

Functions that check that an operation is allowed implicitly operate on
the currently running interpreter as returned by
``PyInterpreter_Get()`` and are to be used by any code (the
interpreter, extension modules, etc.) that needs to check for
permission to execute.  They have the common prefix of
``PySandbox_Allowed*()``.

API
--------------

* PyThreadState* PySandbox_NewInterpreter()
    Return a new interpreter that is considered sandboxed.  There is no
    corresponding ``PySandbox_EndInterpreter()`` as
    ``Py_EndInterpreter()`` will be taught how to handle sandboxed
    interpreters.  ``NULL`` is returned on error.

* PySandbox_Allowed(error_return)
    Macro that has the caller return with 'error_return' if the
    interpreter is sandboxed, otherwise do nothing.

Memory
=============================

Protection
--------------

A memory cap will be allowed.

Modification to pymalloc will be needed to properly keep track of the
allocation and freeing of memory.  Same goes for the macros around the
system malloc/free calls.  This provides a platform-independent
system for protection of memory instead of relying on the operating
system to provide a service for capping memory usage of a process.  It
also allows the protection to be at the interpreter level instead of at
the process level.

Why
--------------

Protecting against excessive memory usage allows one to make sure that a
DoS attack against the system's memory is prevented.

Possible Security Flaws
-----------------------

If code makes direct calls to malloc/free instead of using the proper
``PyMem_*()`` macros then the security check will be circumvented.  But C
code is *supposed* to use the proper macros or pymalloc and thus this
issue is not with the security model but with code not following Python
coding standards.

API
--------------

* int PySandbox_SetMemoryCap(PyThreadState *, integer)
    Set the memory cap for a sandboxed interpreter.  If the
    interpreter is not a sandboxed interpreter, return a false
    value.

* PySandbox_AllowedMemoryAlloc(integer, error_return)
    Macro to increase the amount of memory that is reported that the
    running sandboxed interpreter is using.  If the increase puts the
    total count past the set limit, raise a SandboxError exception
    and cause the calling function to return with the value of
    'error_return', otherwise do nothing.

* PySandbox_AllowedMemoryFree(integer, error_return)
    Macro to decrease the current running interpreter's allocated
    memory.  If this puts the memory used to below 0, raise a
    SandboxError exception and return 'error_return', otherwise do
    nothing.
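
The accounting behind these two macros can be modelled in a few lines.
This is only a sketch of the bookkeeping under the assumption of a single
byte counter per interpreter; ``MemoryAccount`` and its method names are
invented for illustration and are not part of the proposed C API:

```python
class SandboxError(Exception):
    """Raised when a sandbox restriction is violated."""


class MemoryAccount(object):
    """Per-interpreter byte counter with a hard cap."""

    def __init__(self, cap):
        self.cap = cap
        self.used = 0

    def alloc(self, size):
        # Mirrors PySandbox_AllowedMemoryAlloc(): refuse to go past the cap.
        if self.used + size > self.cap:
            raise SandboxError("memory cap exceeded")
        self.used += size

    def free(self, size):
        # Mirrors PySandbox_AllowedMemoryFree(): usage may not drop below 0.
        if self.used - size < 0:
            raise SandboxError("freed more memory than was allocated")
        self.used -= size


account = MemoryAccount(cap=100)
account.alloc(60)
try:
    account.alloc(50)        # 60 + 50 would pass the cap of 100
    over_cap_blocked = False
except SandboxError:
    over_cap_blocked = True
```

A failed allocation leaves the counter untouched, so the interpreter can
keep running within its existing budget.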

Reading/Writing Files
=============================

Protection
--------------

XXX

To open a file, one will have to use open().  This will make open() a
factory function that controls reference access to the 'file' type in
terms of creating new instances.  When an attempted file opening fails
(either because the path does not exist or because of security reasons),
SandboxError will be raised.  The same exception must be raised to
prevent filesystem information being gleaned from the type of exception
returned (i.e., returning IOError if a path does not exist tells the
user something about that file path).

What open() returns may not be an instance of 'file' but a proxy that
provides the security measures needed.  While this might break code
that uses type checking to make sure a 'file' object is used, taking a
duck typing approach would be better.  This is not only more Pythonic
but would also allow the code to use a StringIO instance.
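
To illustrate the duck typing point, here is a small sketch with a
hypothetical ``ReadOnlyProxy`` class.  Note the assumption being made: at
the Python level the wrapped object is still reachable through the
``_wrapped`` attribute, so this shows only the interface, not real hiding,
which would have to be done in C:

```python
import io


class ReadOnlyProxy(object):
    """Expose only the reading methods of an underlying file-like object."""

    def __init__(self, wrapped):
        # Still reachable from Python code; not truly hidden.
        self._wrapped = wrapped

    def read(self, size=-1):
        return self._wrapped.read(size)

    def readline(self):
        return self._wrapped.readline()


def first_line(stream):
    """Duck-typed consumer: needs only readline(), not a 'file' instance."""
    return stream.readline().rstrip("\n")


from_stringio = first_line(io.StringIO("alpha\nbeta\n"))
from_proxy = first_line(ReadOnlyProxy(io.StringIO("gamma\ndelta\n")))
```

Code written like ``first_line()`` keeps working whether it is handed a
real file, a StringIO instance, or a security proxy.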

It has been suggested to allow for a passed-in callback to be called
when a specific path is to be opened.  While this provides good
flexibility in terms of allowing custom proxies with more fine-grained
security (e.g., capping the amount of disk write), this has been deemed
unneeded in the initial security model and thus is not being considered
at this time.

Why
--------------

Allowing anyone to be able to arbitrarily read, write, or learn about
the layout of your filesystem is extremely dangerous.  It can lead to
loss of data or data being exposed to people who should not have
access.

Possible Security Flaws
-----------------------

XXX

API
--------------

* int PySandbox_SetAllowedFile(PyThreadState *, string path,
                                string mode)
    Add a file that is allowed to be opened in 'mode' by the 'file'
    object.  If the interpreter is not sandboxed then return a false
    value.

* PySandbox_AllowedPath(string path, string mode, error_return)
    Macro that causes the caller to return with 'error_return' and
    raise SandboxError as the exception if the specified path with
    'mode' is not allowed, otherwise do nothing.
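
A Python-level sketch of the intended behaviour follows.  ``FileWhitelist``
and ``sandboxed_open()`` are hypothetical names, and matching the mode
string exactly is an assumption of this sketch.  What it shows is that a
disallowed path and a missing path fail identically, so nothing about the
filesystem leaks through the exception:

```python
class SandboxError(Exception):
    """Raised for every failed open, whatever the underlying reason."""


class FileWhitelist(object):
    """Set of (path, mode) pairs the sandboxed interpreter may open."""

    def __init__(self):
        self._allowed = set()

    def set_allowed_file(self, path, mode):
        self._allowed.add((path, mode))

    def check(self, path, mode):
        if (path, mode) not in self._allowed:
            raise SandboxError("cannot open file")


def sandboxed_open(whitelist, path, mode="r"):
    whitelist.check(path, mode)
    try:
        return open(path, mode)
    except OSError:
        # Same exception and message as a permission failure.
        raise SandboxError("cannot open file") from None


whitelist = FileWhitelist()
whitelist.set_allowed_file("/nonexistent/sandbox-demo.txt", "r")
```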

Extension Module Importation
============================

Protection
--------------

A whitelist of extension modules that may be imported must be provided.
A default set is given for stdlib modules known to be safe.

A check in the import machinery will verify that a specified module name
is allowed based on the type of module (Python source, Python bytecode,
or extension module).  Python bytecode files are never directly
imported because of the possibility of hostile bytecode being present.
Python source is always considered safe based on the assumption that
all resource harm is eventually done at the C level, thus Python source
code directly cannot cause harm without help of C extension modules.
Thus only C extension modules need to be checked against the whitelist.

The requested extension module name is checked in order to make sure
that it is on the whitelist if it is a C extension module.  If the name
is not on the whitelist a SandboxError exception is raised.  Otherwise the
import is allowed.

Even if a Python source code module imports a C extension module in an
unprotected interpreter it is not a problem since the Python source
code module is reloaded in the sandboxed interpreter.  When that Python
source module is freshly imported the normal import check will be
triggered to prevent the C extension module from becoming available to
the sandboxed interpreter.

For the 'os' module, a special sandboxed version will be used if the
proper C extension module providing the correct abilities is not
allowed.  This will default to '/' as the path separator and provide as
many reasonable abilities as possible from a pure Python module.

The 'sys' module is specially addressed in
`Changing the Behaviour of the Interpreter`_.

By default, the whitelisted modules are:

* XXX

Why
--------------

Because C code is considered unsafe, its use should be regulated.  By
using a whitelist it allows one to explicitly decide that a C extension
module is considered safe.

Possible Security Flaws
-----------------------

If a whitelisted C extension module imports a non-whitelisted C
extension module and makes it an attribute of the whitelisted module
there will be a breach in security.  Luckily this is a rarity in
extension modules.

There is also the issue of a C extension module calling the C API of a
non-whitelisted C extension module.

Lastly, if a whitelisted C extension module is loaded in an unprotected
interpreter and then loaded into a sandboxed interpreter then there are
no checks during module initialization for possible security issues in
the sandboxed interpreter that would have occurred had the sandboxed
interpreter done the initial import.

All of these issues can be handled by never blindly whitelisting a C
extension module.  Added support for dealing with C extension modules
comes in the form of `Extension Module Crippling`_.

API
--------------

* int PySandbox_SetModule(PyThreadState *, string module_name)
    Allow the sandboxed interpreter to import 'module_name'.  If the
    interpreter is not sandboxed, return a false value.  Absolute
    import paths must be specified.

* int PySandbox_BlockModule(PyThreadState *, string module_name)
    Remove the specified module from the whitelist.  Used to remove
    modules that are allowed by default.  Return a false value if
    called on an unprotected interpreter.

* PySandbox_AllowedModule(string module_name, error_return)
    Macro that causes the caller to return with 'error_return' and sets
    the exception SandboxError if the specified module cannot be
    imported, otherwise does nothing.
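
The decision made by the import check can be sketched as follows.  The
class ``ModuleWhitelist``, the ``kind`` strings, and the module name
``_example_ext`` are all assumptions made for illustration; the real check
would sit inside the C import machinery:

```python
class SandboxError(Exception):
    """Raised when an import is refused."""


class ModuleWhitelist(object):
    """Whitelist of extension modules for one sandboxed interpreter."""

    def __init__(self, defaults=()):
        self._names = set(defaults)

    def set_module(self, name):
        self._names.add(name)

    def block_module(self, name):
        self._names.discard(name)

    def check(self, name, kind):
        """kind is 'source', 'bytecode', or 'extension'."""
        if kind == "source":
            return                      # source is always considered safe
        if kind == "bytecode":
            raise SandboxError("bytecode is never imported directly")
        if kind == "extension" and name not in self._names:
            raise SandboxError("extension module not whitelisted: " + name)


modules = ModuleWhitelist(defaults=["_example_ext"])   # hypothetical default
modules.check("anything", "source")        # allowed
modules.check("_example_ext", "extension") # allowed by default
modules.block_module("_example_ext")       # now refused
```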

Extension Module Crippling
==========================

Protection
--------------

By providing a C API for checking for allowed abilities, modules that
have some useful functionality can do proper security checks for those
functions that could provide insecure abilities while allowing safe
code to be used (and thus not fully deny importation).

Why
--------------

Consider a module that provides a string processing ability.  If that
module provides a single convenience function that reads its input
string from a file (with a specified path), the whole module should not
be blocked from being used, just that convenience function.  By
whitelisting the module but having a security check on the one problem
function, the user can still gain access to the safe functions.  Even
better, the unsafe function can be allowed if the security checks pass.

Possible Security Flaws
-----------------------

If a C extension module developer incorrectly implements the security
checks for the unsafe functions it could lead to undesired abilities.

API
--------------

Use PySandbox_Allowed() to protect unsafe code from being executed.

Hostile Bytecode
=============================

Protection
--------------

XXX

Why
--------------

Without implementing a bytecode verification tool, there is no way of
making sure that bytecode does not jump outside its bounds, thus
possibly executing malicious code.  It also presents the possibility of
crashing the interpreter.

Possible Security Flaws
-----------------------

None known.

API
--------------

N/A

Changing the Behaviour of the Interpreter
=========================================

Protection
--------------

Only a subset of the 'sys' module will be made available to sandboxed
interpreters.  Things to allow from the sys module:

* byteorder (?)
* copyright
* displayhook
* excepthook
* __displayhook__
* __excepthook__
* exc_info
* exc_clear
* exit
* getdefaultencoding
* _getframe (?)
* hexversion
* last_type
* last_value
* last_traceback
* maxint (?)
* maxunicode (?)
* modules
* stdin  # See `Stdin, Stdout, and Stderr`_.
* stdout
* stderr
* version
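
One way to picture the subset is a module object populated from an
allow-list.  This is only a sketch: ``make_restricted_sys()`` is a
hypothetical helper, the entries marked "(?)" and ``modules`` are left out,
and names that the running Python does not provide are skipped:

```python
import sys
import types

# Subset of the names listed above; uncertain "(?)" entries are omitted.
ALLOWED_SYS_NAMES = (
    "copyright", "displayhook", "excepthook", "__displayhook__",
    "__excepthook__", "exc_info", "exit", "getdefaultencoding",
    "hexversion", "stdin", "stdout", "stderr", "version",
)


def make_restricted_sys(names=ALLOWED_SYS_NAMES):
    """Build a stand-in 'sys' exposing only the whitelisted attributes."""
    restricted = types.ModuleType("sys")
    for name in names:
        if hasattr(sys, name):
            setattr(restricted, name, getattr(sys, name))
    return restricted


restricted_sys = make_restricted_sys()
```

Anything not on the list, such as ``sys.path`` or
``sys.setrecursionlimit()``, is simply absent from the result.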

Why
--------------

Filesystem information must be removed.  Any settings that could
possibly lead to a DoS attack (e.g., sys.setrecursionlimit()) or risk
crashing the interpreter must also be removed.

Possible Security Flaws
-----------------------

Exposing something that could lead to future security problems (e.g., a
way to crash the interpreter).

API
--------------

None.

Socket Usage
=============================

Protection
--------------

Allow sending and receiving data to/from specific IP addresses on
specific ports.

open() is to be used as a factory function to open a network
connection.  If the connection is not possible (either because of an
invalid address or security reasons), SandboxError is raised.

A socket object may not be returned by the call.  A proxy to handle
security might be returned instead.

XXX

Why
--------------

Allowing arbitrary sending of data over sockets can lead to DoS attacks
on the network and other machines.  Limiting which connections are
accepted keeps your machine from being attacked through malicious
incoming network connections.  It also allows you to know exactly where
communication is going to and coming from.

Possible Security Flaws
-----------------------

If someone managed to influence the DNS server in use, they could
change which IP addresses a DNS lookup returns and thus which machines
an allowed host name actually resolves to.

API
--------------

* int PySandbox_SetIPAddress(PyThreadState *, string IP, integer port)
    Allow the sandboxed interpreter to send/receive to the specified
    'IP' address on the specified 'port'.  If the interpreter is not
    sandboxed, return a false value.

* PySandbox_AllowedIPAddress(string IP, integer port, error_return)
    Macro to verify that the specified 'IP' address on the specified
    'port' is allowed to be communicated with.  If not, cause the
    caller to return with 'error_return' and SandboxError exception
    set, otherwise do nothing.

* int PySandbox_SetHost(PyThreadState *, string host, integer port)
    Allow the sandboxed interpreter to send/receive to the specified
    'host' on the specified 'port'.  If the interpreter is not
    sandboxed, return a false value.

* PySandbox_AllowedHost(string host, integer port, error_return)
    Check that the specified 'host' on the specified 'port' is allowed
    to be communicated with.  If not, set a SandboxError exception and
    cause the caller to return 'error_return', otherwise do nothing.
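
The intended semantics of the four calls above can be modelled in
Python as a small allowlist.  This is a sketch under assumptions, not
the C API itself: ``SocketPolicy`` and its method names are
hypothetical, matching is by exact string and port, and the C macros'
"return 'error_return' with SandboxError set" behaviour is represented
by raising a Python-level ``SandboxError``.

```python
class SandboxError(Exception):
    """Stand-in for the SandboxError exception named in this document."""


class SocketPolicy:
    """Hypothetical per-interpreter record of permitted endpoints."""

    def __init__(self, sandboxed=True):
        self.sandboxed = sandboxed
        self._addresses = set()  # allowed (IP, port) pairs
        self._hosts = set()      # allowed (host, port) pairs

    def set_ip_address(self, ip, port):
        """Model of PySandbox_SetIPAddress(); False if not sandboxed."""
        if not self.sandboxed:
            return False
        self._addresses.add((ip, port))
        return True

    def set_host(self, host, port):
        """Model of PySandbox_SetHost(); False if not sandboxed."""
        if not self.sandboxed:
            return False
        self._hosts.add((host, port))
        return True

    def check_ip_address(self, ip, port):
        """Model of PySandbox_AllowedIPAddress(): raise if not allowed."""
        if (ip, port) not in self._addresses:
            raise SandboxError("%s:%d is not allowed" % (ip, port))

    def check_host(self, host, port):
        """Model of PySandbox_AllowedHost(): raise if not allowed."""
        if (host, port) not in self._hosts:
            raise SandboxError("%s:%d is not allowed" % (host, port))
```

A socket factory would call the matching ``check_*`` method before
connecting, so a disallowed endpoint fails before any traffic is sent.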

Network Information
=============================

Protection
--------------

Limit what information can be gleaned about the network the system is
running on.  This does not include restricting information on IP
addresses and hosts that have been explicitly allowed for the
sandboxed interpreter to communicate with.

XXX

Why
--------------

With enough information from the network several things could occur.
One is that someone could possibly figure out where your machine is on
the Internet.  Another is that enough information about the network you
are connected to could be used against it in an attack.

Possible Security Flaws
-----------------------

As long as usage is restricted to only what is needed to work with
allowed addresses, there are no security issues to speak of.

API
--------------

* int PySandbox_SetNetworkInfo(PyThreadState *)
    Allow the sandboxed interpreter to get network information
    regardless of whether the IP or host address is explicitly allowed.
    If the interpreter is not sandboxed, return a false value.

* PySandbox_AllowedNetworkInfo(error_return)
    Macro that will return 'error_return' for the caller and set a
    SandboxError exception if the sandboxed interpreter does not allow
    checking for arbitrary network information, otherwise do nothing.

Filesystem Information
=============================

Protection
--------------

Do not allow information about the filesystem layout from various parts
of Python to be exposed.  This means blocking exposure at the Python
level to:

* __file__ attribute on modules
* __path__ attribute on packages
* co_filename attribute on code objects
* XXX

Why
--------------

Exposing information about the filesystem is not allowed.  File paths
can reveal what operating system is in use, which can lead to
vulnerabilities specific to that operating system being exploited.

Possible Security Flaws
-----------------------

Not finding every single place where a file path is exposed.
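
The three attributes listed under Protection could be masked along the
following lines.  This is only a sketch: ``scrub_code()``,
``scrub_module()``, and the ``"<sandboxed>"`` placeholder are
hypothetical, and it relies on ``code.replace()``, which exists in
Python 3.8 and later.  It also shows how easy it is to miss a spot:
recent Python versions carry paths in further places (for instance
``__spec__`` and ``__cached__`` on modules) that this sketch does not
touch.

```python
import types


def scrub_code(code, placeholder="<sandboxed>"):
    """Return a copy of 'code' with co_filename replaced, recursively."""
    consts = tuple(
        scrub_code(const, placeholder)
        if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=placeholder, co_consts=consts)


def scrub_module(module, placeholder="<sandboxed>"):
    """Remove path-bearing attributes from a module object in place."""
    for name in ("__file__", "__path__"):
        if hasattr(module, name):
            delattr(module, name)
    # Only module-level functions are handled; methods are not.
    for value in vars(module).values():
        if isinstance(value, types.FunctionType):
            value.__code__ = scrub_code(value.__code__, placeholder)
```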

API
--------------

* int PySandbox_SetFilesystemInfo(PyThreadState *)
    Allow the sandboxed interpreter to expose filesystem information.
    If the passed-in interpreter is not sandboxed, return a false
    value.

* PySandbox_AllowedFilesystemInfo(error_return)
    Macro that checks if exposing filesystem information is allowed.
    If it is not, cause the caller to return with the value of
    'error_return' and raise SandboxError, otherwise do nothing.

Stdin, Stdout, and Stderr
=============================

Protection
--------------

By default, sys.__stdin__, sys.__stdout__, and sys.__stderr__ will be
set to instances of StringIO.  Explicit allowance of the process'
stdin, stdout, and stderr is possible.

This will protect the 'print' statement, and the built-ins input() and
raw_input().

Why
--------------

Interference with stdin, stdout, or stderr should not be allowed unless
desired.  No one wants uncontrolled output sent to their screen.

Possible Security Flaws
-----------------------

Unless StringIO instances can be used maliciously, none to speak of.

API
--------------

* int PySandbox_SetTrueStdin(PyThreadState *)
  int PySandbox_SetTrueStdout(PyThreadState *)
  int PySandbox_SetTrueStderr(PyThreadState *)
    Set the specific stream for the interpreter to the true version of
    the stream and not to the default instance of StringIO.  If the
    interpreter is not sandboxed, return a false value.
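
The default behaviour can be approximated in pure Python by swapping
the streams for in-memory buffers.  This is a sketch, not the proposed
implementation: it replaces ``sys.stdin``, ``sys.stdout``, and
``sys.stderr`` rather than the ``sys.__std*__`` originals, it uses
``io.StringIO`` and the ``print()``/``input()`` functions of current
Python versions, and ``sandboxed_streams()`` is a hypothetical helper.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def sandboxed_streams(stdin_text=""):
    """Temporarily point sys.stdin/stdout/stderr at in-memory buffers."""
    saved = sys.stdin, sys.stdout, sys.stderr
    fake_out, fake_err = io.StringIO(), io.StringIO()
    sys.stdin, sys.stdout, sys.stderr = (
        io.StringIO(stdin_text), fake_out, fake_err)
    try:
        yield fake_out, fake_err
    finally:
        # Always restore the real streams, even if the body raised.
        sys.stdin, sys.stdout, sys.stderr = saved
```

Inside the ``with`` block, output from ``print()`` accumulates in the
buffer instead of reaching the screen, and ``input()`` reads from the
supplied text rather than the process' real stdin.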

Adding New Protections
=============================

.. note:: This feature has the lowest priority and thus will be the
          last feature implemented (if ever).

Protection
--------------

Allow for extensibility in the security model by being able to add new
types of checks.  This allows not only for Python to add new security
protections in a backwards-compatible fashion, but to also have
extension modules add their own as well.

An extension module can introduce a group for its various values to
check, with a type being a specific value within a group.  The "Python"
group is specifically reserved for use by the Python core itself.

Why
--------------

We are all human.  There is the possibility that a need for a new type
of protection for the interpreter will present itself and thus need
support.  By providing an extensible way to add new protections it
helps to future-proof the system.

It also allows extension modules to present their own set of security
protections.  That way one extension module can use the protection
scheme presented by another that it is dependent upon.

Possible Security Flaws
------------------------

Poor definitions by extension module authors of how their protections
should be used would allow for possible exploitation.

API
--------------

+ Bool
    * int PySandbox_SetExtendedFlag(PyThreadState *, string group,
                                    string type)
        Set a group-type to be true.  Expected use is when a resource
        is either allowed or not and the default is to not allow its
        use (e.g., network information).  Returns a false value if
        used on an unprotected interpreter.

    * PySandbox_AllowedExtendedFlag(string group, string type,
                                    error_return)
        Macro that, if the group-type is not set to true, causes the
        caller to return 'error_return' with a SandboxError exception
        raised.  For unprotected interpreters the check does nothing.

+ Numeric Range
    * int PySandbox_SetExtendedCap(PyThreadState *, string group,
                                    string type, integer cap)
        Set a group-type to a capped value, 'cap', with the initial
        allocated value set to 0.  Expected use is when a resource has
        a capped amount of use (e.g., memory).  Returns a false value
        if the interpreter is not sandboxed.

    * PySandbox_AllowedExtendedAlloc(integer increase, error_return)
        Macro to raise the amount a resource is used by 'increase'.
        If the increase pushes the resource allocation past the set
        cap, then return 'error_return' and set SandboxError as the
        exception, otherwise do nothing.

    * PySandbox_AllowedExtendedFree(integer decrease, error_return)
        Macro to lower the amount a resource is used by 'decrease'.  If
        the decrease pushes the allotment to below 0 then have the
        caller return 'error_return' and set SandboxError as the
        exception, otherwise do nothing.

+ Membership
    * int PySandbox_SetExtendedMembership(PyThreadState *,
                                            string group, string type,
                                            string member)
        Add a string, 'member', to be considered a member of a
        group-type (e.g., allowed file paths).  If the interpreter is
        not sandboxed, return a false value.

    * PySandbox_AllowedExtendedMembership(string group, string type,
                                            string member,
                                            error_return)
        Macro that checks that 'member' is a member of the values set
        for the group-type.  If it is not, then have the caller return
        'error_return' and set an exception for SandboxError, otherwise
        do nothing.

+ Specific Value
    * int PySandbox_SetExtendedValue(PyThreadState *, string group,
                                        string type, string value)
        Set a group-type to 'value'.  If the interpreter is not
        sandboxed, return a false value.

    * PySandbox_AllowedExtendedValue(string group, string type,
                                        string value, error_return)
        Macro to check that the group-type is set to 'value'.  If it is
        not, then have the caller return 'error_return' and set an
        exception for SandboxError, otherwise do nothing.
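
The four kinds of extended protection can be modelled together in a
few lines of Python.  This is a sketch under stated assumptions, not
the C API: ``ExtendedProtections`` and its method names are
hypothetical, a raised ``SandboxError`` stands in for the macros'
"return 'error_return'" behaviour, the alloc/free methods take the
group and type explicitly (the macro signatures above do not list
them), and a resource with no cap set is treated as denied.

```python
class SandboxError(Exception):
    """Stand-in for the SandboxError exception named in this document."""


class ExtendedProtections:
    """Hypothetical store of protections keyed by (group, type)."""

    def __init__(self):
        self._flags = set()
        self._caps = {}     # (group, type) -> [cap, amount allocated]
        self._members = {}  # (group, type) -> set of allowed strings
        self._values = {}   # (group, type) -> required string

    # Bool
    def set_flag(self, group, type_):
        self._flags.add((group, type_))

    def check_flag(self, group, type_):
        if (group, type_) not in self._flags:
            raise SandboxError("%s:%s is not enabled" % (group, type_))

    # Numeric Range
    def set_cap(self, group, type_, cap):
        self._caps[(group, type_)] = [cap, 0]

    def alloc(self, group, type_, increase):
        entry = self._caps.get((group, type_))
        # Assumption: a resource with no cap set is treated as denied.
        if entry is None or entry[1] + increase > entry[0]:
            raise SandboxError("%s:%s cap exceeded" % (group, type_))
        entry[1] += increase

    def free(self, group, type_, decrease):
        entry = self._caps.get((group, type_))
        if entry is None or entry[1] - decrease < 0:
            raise SandboxError("%s:%s freed below 0" % (group, type_))
        entry[1] -= decrease

    # Membership
    def add_member(self, group, type_, member):
        self._members.setdefault((group, type_), set()).add(member)

    def check_member(self, group, type_, member):
        if member not in self._members.get((group, type_), ()):
            raise SandboxError("%r not in %s:%s" % (member, group, type_))

    # Specific Value
    def set_value(self, group, type_, value):
        self._values[(group, type_)] = value

    def check_value(self, group, type_, value):
        if self._values.get((group, type_)) != value:
            raise SandboxError("%s:%s is not %r" % (group, type_, value))
```

Keying everything on the (group, type) pair is what lets an extension
module define its own protections without colliding with the reserved
"Python" group.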

Python API
=============================

__sandboxed__
--------------

A built-in that flags whether the interpreter currently running is
sandboxed or not.  Set to a 'bool' value that is read-only.  It is
meant to work the same way __debug__ does.

sandbox module
--------------

XXX

References
///////////////////////////////////////

.. [#rexec] The 'rexec' module
   (http://docs.python.org/lib/module-rexec.html)

.. [#safe-tcl] The Safe-Tcl Security Model
   (http://research.sun.com/technical-reports/1997/abstract-60.html)

.. [#ctypes] 'ctypes' module
   (http://docs.python.org/dev/lib/module-ctypes.html)

.. [#paradigm regained] "Paradigm Regained:
                         Abstraction Mechanisms for Access Control"
   (http://erights.org/talks/asian03/paradigm-revised.pdf)

.. [#armin-hiding] [Python-Dev] what can we do to hide the 'file' type?
   (http://mail.python.org/pipermail/python-dev/2006-July/067076.html)