[Python-Dev] new security doc using object-capabilities

Thu Sep 7 05:38:07 CEST 2006

Hi Brett,

Here are some comments on your proposal.  Sorry this took so long.
I apologize if any of these comments are out of date (but also look
forward to your answers to some of the questions, as they'll help
me understand some more of the details of your proposal).  Thanks!

> Introduction
> ///////////////////////////////////////
[...]
> Throughout this document several terms are going to be used.  A
> "sandboxed interpreter" is one where the built-in namespace is not the
> same as that of an interpreter whose built-ins were unaltered, which
> is called an "unprotected interpreter".

Is this a definition or an implementation choice?  As in, are you
defining "sandboxed" to mean "with altered built-ins" or just
"restricted in some way", and does the above mean to imply that
altering the built-ins is what triggers other kinds of restrictions
(as it did in Python's old restricted execution mode)?

> A "bare interpreter" is one where the built-in namespace has been
> stripped down to the bare minimum needed to run any form of basic Python
> program.  This means that all atomic types (i.e., syntactically
> supported types), ``object``, and the exceptions provided by the
> ``exceptions`` module are considered in the built-in namespace.  There
> have also been no imports executed in the interpreter.

Is a "bare interpreter" just one example of a sandboxed interpreter,
or are all sandboxed interpreters in your design initially bare (i.e.
"sandboxed" = "bare" + zero or more granted authorities)?

> The "security domain" is the boundary at which security is cared
> about.  For this discussion, it is the interpreter.

It might be clearer to say (if i understand correctly) "Each interpreter
is a separate security domain."

Many interpreters can run within a single operating system process,
right?  Could you say a bit about what sort of concurrency model you
have in mind?  How would this interact (if at all) with use of the
existing threading functionality?

> The "powerbox" is the thing that possesses the ultimate power in the
> system.  In our case it is the Python process.

This could also be the application process, right?

> Rationale
> ///////////////////////////////////////
[...]
> For instance, think of an application that supports a plug-in system
> with Python as the language used for writing plug-ins.  You do not
> want to have to examine every plug-in you download to make sure that
> it does not alter your filesystem if you can help it.  With a proper
> security model and implementation in place this hindrance of having
> to examine all code you execute should be alleviated.

I'm glad to have this use case set out early in the document, so the
reader can keep it in mind as an example while reading about the model.

> Approaches to Security
> ///////////////////////////////////////
>
> There are essentially two types of security: who-I-am
> (permissions-based) security and what-I-have (authority-based)
> security.

As Mark Miller mentioned in another message, your descriptions of
"who-I-am" security and "what-I-have" security make sense, but
they don't correspond to "permission" vs. "authority".  They
correspond to "identity-based" vs. "authority-based" security.

> Difficulties in Python for Object-Capabilities
> //////////////////////////////////////////////
[...]
> Three key requirements for providing a proper perimeter defence are
> private namespaces, immutable shared state across domains, and
> unforgeable references.

Nice summary.

> Problem of No Private Namespace
> ===============================
[...]
> The Python language has no such thing as a private namespace.

Don't local scopes count as private namespaces?  It seems clear
that they aren't designed with the intention of being exposed,
unlike other namespaces in Python.

> It also makes providing security at the object level using
> object-capabilities non-existent in pure Python code.

I don't think this is necessarily the case.  No Python code i've
ever seen expects to be able to invade the local scopes of other
functions, so you could use them as private namespaces.  There
are two ways i've seen to invade local scopes:

    (a) Use gc.get_referents to get back from a cell object
        to its contents.

    (b) Compare the cell object to another cell object, thereby
        causing __eq__ to be invoked to compare the contents of
        the cells.

So you could protect local scopes by prohibiting these or by
simply turning off access to func_closure.  It's clear that hardly
any code depends on these introspection features, so it would be
reasonable to turn them off in a sandboxed interpreter.  (It seems
you would have to turn off some introspection features anyway in
order to have reliable import guards.)
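
To make (a) and (b) concrete, here is a minimal sketch written for a
current CPython, where func_closure is spelled __closure__.  The names
make_checker, leak_via_gc, Spy and leak_via_eq are made up for
illustration; only gc.get_referents and the cell objects are real.

```python
import gc

def make_checker(secret):
    # 'secret' lives only in this local scope; the returned closure
    # is the sole intended way to use it.
    def check(guess):
        return guess == secret
    return check

def leak_via_gc(func):
    # (a) Ask the garbage collector what the closure cell refers to.
    cell = func.__closure__[0]      # func_closure in Python 2
    return gc.get_referents(cell)[0]

class Spy(object):
    # (b) An object whose __eq__ records whatever it is compared with.
    def __init__(self):
        self.seen = None
    def __eq__(self, other):
        self.seen = other
        return False
    __hash__ = object.__hash__

def leak_via_eq(func):
    spy = Spy()
    def holder():
        return spy                  # forces 'spy' into a closure cell
    # Comparing two cells compares their contents, so Spy.__eq__ is
    # handed the contents of the other function's cell.
    holder.__closure__[0] == func.__closure__[0]
    return spy.seen

check = make_checker("hunter2")
leaked_a = leak_via_gc(check)
leaked_b = leak_via_eq(check)
```

Both routes start from the closure attribute, which is why hiding
that one attribute closes them together.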

> Problem of Mutable Shared State
> ===============================
[...]
> Regardless, sharing of state that can be influenced by another
> interpreter is not safe for object-capabilities.

Yup.

> Threat Model
> ///////////////////////////////////////

Good to see this specified here.  I like the way you've broken this
down.

> * An interpreter cannot gain abilities the Python process possesses
>   without explicitly being given those abilities.

It would be good to enumerate which abilities you're referring to in
this item.  For example, a bare interpreter should be able to allocate
memory and call most of the built-in functions, but should not be able
to open network connections.

> * An interpreter cannot influence another interpreter directly at the
>   Python level without explicitly allowing it.

You mean, without some other entity explicitly allowing it, right?
What would that other entity be -- presumably the interpreter that
spawned both of these sub-interpreters?

> * An interpreter cannot use operating system resources without being
>   explicitly given those resources.

Okay.

> * A bare Python interpreter is always trusted.

What does "trusted" mean in the above?

> * Python bytecode is always distrusted.
> * Pure Python source code is always safe on its own.

It would be helpful to clarify "safe" here.  I assume by "safe" you
mean that the Python source code can express whatever it wants,
including potentially dangerous activities, but when run in a bare
or sandboxed interpreter it cannot have harmful effects.  But then
in what sense is the "safety" a property of the Python source code
rather than of the restrictions on the interpreter?

Would it be correct to say:
  + We want to guarantee that Python source code cannot violate
    the restrictions in a sandboxed or bare interpreter.
  + We do not prevent arbitrary Python bytecode from violating
    these restrictions, and assume that it can.

>     + Malicious abilities are derived from C extension modules,
>       built-in modules, and unsafe types implemented in C, not from
>       pure Python source.

By "malicious" do you just mean "anything that isn't accessible to
a bare interpreter"?

> * A sub-interpreter started by another interpreter does not inherit
>   any state.

Do you envision a tree of interpreters and sub-interpreters?  Can the
levels of spawning get arbitrarily deep?

If i am visualizing your model correctly, maybe it would be useful to
introduce the term "parent", where each interpreter has as its parent
either the Python process or another interpreter.  Then you could say
that each interpreter acquires authority only by explicit granting from
its parent.  Then i have another question: can an interpreter acquire
authorities only when it is started, or can it acquire them while it is
running, and how?

> Implementation
> ///////////////////////////////////////
>
> Guiding Principles
> ========================
>
> To begin, the Python process garners all power as the powerbox.  It is
> up to the process to initially hand out access to resources and
> abilities to interpreters.  This might take the form of an interpreter
> with all abilities granted (i.e., a standard interpreter as launched
> when you execute Python), which then creates sub-interpreters with
> sandboxed abilities.  Another alternative is only creating
> interpreters with sandboxed abilities (i.e., Python being embedded in
> an application that only uses sandboxed interpreters).

This sounds like part of your design to me.  It might help to have
this earlier in the document (maybe even with an example diagram of a
tree of interpreters).

> All security measures should never have to ask who an interpreter is.
> This means that what abilities an interpreter has should not be stored
> at the interpreter level when the security can use a proxy to protect
> a resource.  This means that while supporting a memory cap can
> have a per-interpreter setting that is checked (because access to the
> operating system's memory allocator is not supported at the program
> level), protecting files and imports should not have such a per-interpreter
> protection at such a low level (because those can have extension
> module proxies to provide the security).

It might be good to declare two categories of resources -- those
protected by object hiding and those protected by a per-interpreter
setting -- and make lists.

> Backwards-compatibility will not be a hindrance upon the design or
> implementation of the security model.  Because the security model will
> inherently remove resources and abilities that existing code expects,
> it is not reasonable to expect existing code to work in a sandboxed
> interpreter.

You might qualify the last statement a bit.  For example, a Python
implementation of a pure algorithm (e.g. string processing, data
compression, etc.) would still work in a sandboxed interpreter.

> Keeping Python "pythonic" is required for all design decisions.

As Lawrence Oluyede also mentioned, it would be helpful to say a
little more about what "pythonic" means.

> Restricting what is in the built-in namespace and safe-guarding
> the interpreter (which includes safe-guarding the built-in types) is
> where security will come from.

Sounds good.

> Abilities of a Standard Sandboxed Interpreter
> =============================================
>
[...]
> * You cannot open any files directly.
> * Importation
>     + You can import any pure Python module.
>     + You cannot import any Python bytecode module.
>     + You cannot import any C extension module.
>     + You cannot import any built-in module.
> * You cannot find out any information about the operating system you
>   are running on.
> * Only safe built-ins are provided.

This looks reasonable.  This is probably a good place to itemize
exactly which built-ins are considered safe.

> Imports
> -------
>
> A proxy for protecting imports will be provided.  This is done by
> setting the ``__import__()`` function in the built-in namespace of the
> sandboxed interpreter to a proxied version of the function.
>
> The planned proxy will take in a passed-in function to use for the
> import and a whitelist of C extension modules and built-in modules to
> allow importation of.

Presumably these are passed in to the proxy's constructor.
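
One possible shape for such a proxy, as a rough sketch only.
ImportGuard and its parameter names are hypothetical, and the sketch
leans on the importlib machinery of current Python 3 rather than on
anything in the proposal.  It checks only the module named in the
call, not whatever that module goes on to import itself.

```python
import builtins
import importlib.machinery
import importlib.util

class ImportGuard(object):
    """Hypothetical import proxy: an import function plus a whitelist."""

    def __init__(self, import_func, whitelist):
        self._import = import_func
        self._whitelist = frozenset(whitelist)

    def __call__(self, name, globals=None, locals=None,
                 fromlist=(), level=0):
        if level != 0:
            raise ImportError("relative imports not handled in this sketch")
        spec = importlib.util.find_spec(name)
        if spec is None:
            raise ImportError("no module named %r" % name)
        loader = spec.loader
        # Bytecode-only modules (.pyc with no .py) are always refused.
        if isinstance(loader, importlib.machinery.SourcelessFileLoader):
            raise ImportError("bytecode-only module %r refused" % name)
        # Built-in and C extension modules must be on the whitelist.
        native = (spec.origin == "built-in" or
                  isinstance(loader, importlib.machinery.ExtensionFileLoader))
        if native and name not in self._whitelist:
            raise ImportError("module %r is not on the whitelist" % name)
        return self._import(name, globals, locals, fromlist, level)

guard = ImportGuard(builtins.__import__, whitelist=["math"])
math_module = guard("math")      # native, but whitelisted
json_module = guard("json")      # pure Python source, allowed
try:
    guard("time")                # native and not whitelisted
    time_blocked = False
except ImportError:
    time_blocked = True
```

Installing it would mean binding guard to __import__ in the sandboxed
interpreter's built-in namespace, which this sketch does not do.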

> If an import would lead to loading an extension
> or built-in module, it is checked against the whitelist and allowed
> to be imported based on that list.  No .pyc or .pyo files will
> be imported.  All .py files will be imported.

I'm unclear about this.  Is the whitelist a list of module names only,
or of filenames with extensions?  Does the normal path-searching process
take place or can it be restricted in some way?  Would it simplify the
security analysis to have the whitelist be a dictionary that maps module
names to absolute pathnames?

If both the .py and .pyc are present, the normal import would find the
.pyc file; would the import proxy reject such an import or ignore it
and recompile the .py instead?

> It must be warned that importing any C extension module is dangerous.

Right.

> Implementing Import in Python
> +++++++++++++++++++++++++++++
>
> To help facilitate in the exposure of more of what importation
> requires (and thus make implementing a proxy easier), the import
> machinery should be rewritten in Python.

This seems like a good idea.  Can you identify which minimum essential
pieces of the import machinery have to be written in C?

> Sanitizing Built-In Types
> -------------------------
[...]
> Constructors
> ++++++++++++
>
> Almost all of Python's built-in types
> contain a constructor that allows code to create a new instance of a
> type as long as you have the type itself.  Unfortunately this does not
> work in an object-capabilities system without either providing a proxy
> to the constructor or just turning it off.

The existence of the constructor isn't (by itself) the problem.
The problem is that both of the following are true:

    (a) From any object you can get its type object.
    (b) Using any type object you can construct a new instance.

So, you can control this either by hiding the type object, separating
the constructor from the type, or disabling the constructor.
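
A small sketch of how (a) and (b) combine.  It is written for current
Python 3, where the unbuffered io.FileIO type plays the role that
``file`` played in 2006; open_other_file and the file names are made
up for the example.

```python
import os
import tempfile

def open_other_file(handle, path):
    # (a) Any object yields its type object ...
    file_type = type(handle)
    # (b) ... and the type object doubles as a constructor.
    return file_type(path, "r")

workdir = tempfile.mkdtemp()
granted = os.path.join(workdir, "granted.txt")
private = os.path.join(workdir, "private.txt")
for path, text in ((granted, b"ok"), (private, b"secret")):
    with open(path, "wb") as f:
        f.write(text)

# The untrusted code is handed exactly one open (unbuffered) file.
handle = open(granted, "rb", buffering=0)
stolen = open_other_file(handle, private)
leaked = stolen.read()
stolen.close()
handle.close()
```

Holding one file object was enough to open a second file that was
never granted, without ever touching open().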

> Types whose constructors are considered dangerous are:
>
> * ``file``
>     + Will definitely use the ``open()`` built-in.
> * code objects
> * XXX sockets?
> * XXX type?
> * XXX

Looks good so far.  Not sure i see what's dangerous about 'type'.

> Filesystem Information
> ++++++++++++++++++++++
>
> When running code in a sandboxed interpreter, POLA suggests that you
> do not want to expose information about your environment on top of
> protecting its use.  This means that filesystem paths typically should
> not be exposed.  Unfortunately, Python exposes file paths all over the
> place:
>
> * Modules
>     + ``__file__`` attribute
> * Code objects
>     + ``co_filename`` attribute
> * Packages
>     + ``__path__`` attribute
> * XXX
>
> XXX how to expose safely?

It seems that in most cases, a single Python object is associated with
a single pathname.  If that's true in general, one solution would be
to provide an introspection function named 'getpath' or something
similar that would get the path associated with any object.  This
function might go in a module containing all the introspection functions,
so imports of that module could be easily restricted.
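
As a rough illustration of what that might look like (getpath is
hypothetical, and the sketch assumes the underlying attributes would
themselves be hidden from sandboxed code, which it does not do):

```python
import json
import types

def getpath(obj):
    """Hypothetical helper: the path tied to obj, or None if there is none."""
    if isinstance(obj, types.ModuleType):
        return getattr(obj, "__file__", None)
    if isinstance(obj, types.CodeType):
        return obj.co_filename
    if isinstance(obj, types.FunctionType):
        return obj.__code__.co_filename
    return None

module_path = getpath(json)
code_path = getpath(compile("1 + 1", "example.py", "eval"))
no_path = getpath(42)
```

With every path lookup funnelled through one function, withholding
the module that provides it withholds all path information at once.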

> Mutable Shared State
> ++++++++++++++++++++
>
> Because built-in types are shared between interpreters, they cannot
> expose any mutable shared state.  Unfortunately, as it stands, some
> do.  Below is a list of types that share some form of dangerous state,
> how they share it, and how to fix the problem:
>
> * ``object``
>     + ``__subclasses__()`` function
>         - Remove the function; never seen used in real-world code.
> * XXX

Okay, more to work out here. :)
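
For what it's worth, here is a short sketch of why __subclasses__()
counts as shared state: any code that can reach ``object`` can
enumerate classes it was never handed.  The names plugin_a, VaultKey
and find_class_by_name are invented for the example.

```python
def plugin_a():
    # Defines a class privately and hands it only to its caller.
    class VaultKey(object):
        combination = "12-34-56"
    return VaultKey

def find_class_by_name(name):
    # Unrelated code can still find the class through object.
    for cls in object.__subclasses__():
        if cls.__name__ == name:
            return cls
    return None

private_class = plugin_a()
found = find_class_by_name("VaultKey")
```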

> Perimeter Defences Between a Created Interpreter and Its Creator
> ----------------------------------------------------------------
>
> The plan is to allow interpreters to instantiate sandboxed
> interpreters safely.  By using the creating interpreter's abilities to
> provide abilities to the created interpreter, you make sure there is
> no escalation in abilities.

Good.

> * ``__del__`` created in sandboxed interpreter but object is cleaned
>   up in unprotected interpreter.

How do you envision the launching of a sandboxed interpreter to look?
Could you sketch out some rough code examples?  Were you thinking of
something like:

    sys.spawn(code, dict)
        code: a string containing Python source code
        dict: the global namespace in which to run the code

If you allow the parent interpreter to pass mutable objects into the
child interpreter, then the parent and child can already communicate
via the object, so '__del__' is a moot issue.  Do you want to prevent
all communication between parent and child?  It's not obvious to me
why that would be necessary.
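
To show the communication point, here is a toy stand-in for the
hypothetical sys.spawn above.  It uses exec() purely to mimic the call
shape; exec() provides no isolation at all and is not being proposed
as the mechanism.

```python
def spawn(code, namespace):
    """Toy stand-in for the hypothetical sys.spawn(code, dict)."""
    # exec() is NOT a sandbox; it only mimics the interface here.
    exec(code, namespace)

# The parent passes a mutable list into the child's global namespace.
mailbox = []
spawn("mailbox.append('hello from the child')", {"mailbox": mailbox})
```

After the call, the parent sees the child's message in mailbox, so a
channel already exists before __del__ enters the picture.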

> * Using frames to walk the frame stack back to another interpreter.

Could you just disable introspection of the frame stack?
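
A minimal sketch of the kind of access this would shut off, using
CPython's sys._getframe (the function and variable names are invented
for the example):

```python
import sys

def untrusted_plugin():
    # Frame 0 is this function; frame 1 is whoever called it.
    caller = sys._getframe(1)
    return caller.f_locals["password"]

def trusted_code():
    password = "swordfish"      # never passed to the plugin
    return untrusted_plugin()

stolen = trusted_code()
```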

> Making the ``sys`` Module Safe
> ------------------------------
[...]
> This means that the ``sys`` module needs to have its safe information
> separated out from the unsafe settings.

Yes.

> XXX separate modules, ``sys.settings`` and ``sys.info``, or strip
> ``sys`` to settings and put info somewhere else?  Or provide a method
> that will create a faked sys module that has the safe values copied
> into it?

I think the last suggestion above would lead to confusion.  The two
groups should have two distinct names and it should be clear which
attribute goes with which group.

> Protecting I/O
> ++++++++++++++
>
> The ``print`` keyword and the built-ins ``raw_input()`` and
> ``input()`` use the values stored in ``sys.stdout`` and ``sys.stdin``.
> By exposing these attributes to the creating interpreter, one can set
> them to safe objects, such as instances of ``StringIO``.

Sounds good.
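
A sketch of the quoted approach in current Python 3 spelling, where
the print statement and raw_input() have become print() and input().
The helper name run_with_fake_io is made up, and exec() again only
stands in for running code in a sub-interpreter.

```python
import io
import sys

def run_with_fake_io(code, stdin_text=""):
    # Swap in harmless in-memory streams, run the code, then restore.
    saved = sys.stdin, sys.stdout
    sys.stdin, sys.stdout = io.StringIO(stdin_text), io.StringIO()
    try:
        exec(code, {})
        return sys.stdout.getvalue()
    finally:
        sys.stdin, sys.stdout = saved

output = run_with_fake_io(
    "name = input()\nprint('hello, ' + name)",
    stdin_text="ping\n",
)
```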

> Safe Networking
> ---------------
>
> XXX proxy on socket module, modify open() to be the constructor, etc.

Lots more to think about here. :)

> Protecting Memory Usage
> -----------------------
>
> To protect memory, low-level hooks into the memory allocator for
> Python are needed.  By hooking into the C API for memory allocation and
> deallocation a very rough running count of used memory can be kept.  This
> can be used to prevent sandboxed interpreters from using so much
> memory that it impacts the overall performance of the system.

Preventing denial-of-service is in general quite difficult, but i
applaud the attempt.  I agree with your decision to separate this
work from the rest of the security model.

-- ?!ng