[Python-3000-checkins] r67008 - in python/branches/py3k/Doc: includes/dbpickle.py library/pickle.rst

alexandre.vassalotti python-3000-checkins at python.org
Fri Oct 24 03:32:41 CEST 2008

Author: alexandre.vassalotti
Date: Fri Oct 24 03:32:40 2008
New Revision: 67008

More improvements to pickle's documentation.
Add "Restricting Globals" section.
Remove useless 'verbose' flag in the example dbpickle.py.


Modified: python/branches/py3k/Doc/includes/dbpickle.py
--- python/branches/py3k/Doc/includes/dbpickle.py	(original)
+++ python/branches/py3k/Doc/includes/dbpickle.py	Fri Oct 24 03:32:40 2008
@@ -46,7 +46,7 @@
             raise pickle.UnpicklingError("unsupported persistent object")
-def main(verbose=True):
+def main():
     import io, pprint
     # Initialize and populate our database.
@@ -68,20 +68,18 @@
     file = io.BytesIO()
-    if verbose:
-        print("Records to be pickled:")
-        pprint.pprint(memos)
+    print("Pickled records:")
+    pprint.pprint(memos)
     # Update a record, just for good measure.
     cursor.execute("UPDATE memos SET task='learn italian' WHERE key=1")
-    # Load the reports from the pickle data stream.
+    # Load the records from the pickle data stream.
     memos = DBUnpickler(file, conn).load()
-    if verbose:
-        print("Unpickled records:")
-        pprint.pprint(memos)
+    print("Unpickled records:")
+    pprint.pprint(memos)
 if __name__ == '__main__':

Modified: python/branches/py3k/Doc/library/pickle.rst
--- python/branches/py3k/Doc/library/pickle.rst	(original)
+++ python/branches/py3k/Doc/library/pickle.rst	Fri Oct 24 03:32:40 2008
@@ -111,15 +111,17 @@
   bytes and cannot be unpickled by Python 2.x pickle modules.  This is
   the current recommended protocol, use it whenever it is possible.
-Refer to :pep:`307` for more information.
+Refer to :pep:`307` for information about improvements brought by
+protocol 2.  See :mod:`pickletools`'s source code for extensive
+comments about opcodes used by pickle protocols.
 If a *protocol* is not specified, protocol 3 is used.  If *protocol* is
 specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
 protocol version available will be used.
+Module Interface
 To serialize an object hierarchy, you first create a pickler, then you call the
 pickler's :meth:`dump` method.  To de-serialize a data stream, you first create
@@ -347,10 +349,13 @@
    .. method:: find_class(module, name)
       Import *module* if necessary and return the object called *name* from it,
-      where the *module* and *name* arguments are :class:`str` objects.
+      where the *module* and *name* arguments are :class:`str` objects.  Note
+      that, despite its name, :meth:`find_class` is also used for finding
+      functions.
       Subclasses may override this to gain control over what type of objects and
-      how they can be loaded, potentially reducing security risks.
+      how they can be loaded, potentially reducing security risks. Refer to
+      :ref:`pickle-restrict` for details.
 .. _pickle-picklable:
@@ -424,7 +429,7 @@
 your objects are serialized and de-serialized.  The description in this section
 doesn't cover specific customizations that you can employ to make the unpickling
 environment slightly safer from untrusted pickle data streams; see section
-:ref:`pickle-sub` for more details.
+:ref:`pickle-restrict` for more details.
 .. _pickle-inst:
@@ -600,41 +605,85 @@
+.. XXX Workaround for a bug in sphinx/pygments.
 .. highlightlang:: python
 .. literalinclude:: ../includes/dbpickle.py
+.. highlightlang:: python3
+.. _pickle-restrict:
-.. _pickle-sub:
-Subclassing Unpicklers
+Restricting Globals
 .. index::
-   single: load_global() (pickle protocol)
-   single: find_global() (pickle protocol)
+   single: find_class() (pickle protocol)
+By default, unpickling will import any class or function that it finds in the
+pickle data.  For many applications, this behaviour is unacceptable as it
+permits the unpickler to import and invoke arbitrary code.  Just consider what
+this hand-crafted pickle data stream does when loaded::
+    >>> import pickle
+    >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
+    hello world
+    0
+In this example, the unpickler imports the :func:`os.system` function and then
+calls it with the string argument "echo hello world".  Although this example is
+inoffensive, it is not difficult to imagine one that could damage your system.
+For this reason, you may want to control what gets unpickled by customizing
+:meth:`Unpickler.find_class`.  Despite its name, :meth:`find_class` is
+called whenever a global (i.e., a class or a function) is requested.  Thus it is
+possible to either forbid globals completely or restrict them to a safe subset.
-By default, unpickling will import any class that it finds in the pickle data.
-You can control exactly what gets unpickled and what gets called by customizing
-your unpickler.
-You need to derive a subclass from :class:`Unpickler`, overriding the
-:meth:`load_global` method.  :meth:`load_global` should read two lines from the
-pickle data stream where the first line will the name of the module containing
-the class and the second line will be the name of the instance's class.  It then
-looks up the class, possibly importing the module and digging out the attribute,
-then it appends what it finds to the unpickler's stack.  Later on, this class
-will be assigned to the :attr:`__class__` attribute of an empty class, as a way
-of magically creating an instance without calling its class's
-:meth:`__init__`. Your job (should you choose to accept it), would be to have
-:meth:`load_global` push onto the unpickler's stack, a known safe version of any
-class you deem safe to unpickle.  It is up to you to produce such a class.  Or
-you could raise an error if you want to disallow all unpickling of instances.
-If this sounds like a hack, you're right.  Refer to the source code to make this
+Here is an example of an unpickler allowing only a few safe classes from the
+:mod:`builtins` module to be loaded::
-The moral of the story is that you should be really careful about the source of
-the strings your application unpickles.
+   import builtins
+   import io
+   import pickle
+   safe_builtins = {
+       'range',
+       'complex',
+       'set',
+       'frozenset',
+       'slice',
+   }
+   class RestrictedUnpickler(pickle.Unpickler):
+       def find_class(self, module, name):
+           # Only allow safe classes from builtins.
+           if module == "builtins" and name in safe_builtins:
+               return getattr(builtins, name)
+           # Forbid everything else.
+           raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
+                                        (module, name))
+   def restricted_loads(s):
+       """Helper function analogous to pickle.loads()."""
+       return RestrictedUnpickler(io.BytesIO(s)).load()
+A sample usage of our unpickler working as intended::
+    >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
+    [1, 2, range(0, 15)]
+    >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
+    Traceback (most recent call last):
+      ...
+    pickle.UnpicklingError: global 'os.system' is forbidden
+    >>> restricted_loads(b'cbuiltins\neval\n'
+    ...                  b'(S\'getattr(__import__("os"), "system")'
+    ...                  b'("echo hello world")\'\ntR.')
+    Traceback (most recent call last):
+      ...
+    pickle.UnpicklingError: global 'builtins.eval' is forbidden
+As our examples show, you have to be careful with what you allow to
+be unpickled.  Therefore, if security is a concern, you may want to consider
+alternatives such as the marshalling API in :mod:`xmlrpc.client` or
+third-party solutions.
 .. _pickle-example:
@@ -769,7 +818,7 @@
 .. [#] This protocol is also used by the shallow and deep copying operations
    defined in the :mod:`copy` module.
-.. [#] The limitation on alphanumeric characters is due to the fact the
-   persistent IDs, in protocol 0, are delimited by the newline character.
-   Therefore if any kind of newline characters, such as \r and \n, occurs in
+.. [#] The limitation on alphanumeric characters is due to the fact
+   that persistent IDs, in protocol 0, are delimited by the newline
+   character.  Therefore if any kind of newline character occurs in
    persistent IDs, the resulting pickle will become unreadable.
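
Illustration (not part of the commit): the new text says that find_class is
consulted for functions as well as classes. The sketch below is one way to
observe that. RecordingUnpickler is a hypothetical helper introduced here for
illustration only; it assumes the default pickle protocol of a Python 3
interpreter and simply logs each requested global before deferring to the
default lookup.

```python
import io
import pickle


class RecordingUnpickler(pickle.Unpickler):
    """Hypothetical helper: log every global the pickle stream requests."""

    def __init__(self, file):
        super().__init__(file)
        self.seen = []  # (module, name) pairs, in request order

    def find_class(self, module, name):
        self.seen.append((module, name))
        # Defer to the default lookup.  A real filter would consult an
        # allow-list here instead, as RestrictedUnpickler in the diff does.
        return super().find_class(module, name)


# len is a function and range is a class; both are pickled as globals.
data = pickle.dumps([len, range(3)])
unpickler = RecordingUnpickler(io.BytesIO(data))
result = unpickler.load()
print(unpickler.seen)
```

Because this helper still performs the default lookup, it offers no protection
by itself; it only shows where an allow-list check would sit.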
