[Python-3000-checkins] r66971 - in python/branches/py3k/Doc: includes/dbpickle.py library/pickle.rst

alexandre.vassalotti python-3000-checkins at python.org
Sat Oct 18 22:47:58 CEST 2008


Author: alexandre.vassalotti
Date: Sat Oct 18 22:47:58 2008
New Revision: 66971

Log:
Improve pickle's documentation.
Use double-space for ending a sentence.
Add dbpickle.py example.
Improve description about persistent IDs.


Added:
   python/branches/py3k/Doc/includes/dbpickle.py
Modified:
   python/branches/py3k/Doc/library/pickle.rst

Added: python/branches/py3k/Doc/includes/dbpickle.py
==============================================================================
--- (empty file)
+++ python/branches/py3k/Doc/includes/dbpickle.py	Sat Oct 18 22:47:58 2008
@@ -0,0 +1,88 @@
+# Simple example presenting how persistent IDs can be used to pickle
+# external objects by reference.
+
+import pickle
+import sqlite3
+from collections import namedtuple
+
+# Simple class representing a record in our database.
+MemoRecord = namedtuple("MemoRecord", "key, task")
+
+class DBPickler(pickle.Pickler):
+
+    def persistent_id(self, obj):
+        # Rather than pickling MemoRecord as a regular class instance, we emit a
+        # persistent ID.
+        if isinstance(obj, MemoRecord):
+            # Here, our persistent ID is simply a tuple containing a tag and a
+            # key which refers to a specific record in the database.
+            return ("MemoRecord", obj.key)
+        else:
+            # If obj does not have a persistent ID, return None. This means obj
+            # needs to be pickled as usual.
+            return None
+
+
+class DBUnpickler(pickle.Unpickler):
+
+    def __init__(self, file, connection):
+        super().__init__(file)
+        self.connection = connection
+
+    def persistent_load(self, pid):
+        # This method is invoked whenever a persistent ID is encountered.
+        # Here, pid is the tuple returned by DBPickler.
+        cursor = self.connection.cursor()
+        type_tag, key_id = pid
+        if type_tag == "MemoRecord":
+            # Fetch the referenced record from the database and return it.
+            cursor.execute("SELECT * FROM memos WHERE key=?", (str(key_id),))
+            key, task = cursor.fetchone()
+            return MemoRecord(key, task)
+        else:
+            # Always raise an error if you cannot return the correct object.
+            # Otherwise, the unpickler will think None is the object referenced
+            # by the persistent ID.
+            raise pickle.UnpicklingError("unsupported persistent object")
+
+
+def main(verbose=True):
+    import io, pprint
+
+    # Initialize and populate our database.
+    conn = sqlite3.connect(":memory:")
+    cursor = conn.cursor()
+    cursor.execute("CREATE TABLE memos(key INTEGER PRIMARY KEY, task TEXT)")
+    tasks = (
+        'give food to fish',
+        'prepare group meeting',
+        'fight with a zebra',
+        )
+    for task in tasks:
+        cursor.execute("INSERT INTO memos VALUES(NULL, ?)", (task,))
+
+    # Fetch the records to be pickled.
+    cursor.execute("SELECT * FROM memos")
+    memos = [MemoRecord(key, task) for key, task in cursor]
+    # Save the records using our custom DBPickler.
+    file = io.BytesIO()
+    DBPickler(file).dump(memos)
+
+    if verbose:
+        print("Records to be pickled:")
+        pprint.pprint(memos)
+
+    # Update a record, just for good measure.
+    cursor.execute("UPDATE memos SET task='learn italian' WHERE key=1")
+
+    # Load the records from the pickle data stream.
+    file.seek(0)
+    memos = DBUnpickler(file, conn).load()
+
+    if verbose:
+        print("Unpickled records:")
+        pprint.pprint(memos)
+
+
+if __name__ == '__main__':
+    main()

Modified: python/branches/py3k/Doc/library/pickle.rst
==============================================================================
--- python/branches/py3k/Doc/library/pickle.rst	(original)
+++ python/branches/py3k/Doc/library/pickle.rst	Sat Oct 18 22:47:58 2008
@@ -27,7 +27,7 @@
 ------------------------------------
 
 The :mod:`pickle` module has an transparent optimizer (:mod:`_pickle`) written
-in C. It is used whenever available. Otherwise the pure Python implementation is
+in C.  It is used whenever available.  Otherwise the pure Python implementation is
 used.
 
 Python has a more primitive serialization module called :mod:`marshal`, but in
@@ -108,7 +108,7 @@
   efficient pickling of :term:`new-style class`\es.
 
 * Protocol version 3 was added in Python 3.0.  It has explicit support for
-  bytes and cannot be unpickled by Python 2.x pickle modules. This is
+  bytes and cannot be unpickled by Python 2.x pickle modules.  This is
   the current recommended protocol, use it whenever it is possible.
 
 Refer to :pep:`307` for more information.
@@ -166,7 +166,7 @@
    Python needed to read the pickle produced.
 
    The *file* argument must have a write() method that accepts a single bytes
-   argument. It can thus be a file object opened for binary writing, a
+   argument.  It can thus be a file object opened for binary writing, a
    io.BytesIO instance, or any other custom object that meets this interface.
 
 .. function:: dumps(obj[, protocol])
@@ -220,7 +220,7 @@
 
 .. exception:: PickleError
 
-   Common base class for the other pickling exceptions. It inherits
+   Common base class for the other pickling exceptions.  It inherits
    :exc:`Exception`.
 
 .. exception:: PicklingError
@@ -228,10 +228,13 @@
    Error raised when an unpicklable object is encountered by :class:`Pickler`.
    It inherits :exc:`PickleError`.
 
+   Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
+   pickled.
+
 .. exception:: UnpicklingError
 
    Error raised when there a problem unpickling an object, such as a data
-   corruption or a security violation. It inherits :exc:`PickleError`.
+   corruption or a security violation.  It inherits :exc:`PickleError`.
 
    Note that other exceptions may also be raised during unpickling, including
    (but not necessarily limited to) AttributeError, EOFError, ImportError, and
@@ -254,7 +257,7 @@
    Python needed to read the pickle produced.
 
    The *file* argument must have a write() method that accepts a single bytes
-   argument. It can thus be a file object opened for binary writing, a
+   argument.  It can thus be a file object opened for binary writing, a
    io.BytesIO instance, or any other custom object that meets this interface.
 
    .. method:: dump(obj)
@@ -276,8 +279,8 @@
 
    .. method:: clear_memo()
 
-      Deprecated.  Use the :meth:`clear` method on the :attr:`memo`.  Clear the
-      pickler's memo, useful when reusing picklers.
+      Deprecated.  Use the :meth:`clear` method on :attr:`memo`, instead.
+      Clear the pickler's memo, useful when reusing picklers.
 
    .. attribute:: fast
 
@@ -329,24 +332,28 @@
 
       Read a pickled object representation from the open file object given in
       the constructor, and return the reconstituted object hierarchy specified
-      therein. Bytes past the pickled object's representation are ignored.
+      therein.  Bytes past the pickled object's representation are ignored.
 
    .. method:: persistent_load(pid)
 
       Raise an :exc:`UnpickingError` by default.
 
       If defined, :meth:`persistent_load` should return the object specified by
-      the persistent ID *pid*. On errors, such as if an invalid persistent ID is
-      encountered, an :exc:`UnpickingError` should be raised.
+      the persistent ID *pid*.  If an invalid persistent ID is encountered, an
+      :exc:`UnpicklingError` should be raised.
 
       See :ref:`pickle-persistent` for details and examples of uses.
 
    .. method:: find_class(module, name)
 
-      Import *module* if necessary and return the object called *name* from it.
-      Subclasses may override this to gain control over what type of objects can
-      be loaded, potentially reducing security risks.
+      Import *module* if necessary and return the object called *name* from it,
+      where the *module* and *name* arguments are :class:`str` objects.
+
+      Subclasses may override this to gain control over what types of objects can
+      be loaded and how, potentially reducing security risks.
+
 
+.. _pickle-picklable:
 
 What can be pickled and unpickled?
 ----------------------------------
@@ -372,9 +379,9 @@
 
 Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
 exception; when this happens, an unspecified number of bytes may have already
-been written to the underlying file. Trying to pickle a highly recursive data
+been written to the underlying file.  Trying to pickle a highly recursive data
 structure may exceed the maximum recursion depth, a :exc:`RuntimeError` will be
-raised in this case. You can carefully raise this limit with
+raised in this case.  You can carefully raise this limit with
 :func:`sys.setrecursionlimit`.
 
 Note that functions (built-in and user-defined) are pickled by "fully qualified"
@@ -390,7 +397,7 @@
 restored in the unpickling environment::
 
    class Foo:
-       attr = 'a class attr'
+       attr = 'A class attribute'
 
    picklestring = pickle.dumps(Foo)
 
@@ -571,79 +578,30 @@
 
 For the benefit of object persistence, the :mod:`pickle` module supports the
 notion of a reference to an object outside the pickled data stream.  Such
-objects are referenced by a "persistent id", which is just an arbitrary string
-of printable ASCII characters. The resolution of such names is not defined by
-the :mod:`pickle` module; it will delegate this resolution to user defined
-functions on the pickler and unpickler.
-
-To define external persistent id resolution, you need to set the
-:attr:`persistent_id` attribute of the pickler object and the
-:attr:`persistent_load` attribute of the unpickler object.
+objects are referenced by a persistent ID, which should be either a string of
+alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
+any newer protocol).
+
+The resolution of such persistent IDs is not defined by the :mod:`pickle`
+module; it will delegate this resolution to the user defined methods on the
+pickler and unpickler, :meth:`persistent_id` and :meth:`persistent_load`
+respectively.
 
 To pickle objects that have an external persistent id, the pickler must have a
-custom :func:`persistent_id` method that takes an object as an argument and
+custom :meth:`persistent_id` method that takes an object as an argument and
 returns either ``None`` or the persistent id for that object.  When ``None`` is
-returned, the pickler simply pickles the object as normal.  When a persistent id
-string is returned, the pickler will pickle that string, along with a marker so
-that the unpickler will recognize the string as a persistent id.
+returned, the pickler simply pickles the object as normal.  When a persistent ID
+is returned, the pickler will pickle that ID, along with a marker so that the
+unpickler will recognize it as a persistent ID.
 
 To unpickle external objects, the unpickler must have a custom
-:func:`persistent_load` function that takes a persistent id string and returns
-the referenced object.
-
-Here's a silly example that *might* shed more light::
-
-   import pickle
-   from io import StringIO
-
-   src = StringIO()
-   p = pickle.Pickler(src)
-
-   def persistent_id(obj):
-       if hasattr(obj, 'x'):
-           return 'the value %d' % obj.x
-       else:
-           return None
-
-   p.persistent_id = persistent_id
+:meth:`persistent_load` method that takes a persistent ID object and returns the
+referenced object.
 
-   class Integer:
-       def __init__(self, x):
-           self.x = x
-       def __str__(self):
-           return 'My name is integer %d' % self.x
+Example:
 
-   i = Integer(7)
-   print(i)
-   p.dump(i)
-
-   datastream = src.getvalue()
-   print(repr(datastream))
-   dst = StringIO(datastream)
-
-   up = pickle.Unpickler(dst)
-
-   class FancyInteger(Integer):
-       def __str__(self):
-           return 'I am the integer %d' % self.x
-
-   def persistent_load(persid):
-       if persid.startswith('the value '):
-           value = int(persid.split()[2])
-           return FancyInteger(value)
-       else:
-           raise pickle.UnpicklingError('Invalid persistent id')
-
-   up.persistent_load = persistent_load
-
-   j = up.load()
-   print(j)
-
-
-.. BAW: pickle supports something called inst_persistent_id()
-   which appears to give unknown types a second shot at producing a persistent
-   id.  Since Jim Fulton can't remember why it was added or what it's for, I'm
-   leaving it undocumented.
+.. highlightlang:: python
+.. literalinclude:: ../includes/dbpickle.py
 
 
 .. _pickle-sub:
@@ -808,5 +766,10 @@
 
 .. [#] These methods can also be used to implement copying class instances.
 
-.. [#] This protocol is also used by the shallow and deep copying operations defined in
-   the :mod:`copy` module.
+.. [#] This protocol is also used by the shallow and deep copying operations
+   defined in the :mod:`copy` module.
+
+.. [#] The limitation to alphanumeric characters is due to the fact that
+   persistent IDs, in protocol 0, are delimited by the newline character.
+   Therefore, if any kind of newline character, such as \r or \n, occurs in
+   a persistent ID, the resulting pickle will become unreadable.
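
Illustration (not part of r66971): the revised find_class() text in the
pickle.rst hunk above says subclasses may override the method to control what
can be loaded.  A minimal sketch of that idea follows.  The class name
RestrictedUnpickler, the helper restricted_loads() and the SAFE_BUILTINS
allow-list are hypothetical names chosen for this example, and the sketch
assumes the pickle was written with a protocol that records the module as
"builtins".

```python
import builtins
import io
import pickle

# Hypothetical allow-list: the names below are an assumption made for this
# illustration, not something the commit prescribes.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        # module and name arrive as str objects.  Only resolve globals
        # that are explicitly allow-listed.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse everything else rather than importing it.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but only resolves allow-listed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Calling restricted_loads(pickle.dumps(range(3))) should return the range
object, while a pickle that refers to any other global, for example
pickle.dumps(print), should raise pickle.UnpicklingError.  An allow-list of
this kind narrows what a pickle can reach; it does not by itself make
untrusted pickles safe.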

