[ python-Bugs-956303 ] Potential data corruption bug in save_pers in pickle module.

SourceForge.net noreply at sourceforge.net
Tue May 18 23:02:48 EDT 2004


Bugs item #956303, was opened at 2004-05-18 18:45
Message generated for change (Comment added) made by tim_one
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=956303&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 7
Submitted By: Allan Crooks (amc1)
Assigned to: Nobody/Anonymous (nobody)
Summary: Potential data corruption bug in save_pers in pickle module.

Initial Comment:
There is a bug in save_pers in both the pickle and
cPickle modules in Python.

It occurs when someone uses a Pickler instance which is
using an ASCII protocol and also has a persistent_id
method that can return a persistent reference
containing newline characters.

The current implementation of save_pers in the pickle
module is as follows:

----

    def save_pers(self, pid):
        # Save a persistent id reference
        if self.bin:
            self.save(pid)
            self.write(BINPERSID)
        else:
            self.write(PERSID + str(pid) + '\n')

----

The else clause assumes that 'pid' will never be a
string containing one or more newline characters.

If the pickler pickles a persistent ID which has a
newline in it, then an unpickler with a corresponding
persistent_load method will incorrectly unpickle the
data: it reads the ID only up to the first newline and
then interprets the character after that newline as an
opcode indicating what type of data should follow. This
usually results in an exception being raised when the
remaining data is not in the expected format.
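To make the failure concrete, here is a simplified model of the text-mode framing, written for Python 3. It is not the pickle module itself: `write_persid` and `read_persid` are hypothetical helpers that imitate the else branch of save_pers and a reader that takes everything up to the first newline as the ID.

```python
import io

PERSID = "P"  # opcode character the text protocol writes before a persistent id


def write_persid(out, pid):
    # Imitates the else branch of save_pers: opcode, str(pid), newline.
    out.write(PERSID + str(pid) + "\n")


def read_persid(inp):
    # Imitates a text-mode reader: the id is everything up to the first newline.
    opcode = inp.read(1)
    if opcode != PERSID:
        raise ValueError(f"expected {PERSID!r}, got {opcode!r}")
    return inp.readline()[:-1]


buf = io.StringIO()
write_persid(buf, "ab\ncd")  # an id with an embedded newline
buf.write(".")               # stand-in for the opcode that would follow
buf.seek(0)

print(repr(read_persid(buf)))  # 'ab' : the id has been truncated
print(repr(buf.read(1)))       # 'c'  : the tail of the id is now read as an opcode
```

In the real text protocol, `c` happens to be the GLOBAL opcode, so the unpickler would go on to read "d" as a module name, which is the kind of confusing exception the report describes.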

I have attached an example file which illustrates in
what circumstances the error occurs.

Workarounds for this bug are:
  1) Use binary mode for picklers.
  2) Modify subclass implementations of save_pers to
ensure that newlines are not returned for persistent IDs.
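A minimal sketch of a variant of workaround 2, assuming Python 3's `pickle.Pickler.persistent_id` and `pickle.Unpickler.persistent_load` hooks. The names `Ref`, `EscapingPickler`, `EscapingUnpickler` and `roundtrip` are hypothetical, and the escaping is done in persistent_id rather than in save_pers: the `unicode_escape` codec turns a newline into a backslash followed by `n`, and persistent_load reverses it. With a binary protocol (workaround 1) the escaping is not needed, but it is harmless.

```python
import io
import pickle


class Ref:
    """Hypothetical stand-in for an object stored outside the pickle."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        return isinstance(other, Ref) and other.key == self.key


class EscapingPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, Ref):
            # unicode_escape writes "\n" as a backslash followed by "n", so
            # the id emitted after the PERSID opcode contains no newline.
            return obj.key.encode("unicode_escape").decode("ascii")
        return None  # everything else is pickled normally


class EscapingUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Undo the escaping applied in EscapingPickler.persistent_id.
        return Ref(pid.encode("ascii").decode("unicode_escape"))


def roundtrip(obj, protocol):
    buf = io.BytesIO()
    EscapingPickler(buf, protocol=protocol).dump(obj)
    buf.seek(0)
    return EscapingUnpickler(buf).load()


key = "ab\ncd"
print(roundtrip([Ref(key)], protocol=0) == [Ref(key)])  # True
print(roundtrip([Ref(key)], protocol=2) == [Ref(key)])  # True
```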

Although this bug would in general occur only rarely
(it requires a persistent_id implementation that returns
a string with an embedded newline character), it may
occur more frequently if the subclass implementation of
persistent_id builds its return value using the marshal
module.

This bug was discovered when our code implemented the
persistent_id method, which was returning the
marshalled format of a tuple which contained strings.
It occurred when one or more of the strings was exactly
ten characters long. The marshalled format of a string
contains the string's length, and the byte used to
represent the number 10 is the same as the newline
character:

>>> marshal.dumps('a' * 10)
's\n\x00\x00\x00aaaaaaaaaa'
>>> chr(10)
'\n'
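For the marshal case specifically, one possible fix (a sketch, not something proposed in this report) is to hex-encode the marshalled bytes so that the ID consists only of the characters 0-9 and a-f. The helper names below are hypothetical, and the code targets Python 3, where `marshal.dumps` returns `bytes`.

```python
import marshal


def marshal_pid(value):
    # bytes.hex() emits only 0-9 and a-f, so a length byte of 10 (b"\n")
    # inside the marshalled data can no longer end the PERSID line early.
    return marshal.dumps(value).hex()


def unmarshal_pid(pid):
    # Reverse of marshal_pid: hex text back to bytes, then unmarshal.
    return marshal.loads(bytes.fromhex(pid))


value = ("a" * 10, "b")
print(b"\n" in marshal.dumps(value))               # True: the raw form is unsafe
print("\n" in marshal_pid(value))                  # False
print(unmarshal_pid(marshal_pid(value)) == value)  # True
```

Hex doubles the size of the ID, but it keeps it within the "printable ASCII" contract discussed in the reply.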

I have replicated this bug on Python 1.5.2 and Python
2.3b1, and I believe it is present on all 2.x versions
of Python.

Many thanks to SourceForge user (and fellow colleague)
SMST who diagnosed the bug and provided the test cases
attached.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2004-05-18 23:02

Message:
Logged In: YES 
user_id=31435

The only documentation is the "Pickling and unpickling 
external objects" section of the Library Reference Manual, 
which says:

"""
Such objects are referenced by a ``persistent id'', which is 
just an arbitrary string of printable ASCII characters.
"""

A newline is universally considered to be a control character, 
not a printable character (e.g., try isprint('\n') under your 
local C compiler).  So this is functioning as designed and as 
documented.  If you don't find the docs clear, we should call 
this a documentation bug.  If you think the semantics should 
change to allow more than printable characters, then this 
should become a feature request, and more is needed to 
define exactly which characters should be allowed.  The 
current implementation is correct for persistent ids that meet 
the documented requirement.
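Under the documented contract, a pickler could reject bad IDs up front instead of writing a corrupt stream. The sketch below is hypothetical (`is_printable_ascii`, `CheckedPickler` and its `lookup_pid` hook are not part of the pickle module) and interprets "printable ASCII" as C's isprint() in the "C" locale, i.e. characters 0x20 through 0x7E.

```python
import io
import pickle


def is_printable_ascii(pid):
    # Same range as C isprint() in the "C" locale: 0x20 (space) to 0x7E (~).
    return isinstance(pid, str) and all(" " <= ch <= "~" for ch in pid)


class CheckedPickler(pickle.Pickler):
    """Validates ids against the documented contract before they are written.

    lookup_pid is a hypothetical hook: subclasses override it with their
    own logic and return None for objects that should be pickled normally.
    """

    def lookup_pid(self, obj):
        return None

    def persistent_id(self, obj):
        pid = self.lookup_pid(obj)
        if pid is not None and not is_printable_ascii(pid):
            raise pickle.PicklingError(
                f"persistent id {pid!r} is not printable ASCII")
        return pid


class ExternalStrings(CheckedPickler):
    # Hypothetical rule: strings starting with "ext:" live outside the pickle.
    def lookup_pid(self, obj):
        if isinstance(obj, str) and obj.startswith("ext:"):
            return obj
        return None


ExternalStrings(io.BytesIO(), protocol=0).dump(["ext:ok", 1])  # accepted
try:
    ExternalStrings(io.BytesIO(), protocol=0).dump(["ext:a\nb"])
except pickle.PicklingError as exc:
    print("rejected:", exc)
```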

----------------------------------------------------------------------
