[ python-Bugs-956303 ] Potential data corruption bug in save_pers in pickle module.

SourceForge.net noreply at sourceforge.net
Tue May 18 18:51:30 EDT 2004


Bugs item #956303, was opened at 2004-05-18 22:45
Message generated for change (Settings changed) made by amc1
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=956303&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
>Priority: 7
Submitted By: Allan Crooks (amc1)
Assigned to: Nobody/Anonymous (nobody)
Summary: Potential data corruption bug in save_pers in pickle module.

Initial Comment:
There is a bug in save_pers in both the pickle and
cPickle modules in Python.

It occurs when a Pickler instance uses an ASCII
protocol and also defines a persistent_id which can
return a persistent reference containing newline
characters.

The current implementation of save_pers in the pickle
module is as follows:

----

    def save_pers(self, pid):
        # Save a persistent id reference
        if self.bin:
            self.save(pid)
            self.write(BINPERSID)
        else:
            self.write(PERSID + str(pid) + '\n')

----

The else clause assumes that the 'pid' will not be a
string which contains one or more newline characters.

If the pickler pickles a persistent ID which has a
newline in it, then an unpickler with a corresponding
persistent_load method will unpickle the data
incorrectly. It treats the text before the newline as
the whole ID and usually interprets the character after
the newline as a marker indicating what type of data
should be expected (usually resulting in an exception
being raised when the remaining data is not in the
format expected).
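
The failure can be sketched without the pickle module at
all. The following is a simplified model of the text-mode
framing, assuming only what the else clause above shows
(marker, str(pid), newline). The names write_persid and
read_persid are hypothetical, and Python 3 bytes are used
so that the sketch runs on a current interpreter:

```python
PERSID = b"P"  # marker for a text-mode persistent ID
STOP = b"."    # stand-in for whatever follows the ID in the stream


def write_persid(pid):
    # Same shape as the else clause: marker, str(pid), newline.
    return PERSID + str(pid).encode("ascii") + b"\n"


def read_persid(stream):
    # A line-oriented reader: the ID ends at the first newline.
    if stream[:1] != PERSID:
        raise ValueError("not a persistent ID record")
    end = stream.index(b"\n")
    pid = stream[1:end].decode("ascii")
    rest = stream[end + 1:]
    return pid, rest


good_pid, good_rest = read_persid(write_persid("abc") + STOP)
bad_pid, bad_rest = read_persid(write_persid("ab\ncd") + STOP)
```

In the second call the reader recovers only "ab", and the
leftover bytes start with "cd" rather than the marker the
writer intended to come next.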

I have attached an example file which illustrates in
what circumstances the error occurs.

Workarounds for this bug are:
  1) Use binary mode for picklers.
  2) Modify subclass implementations of save_pers to
ensure that newlines are not returned for persistent IDs.
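
One way to keep newlines out of persistent IDs is to encode
them before they reach the pickler. The following is a
minimal sketch for Python 3, not the code from the attached
example: it assumes hex encoding is acceptable for the IDs,
and Ref, HexRefPickler, HexRefUnpickler and roundtrip are
hypothetical names chosen for illustration.

```python
import binascii
import io
import pickle


class Ref:
    """Hypothetical stand-in for an object stored outside the pickle."""

    def __init__(self, key):
        self.key = key


class HexRefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, Ref):
            # Hex digits are plain ASCII and never include a newline.
            raw = obj.key.encode("utf-8")
            return binascii.hexlify(raw).decode("ascii")
        return None  # everything else is pickled as usual


class HexRefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Reverse the encoding applied in persistent_id.
        raw = binascii.unhexlify(pid.encode("ascii"))
        return Ref(raw.decode("utf-8"))


def roundtrip(obj, protocol):
    buf = io.BytesIO()
    HexRefPickler(buf, protocol).dump(obj)
    buf.seek(0)
    return HexRefUnpickler(buf).load()


# Protocol 0 is the ASCII protocol affected by this bug.
restored = roundtrip([Ref("line1\nline2")], protocol=0)
```

The same classes also work with a binary protocol, which is
workaround 1; there the encoding step is not needed, since
the ID is saved as an ordinary object.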

This bug might seem likely to occur only on rare
occasions, since few implementations of persistent_id
would return a string with a newline character
embedded. It may occur more frequently, however, if the
subclass implementation of persistent_id returns a
string which has been constructed using the marshal
module.

This bug was discovered when our code implemented the
persistent_id method, which was returning the
marshalled format of a tuple which contained strings.
It occurred when one or more of the strings had a
length of ten characters - the marshalled format of
that string contains the string's length, where the
byte used to represent the number 10 is the same as the
one which represents the newline character:

>>> import marshal
>>> marshal.dumps('a' * 10)
's\n\x00\x00\x00aaaaaaaaaa'
>>> chr(10)
'\n'
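
A length of ten is not the only one affected. As a rough
sketch, assuming the length field is the four-byte
little-endian integer visible in the output above, the
struct module can show which lengths put a newline byte
into that field. The helper names length_field and
has_newline_byte are hypothetical:

```python
import struct


def length_field(n):
    # Four-byte little-endian length, like the '\n\x00\x00\x00' seen above.
    return struct.pack("<i", n)


def has_newline_byte(n):
    # True when any byte of the length field equals chr(10).
    return b"\n" in length_field(n)


# String lengths below 300 whose length field contains a newline byte.
risky = [n for n in range(300) if has_newline_byte(n)]
```

Here risky comes out as [10, 266]. Newline bytes in the
marshalled data itself trigger the same problem, whatever
the length.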

I have replicated this bug on Python 1.5.2 and Python
2.3b1, and I believe it is present on all 2.x versions
of Python.

Many thanks to SourceForge user (and colleague) SMST,
who diagnosed the bug and provided the test cases
attached.
