Hi there.

I was made aware of this oddity here:

    import cPickle
    reffed = "xKITTENSx"[1:-1]
    print repr(cPickle.dumps(reffed))
    print repr(cPickle.dumps("xKITTENSx"[1:-1]))

These strings are different, presumably because of the (ob_refcnt == 1) optimization used during object pickling. This might come as a surprise to many, and is potentially dangerous if pickling is being used as a precursor to some hashing function. For example, we use code that caches function calls, using something akin to:

    import cPickle

    cache = {}

    def cached_call(Function, arguments):  # illustrative wrapper name
        myhash = hash(cPickle.dumps(arguments))
        try:
            cached_args, cached_value = cache[myhash]
            if cached_args == arguments:
                return cached_value
        except KeyError:
            pass
        # cache miss (or hash collision): compute and store the result
        value = Function(*arguments)
        cache[myhash] = arguments, value
        return value

The non-uniqueness of the pickle string will cause unnecessary cache misses in this code. Pickling is useful as a precursor because it works on more kinds of objects (lists, dicts) than hash() alone does.

I just wanted to point this out. We'll attempt some local workarounds here, but it should otherwise be simple to modify pickling to optionally turn off this optimization and always generate the same output irrespective of the internal reference counts of the objects.

Cheers,
Kristján
> I just wanted to point this out. We'll attempt some local workarounds here, but it should otherwise be simple to modify pickling to optionally turn off this optimization and always generate the same output irrespective of the internal reference counts of the objects.
I think there are many other instances where values that compare equal pickle differently in Python. By relying on this property for hashing, you are really operating on thin ice.

Regards,
Martin
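One more instance of equal values pickling differently, as a minimal Python 3 sketch. It assumes CPython 3.7 or later, where dicts preserve insertion order: the pickler writes dict items in iteration order, while dict equality ignores order, so two equal dicts built in a different order give different pickles.

```python
import pickle

# Two dicts with the same items, inserted in a different order.
d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}

same_value = d1 == d2                               # equality ignores order
same_pickle = pickle.dumps(d1) == pickle.dumps(d2)  # pickle follows iteration order

print(same_value, same_pickle)  # True False
```

A cache keyed on the pickled arguments would therefore miss for these two calls even though the arguments compare equal.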
2010/8/3 "Martin v. Löwis" <martin@v.loewis.de>:
> I think there are many other instances where values that compare equal pickle differently in Python.
Indeed. For example:
    >>> 1.0 == 1
    True
    >>> dumps(1.0) == dumps(1)
    False

or, for objects of the same type:

    >>> 0.0 == -0.0
    True
    >>> dumps(0.0) == dumps(-0.0)
    False
-----Original Message-----
From: "Martin v. Löwis" [mailto:martin@v.loewis.de]
Sent: 3 August 2010 20:48
To: Kristján Valur Jónsson
Cc: Python-Dev
Subject: Re: [Python-Dev] pickle output not unique
> I just wanted to point this out. We'll attempt some local workarounds here, but it should otherwise be simple to modify pickling to optionally turn off this optimization and always generate the same output irrespective of the internal reference counts of the objects.
> I think there are many other instances where values that compare equal pickle differently in Python. By relying on this property for hashing, you are really operating on thin ice.
Well, it is not _that_ dangerous. It just causes cache misses when they wouldn't be expected. But since this has been brought up and dismissed in issue 8738, I won't pursue this further.

K
2010/8/4 Kristján Valur Jónsson <kristjan@ccpgames.com>:
> Well, it is not _that_ dangerous. It just causes cache misses when they wouldn't be expected. But since this has been brought up and dismissed in issue 8738, I won't pursue this further.
Don't read too much into the "dismissal" of issue 8738. I will be happy to reopen it if there are ideas on how to improve the situation. For example, it may be helpful to document whether the situation improved in 3.x and whether pickletools.optimize produces more consistent pickles. If you change your mind, please leave a note on the issue.
2010/8/3 Kristján Valur Jónsson <kristjan@ccpgames.com>:
> These strings are different, presumably because of the (ob_refcnt == 1) optimization used during object pickling.
I have recently closed a similar issue because it is not a bug and the problem is not present in 3.x: http://bugs.python.org/issue8738
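A small Python 3 check of the claim that the refcount-dependent output is gone in 3.x. It builds the string at run time (so constant folding cannot make both calls share one literal) and pickles it once while a variable holds an extra reference and once as a temporary. `make_string` is a hypothetical helper written only for this illustration. As far as I know, the CPython 3 pickler memoizes every `str` it writes, regardless of reference count, so both pickles should be identical.

```python
import pickle

def make_string():
    # Hypothetical helper: build "KITTENS" at run time as a fresh object.
    return "".join(["KIT", "TENS"])

reffed = make_string()                  # this variable holds an extra reference
with_ref = pickle.dumps(reffed)
temporary = pickle.dumps(make_string())  # pickled as a temporary, no variable holds it

print(with_ref == temporary)  # True on CPython 3.x
```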
> I just wanted to point this out. We'll attempt some local workarounds here, but it should otherwise be simple to modify pickling to optionally turn off this optimization and always generate the same output irrespective of the internal reference counts of the objects.
I wonder if it would help if, rather than trying to turn off the ad-hoc optimization, you ran your pickle strings through pickletools.optimize.
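A minimal Python 3 sketch of that suggestion. The two byte strings are hand-written to mimic the shape of the two protocol-0 outputs described in the original report (one memoizing the string with a `p1` PUT opcode, one without); they are assumptions, not captured cPickle output. `pickletools.optimize` drops PUT opcodes that no GET refers to, so both normalize to the same bytes.

```python
import pickle
import pickletools

# Hand-written protocol-0 pickles of the string 'KITTENS':
# one stores it in the memo with a PUT opcode ("p1"), the other does not.
with_put = b"S'KITTENS'\np1\n."
without_put = b"S'KITTENS'\n."

# optimize() removes PUT opcodes that are never read back by a GET.
normalized_a = pickletools.optimize(with_put)
normalized_b = pickletools.optimize(without_put)

print(normalized_a == normalized_b)  # True
print(pickle.loads(normalized_a))    # KITTENS
```

This only strips unused memo entries; it does not help with the 1.0 vs 1 and 0.0 vs -0.0 cases above, where the data opcodes themselves differ.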
Participants (3):
- Martin v. Löwis
- Alexander Belopolsky
- Kristján Valur Jónsson