[ python-Bugs-1536021 ] hash(method) sometimes raises OverflowError

SourceForge.net noreply at sourceforge.net
Wed Aug 9 11:47:56 CEST 2006


Bugs item #1536021, was opened at 2006-08-07 14:21
Message generated for change (Comment added) made by gbrandl
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1536021&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: Documentation
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Christian Tanzer (tanzer)
>Assigned to: A.M. Kuchling (akuchling)
Summary: hash(method) sometimes raises OverflowError

Initial Comment:
I've run into a problem with a big application that I wasn't able to
reproduce with a small example.

The code (exception handler added to demonstrate and work around the
problem):

    try:
        h = hash(p)
    except OverflowError, e:
        print type(p), p, id(p), e
        h = id(p) & 0x0FFFFFFF

prints the following output:

    <type 'instancemethod'> <bound method Script_Category.is_applicable of <Script_Menu_Mgr.Script_Category object at 0xb6cb4f8c>> 3066797028 long int too large to convert to int

This happens with Python 2.5b3, but didn't happen with Python 2.4.3.

I assume that the hash function for functions/methods returns the `id`
of the function. The following code demonstrates the same problem with
a Python class whose `__hash__` returns the `id` of the object:

$ python2.4
    Python 2.4.3 (#1, Jun 30 2006, 10:02:59) 
    [GCC 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> class X(object):
    ...   def __hash__(self): return id(self)
    ... 
    >>> hash (X())
    -1211078036
$ python2.5 
    Python 2.5b3 (r25b3:51041, Aug  7 2006, 15:35:35) 
    [GCC 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> class X(object):
    ...   def __hash__(self): return id(self)
    ... 
    >>> hash (X())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: long int too large to convert to int



----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2006-08-09 09:47

Message:
Logged In: YES 
user_id=849994

Andrew, do you want to add a whatsnew entry?

----------------------------------------------------------------------

Comment By: Christian Tanzer (tanzer)
Date: 2006-08-09 09:24

Message:
Logged In: YES 
user_id=2402

> The only thing I could imagine is that the Script_Category 
> class has a custom __hash__() method which returns a value 
> that is sometimes a long, as it would be if it were 
> based on id(). 

That was indeed the problem in my code (returning `id(self)`).

> It has always been documented that just returning id() 
> in custom __hash__() methods doesn't work because of this 

AFAIR, it was once documented that the default hash value is
the id of an object. And I just found a message by the BDFL
himself proclaiming so:
http://python.project.cwi.nl/search/hypermail/python-recent/0168.html.

OTOH, I don't remember seeing anything about this in AMK's
`What's new in Python 2.x` documents (but found an entry in
NEWS.txt for some 2.5 alpha).

I've now changed all my broken `__hash__` methods (not that
many fortunately) but it might be a good idea to document
this change in a more visible way.
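
The comment above does not say how the broken `__hash__` methods were
changed. One possible repair, shown here only as a sketch, is to return
the hash of the id rather than the id itself, so the result always fits
the native hash range. The class name `Tracked` is hypothetical, and the
snippet is written for a current interpreter rather than for 2.5b3.

```python
import sys


class Tracked(object):
    """Hypothetical class that wants identity-based hashing."""

    def __hash__(self):
        # hash() of an integer always yields a value in the native hash
        # range, even when id(self) itself is a very large number.
        return hash(id(self))


obj = Tracked()
h = hash(obj)

# The object can be used as a dictionary key as usual.
registry = {obj: "registered"}
```

Because no `__eq__` is defined, equality still falls back to identity,
which stays consistent with an id-based hash.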

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-08-09 07:58

Message:
Logged In: YES 
user_id=21627

Thanks for the report. Fixed in r51160.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2006-08-08 10:25

Message:
Logged In: YES 
user_id=4771

The hash of instance methods changed, and id() changed
to return non-negative numbers (so that id() is not
the default hash any more).  But I cannot see how your
problem shows up.  The only thing I could imagine is
that the Script_Category class has a custom __hash__()
method which returns a value that is sometimes a long,
as it would be if it were based on id().  (It has
always been documented that just returning id() in
custom __hash__() methods doesn't work because of
this, but on 32-bit machines the problem only became
apparent with the change in id() in Python 2.5.)
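
The arithmetic behind this can be checked with a small sketch. The
helper `to_signed` is hypothetical, and the 32-bit width is an
assumption based on the platform in the report. It shows that the id
printed in the report no longer fits a signed 32-bit C long, and what
the same bit pattern looks like when read as a signed value, which is
presumably how an older id() would have shown it.

```python
BITS = 32  # assumption: a 32-bit build, as in the report


def to_signed(value, bits=BITS):
    """Reinterpret a non-negative integer as a two's-complement signed one."""
    if value >= 1 << (bits - 1):
        return value - (1 << bits)
    return value


reported_id = 3066797028              # id(p) printed in the initial comment
max_c_long = (1 << (BITS - 1)) - 1    # largest signed 32-bit value

exceeds_c_long = reported_id > max_c_long   # True: needs a long in 2.5
as_signed = to_signed(reported_id)          # same bits, read as signed
```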

----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2006-08-07 15:03

Message:
Logged In: YES 
user_id=1038590

MvL diagnosed the problem on python-dev as being due to
id(obj) now always returning positive values (which may
sometimes be a long).

This seems like sufficient justification to change the
hashing implementation to tolerate long values being
returned from __hash__ methods (e.g. by using the hash of a
returned long value, instead of trying to convert it to a C
int directly).
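
A rough Python-level sketch of that idea follows. It is not the actual
interpreter change: `tolerant_hash` is a hypothetical helper, and
`sys.maxsize` stands in for the limit of the C type. The snippet is
written for a current interpreter, where int and long are unified.

```python
import sys


def tolerant_hash(raw):
    """Hypothetical helper: accept any integer returned by __hash__."""
    if -sys.maxsize - 1 <= raw <= sys.maxsize:
        return raw       # already fits the native range, use it as-is
    return hash(raw)     # too large: fall back to the integer's own hash


small = tolerant_hash(12345)
large = tolerant_hash(1 << 100)
```

The point of the design is that an oversized return value is reduced
to a usable hash instead of causing an OverflowError.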

----------------------------------------------------------------------
