
On Tuesday, 16 August 2011 15:27:30 Armin Rigo wrote:
> Hi David,
>
> On Mon, Aug 15, 2011 at 6:20 PM, David Naylor
> <naylor.b.david@gmail.com> wrote:
>> For me the performance of datetime objects' hashing is sufficient,
>> but I think the Python code could use some performance improvements.
>> Is my approach of using a direct computation to type long acceptable
>> (in principle)? If so, I can refine it and submit a patch.
> Yes, replacing the hash with a faster-to-compute one is fine. It's
> best performance-wise if you can avoid using Python longs. As far as
> I know it just needs some random-looking xor-ing and shifting of the
> fields. Note, of course, that you must carefully satisfy the property
> that for any objects x and y, if "x == y" then "hash(x) == hash(y)".
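To make that concrete, a field-combining hash of the kind you describe
might look something like this (an illustrative sketch only, not the
exact code in the patch below; the bit widths and prime multiplier are
my own choices):

    from datetime import datetime

    def _hash_fields(year, month, day, hour, minute, second, microsecond):
        # Pack each small field into its own bit range: second needs 6
        # bits, minute 6, hour 5, day 5, month 4, with the year on top.
        # Equal field tuples always give equal results, preserving the
        # invariant that x == y implies hash(x) == hash(y).
        h = ((year << 26) ^ (month << 22) ^ (day << 17) ^
             (hour << 12) ^ (minute << 6) ^ second)
        # Fold in the microsecond via a prime multiplier so sub-second
        # differences still change the hash.
        return h ^ (microsecond * 1000003)

    d = datetime(2011, 8, 16, 15, 27, 30, 123456)
    print(_hash_fields(d.year, d.month, d.day,
                       d.hour, d.minute, d.second, d.microsecond))

Since the date and time fields occupy disjoint bit ranges, the packed
value is fully determined by the fields, and only integer shifts and
xors are needed, avoiding Python longs for ordinary dates.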
Below is the patch, and results, for my proposed hash methods for
datetime.datetime (and easily adaptable to include tzinfo and the
other datetime objects). I tried to make the hash safe for both 32-bit
and 64-bit systems, and beyond. The results are:

# python datetest.py (datetime.py)
hash_unity: 35.83 seconds
hash_unity: 44.60 seconds
hash_datetime: 65.58 seconds
hash_datetime: 53.95 seconds

# python datetest.py
hash_unity: 5.70 seconds
hash_unity: 5.69 seconds
hash_datetime: 4.88 seconds
hash_datetime: 4.90 seconds

# pypy datetest.py
hash_unity: 0.74 seconds
hash_unity: 0.63 seconds
hash_datetime: 11.74 seconds
hash_datetime: 11.47 seconds

# pypy datetest.py (patched datetime.py)
hash_unity: 0.73 seconds
hash_unity: 0.62 seconds
hash_datetime: 0.76 seconds
hash_datetime: 0.64 seconds

So, based on my patch, there is a 7.7x improvement over CPython and a
17.9x improvement over the previous PyPy implementation. If the above
approach is acceptable, I will complete the patch.

Regards
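P.S. For anyone wanting to reproduce the shape of these numbers, a
harness along the following lines times repeated hash() calls (a
sketch under my assumptions; the actual datetest.py may differ, though
the hash_unity/hash_datetime labels match the output above):

    import time
    from datetime import datetime

    N = 1000000

    def bench(label, objs):
        # Time N hash() calls over a pool of pre-built objects so that
        # object construction is excluded from the measurement.
        start = time.time()
        for obj in objs:
            hash(obj)
        print("%s: %.2f seconds" % (label, time.time() - start))

    # Baseline: hashing a trivial object (the int 1) versus datetimes.
    unity = [1] * N
    dts = [datetime(2011, 8, 16, 15, 27, 30, i % 1000000)
           for i in range(N)]

    for _ in range(2):  # each benchmark appears twice in the output above
        bench("hash_unity", unity)
    for _ in range(2):
        bench("hash_datetime", dts)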