[Tutor] check against multiple variables

Steven D'Aprano steve at pearwood.info
Fri Jul 20 00:20:32 CEST 2012


Selby Rowley-Cannon wrote:
> I am using a hash table in a small randomization program. I know that 
> some hash functions can be prone to collisions, so I need a way to 
> detect collisions.

I doubt that very much.

This entire question seems like a remarkable case of premature optimization. 
Start by demonstrating that collisions are an actual problem that needs fixing.

Unless you have profiled your application and proven that hash collisions are a 
real problem -- and unless you are hashing thousands of float NANs, that is 
almost certainly not the case -- you are just wasting your time and making 
your code slower rather than faster: a pessimization, not an optimization.
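
For what it's worth, a rough way to check whether your keys collide at all is 
to count how many distinct keys share a hash() value. This is only a sketch: 
count_hash_collisions is a name made up for illustration, and it looks at full 
hash values only, whereas the slots a dict actually uses also depend on the 
table size.

```python
from collections import Counter

def count_hash_collisions(keys):
    """Count distinct keys whose full hash() value is shared with another key."""
    # Deduplicate first: keys that compare equal must have equal hashes,
    # so they are not collisions.
    distinct = set(keys)
    per_hash = Counter(hash(k) for k in distinct)
    # A hash value shared by n distinct keys contributes n - 1 collisions.
    return sum(n - 1 for n in per_hash.values())

print(count_hash_collisions(range(100000)))  # 0 on CPython
print(count_hash_collisions([-1, -2]))       # 1 on CPython, where hash(-1) == -2
```

If that number is tiny compared to the number of keys, there is nothing to fix.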

And if it *is* a problem, then the solution is to fix your data type so that 
its __hash__ method is less likely to produce collisions. If you are rolling 
your own hash method, instead of using one of Python's, that's your first 
problem.
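
A minimal sketch of what "using one of Python's" can look like, assuming a 
hypothetical Point class with two fields (the class and its attributes are 
invented for illustration): let __hash__ delegate to the built-in tuple hash 
over the same fields that __eq__ compares.

```python
class Point:
    """Hypothetical example class; x and y are assumed to be hashable."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Reuse Python's tuple hash instead of inventing a mixing function.
        return hash((self.x, self.y))

points = {Point(1, 2), Point(1, 2), Point(2, 1)}
print(len(points))  # 2: the two equal points collapse into one set entry
```

The one rule to keep is that objects which compare equal must hash equal, 
which hashing the same tuple that __eq__ compares gives you for free.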

Python's hash implementation is one of the most finely tuned in the world. 
Many, many years of effort have gone into making it stand up to real-world 
data. You aren't going to beat it with some half-planned pure-Python work-around.



-- 
Steven

