<div class="gmail_quote">On Sat, Dec 31, 2011 at 4:04 PM, Jeffrey Yasskin <span dir="ltr"><<a href="mailto:jyasskin@gmail.com">jyasskin@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">Hash functions are already unstable across Python versions. Making</div>
them unstable across interpreter processes (multiprocessing doesn't<br>
share dicts, right?) doesn't sound like a big additional problem.<br>
Users who want a distributed hash table will need to pull their own<br>
hash function out of hashlib or re-implement a non-cryptographic hash<br>
instead of using the built-in one, but they probably need to do that<br>
already to allow themselves to upgrade Python.<br></blockquote><div><br></div><div>Here's an idea. Suppose we add a sys.hash_seed (or some such) that's settable to an int and defaults to whatever we're using now. Then programs that want a fix can just set it to a random number, and on Python versions that support it, it takes effect. Everywhere else it's a silent no-op.</div>
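<div>To make "settable to an int, defaults to whatever we're using now" concrete, here is a toy Python sketch. The function name seeded_str_hash is hypothetical, and the multiply-and-xor loop is only in the style of CPython's string hash, not a copy of it. The seed is XORed into the starting state, so seed 0 gives the unseeded result, while any other 64-bit seed gives a different (but still deterministic) hash for the same non-empty string.</div>

```python
MODULUS = 2**64  # emulate unsigned 64-bit wraparound


def seeded_str_hash(s, seed=0):
    """Toy string hash with a settable seed (hypothetical, illustration only).

    seed=0 leaves the starting state untouched, mirroring "defaults to
    whatever we're using now". Every step below is a bijection on 64-bit
    states, so for a given non-empty string, distinct 64-bit seeds always
    produce distinct hashes.
    """
    if not s:
        return 0
    # The seed perturbs the starting state; with seed=0 this XOR is a no-op.
    x = (seed % MODULUS) ^ (ord(s[0]) * 128)
    for ch in s:
        # Multiply by an odd constant (a bijection mod 2**64), then mix in the char.
        x = (1000003 * x) % MODULUS ^ ord(ch)
    x ^= len(s)
    return x


print(seeded_str_hash("spam") == seeded_str_hash("spam", seed=0))  # True
print(seeded_str_hash("spam", seed=1) == seeded_str_hash("spam"))  # False
```

<div>Only the starting state depends on the seed, so the per-character work is unchanged; the extra cost is reading the seed once per hash computation.</div>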
<div><br></div><div>Downside: sys has to have slots for this to work; does sys actually have slots? My memory's hazy on that. I guess actually it'd have to be sys.set_hash_seed(). But same basic idea.</div><div>
<br></div><div>Anyway, this would make fixing the problem *possible*, while still pushing off the hard decisions to the app/framework developers. ;-)</div><div><br></div><div>(Downside: every hash operation includes one extra memory access to read the seed, but strings only compute their hash once anyway.)</div>
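<div>On the app/framework side, opting in could be a few lines of feature detection. This is a minimal sketch assuming the sys.set_hash_seed() spelling floated above; no released Python provides that function, so here the call always takes the silent no-op path. The helper name try_randomize_hash_seed is also hypothetical.</div>

```python
import random
import sys


def try_randomize_hash_seed():
    """Randomize the interpreter's hash seed if it supports doing so.

    sys.set_hash_seed is HYPOTHETICAL (the name proposed in this thread);
    on interpreters without it, this is a silent no-op.
    Returns True if a seed was set, False otherwise.
    """
    setter = getattr(sys, "set_hash_seed", None)  # hypothetical API
    if setter is None:
        return False  # older Python: nothing to do
    setter(random.SystemRandom().getrandbits(64))
    return True


if __name__ == "__main__":
    # Presumably this must run at startup: strings cache their hash, so
    # changing the seed later would leave stale cached values behind.
    print(try_randomize_hash_seed())
```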
<div><br></div><div>Given that changing dict won't help and changing the default hash is a non-starter, an option to set the seed is probably the way to go. (Maybe with an environment variable and/or command-line option so users can work around old code.)</div>
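<div>The environment-variable route could look like the sketch below, which only parses the setting. The variable name MYAPP_HASH_SEED and the "random" keyword are made-up examples, not options any Python release reads; an unset or empty variable means "keep the current default".</div>

```python
import os
import random


def hash_seed_from_env(environ=None):
    """Parse a hash-seed setting from the environment (hypothetical scheme).

    - unset or empty  -> None (keep today's default hash)
    - "random"        -> a fresh random 64-bit seed
    - a decimal int   -> that seed, reduced to 64 bits
    """
    if environ is None:
        environ = os.environ
    value = environ.get("MYAPP_HASH_SEED", "").strip()  # hypothetical name
    if not value:
        return None
    if value == "random":
        return random.SystemRandom().getrandbits(64)
    return int(value) % 2**64  # raises ValueError on malformed input


print(hash_seed_from_env({}))                         # None
print(hash_seed_from_env({"MYAPP_HASH_SEED": "42"}))  # 42
```

<div>A command-line option would simply feed its argument string into the same parser, so old code could be protected without being edited.</div>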
<div><br></div></div>