Re: [Mailman-Developers] [Bug 985149] Add List-Post value to permalink hash input
On Tue, Apr 24, 2012 at 1:40 PM, Jeff Breidenbach <jeff@jab.org> wrote:
> Is 4 bytes too short?
Four characters is only about a million combinations (2^20, at 5 bits per character). A first collision is 50% likely at about 1200 messages, and multi-million message databases are completely screwed.
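For anyone who wants to check that arithmetic, here is a minimal sketch of the birthday-bound estimate. It assumes a uniformly distributed hash and 5 bits per character (a base32-style alphabet, so 32**4 values for four characters); the function names are made up for illustration and are not part of Mailman.

```python
import math

# Assumption: 4 characters at 5 bits each, i.e. 32**4 = 1,048,576 hash values.
SLOTS_4_CHARS = 32 ** 4


def collision_probability(n_messages: int, n_slots: int) -> float:
    """Approximate probability that at least two of n_messages share a hash.

    Uses the standard birthday approximation 1 - exp(-n(n-1) / (2N)),
    which assumes the hash is uniformly distributed over n_slots values.
    """
    exponent = -n_messages * (n_messages - 1) / (2.0 * n_slots)
    return 1.0 - math.exp(exponent)


def messages_for_even_odds(n_slots: int) -> int:
    """Approximate archive size at which a first collision is 50% likely."""
    return math.ceil(math.sqrt(2.0 * n_slots * math.log(2.0)))


if __name__ == "__main__":
    print("slots:", SLOTS_4_CHARS)
    print("messages for ~50% collision odds:", messages_for_even_odds(SLOTS_4_CHARS))
    print("P(collision) at 1200 messages:", collision_probability(1200, SLOTS_4_CHARS))
```

The even-odds point scales with the square root of the number of hash values, so each extra 5-bit character multiplies it by only about 5.7.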
If we're willing to impose disambiguation on the user (and to give the UI the ability to find and report all matching messages), then the questions, to me, would be:
0. Assume a 10 million message archive.
1. What percentage of permalinks need another click?
2. What percentage of permalinks will result in a list of more than 10 matches?
Rationale for 0: 10 related lists x 20 years x 365 days x 100 messages/day is about 7.3 million messages, and I can imagine people wanting to index into such a corpus.
Rationale for 1: Obvious, I hope.
Rationale for 2: Maybe I'm just getting old, but that's the number of lines I can comfortably scan at a glance. For values of "10" that suit you, I guess.
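A rough way to put numbers on questions 1 and 2 is sketched below. It is only a model: it assumes a uniform hash, 5 bits per character, and a Poisson approximation for the number of other messages that share a given message's hash. The function names are hypothetical, and cross-posts (which share a hash by design) are ignored.

```python
import math

BITS_PER_CHAR = 5  # assumption: base32-style alphabet


def expected_other_matches(n_messages: int, n_chars: int) -> float:
    """Mean number of *other* messages sharing a given message's hash."""
    n_slots = 2 ** (BITS_PER_CHAR * n_chars)
    return (n_messages - 1) / n_slots


def poisson_tail(lam: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for j in range(k):
        cdf += term
        term *= lam / (j + 1)  # advance from P(X = j) to P(X = j + 1)
    return max(0.0, 1.0 - cdf)  # clamp floating-point rounding noise


def fraction_needing_click(n_messages: int, n_chars: int) -> float:
    """Question 1: fraction of permalinks that match more than one message."""
    lam = expected_other_matches(n_messages, n_chars)
    return -math.expm1(-lam)  # 1 - exp(-lam), accurate for small lam


def fraction_over_limit(n_messages: int, n_chars: int, limit: int = 10) -> float:
    """Question 2: fraction of permalinks listing more than `limit` matches.

    More than `limit` matches in total means at least `limit` other messages.
    """
    lam = expected_other_matches(n_messages, n_chars)
    return poisson_tail(lam, limit)


if __name__ == "__main__":
    archive_size = 10_000_000  # question 0
    for chars in range(4, 9):
        click = fraction_needing_click(archive_size, chars)
        long_list = fraction_over_limit(archive_size, chars)
        print(f"{chars} chars: another click {click:.4%}, >10 matches {long_list:.4%}")
```

Under these assumptions, four characters leave essentially every permalink in a 10 million message archive ambiguous, and five characters still send roughly a quarter of them to a disambiguation page.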
Note that, like Barry, I'm assuming disambiguation will be needed for x-posts in any case. WDOT?
Bottom line: how big a database do we expect to have, and amongst those messages, how many collisions are considered acceptable?
-Jeff
PS. These numbers assume a well balanced hash. This paper suggests SHA-1 is pretty good in non-adversarial situations, but I'm not an expert. http://cseweb.ucsd.edu/~mihir/papers/balance.html
participants (1)
-
Stephen J. Turnbull