Re: [Mailman-Developers] Requirements for a new archiver
On Wed, 29 Oct 2003 18:41:20 +0100 Brad Knowles <brad.knowles@skynet.be> wrote:
At 11:48 AM -0500 2003/10/29, J C Lawrence wrote:
You need a guaranteed unique id to be used as a primary index field.
"Need" is a strong word. It's very deployment- and use-case-sensitive.
In the case of a database, it is a hard requirement. A primary index field must be guaranteed unique. There is absolutely no way around this issue.
Right, and I'm not arguing that. My point is twofold:
- Using Message ID as a primary key is attractive.
- Message IDs are not guaranteed globally unique, but the collision rate can be manageable/acceptable in a large number of deployment cases.
We don't have to guarantee key uniqueness for all messages BEFORE they are submitted to the message store. The unique property can be assumed from external sources (with all that implies) should the deployment case want that. There are tradeoffs here, and it is not clear to me that there is an instant and obvious global solution.
"Need"? No. It is a deployment choice with easily understood ramifications.
Perhaps for the application, but this is a totally different ballgame when it comes to a database. Google for "primary index field", and hopefully you will understand.
I'm neither an idiot nor a neophyte in this game. Yes, a database needs a primary unique key. That's not in debate. The questions are:
- Do we know the key before submission to the store? (If we don't, the store operation shouldn't be asynchronous.)
- Is the risk of discarded messages due to key collisions acceptable? (Some deployment cases consider such losses acceptable; others can guarantee uniqueness without Mailman's involvement.)
That Mailman must guarantee key uniqueness before we hit the message store should not be assumed by rote. It is not a given; it's a choice.
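To make the tradeoff concrete, here is a minimal sketch of the "Message-ID as primary key, collisions are discarded" deployment choice. It assumes SQLite through Python's standard `sqlite3` module purely for illustration; the table layout and the names `open_keyed_store` and `store_keyed_message` are hypothetical and are not part of Mailman.

```python
import sqlite3


def open_keyed_store():
    """Create an in-memory store keyed directly on Message-ID (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " message_id TEXT NOT NULL PRIMARY KEY,"
        " body TEXT NOT NULL)"
    )
    return conn


def store_keyed_message(conn, message_id, body):
    """Insert a message.

    Returns True if it was stored, or False if the Message-ID collided with
    an existing row and the message was discarded.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO messages (message_id, body) VALUES (?, ?)",
                (message_id, body),
            )
    except sqlite3.IntegrityError:
        # The database enforces uniqueness; the deployment decides that
        # losing the colliding message is acceptable.
        return False
    return True
```

In this sketch the key is known before submission, so the store call could be made asynchronous; the cost is that a second message with the same Message-ID is silently lost unless the caller acts on the `False` result.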
Let's at least be on the same page.
--
J C Lawrence
---------(*) Satan, oscillate my metallic sonatas.
claw@kanga.nu He lived as a devil, eh?
http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.
At 1:30 PM -0500 2003/10/29, J C Lawrence wrote:
Right, and I'm not arguing that. My point is twofold:
- Using Message ID as a primary key is attractive.
Agreed.
- Message IDs are not guaranteed globally unique, but the collision rate can be manageable/acceptable in a large number of deployment cases.
Outside of a database, this may be something you can decide whether or not to live with. Within the confines of a database, this simply is not possible.
The ANSI SQL specification has some hard requirements for a primary index key:
1. It cannot ever be null.
2. It must always be guaranteed unique.
I'm sure there are other requirements. But these two are a good start.
We don't have to guarantee key uniqueness for all messages BEFORE they are submitted to the message store.
All other keys could potentially be non-unique, or null, but not the primary index key. This is why many applications have the database assign the primary index key itself on insertion into the table, so that all the necessary requirements can be met.
I'm neither an idiot nor a neophyte in this game. Yes, a database needs a primary unique key.
Then you must realize that we could not possibly use message-id as the primary index key, unless this is a field that we generate ourselves in such a way that all the necessary requirements are met.
That Mailman must guarantee key uniqueness before we hit the message store should not be assumed by rote. It is not a given; it's a choice.
The message-id is not necessarily the primary index key. See above.
With regard to a primary index key, there simply is no choice.
The message-id could continue to be one of the many secondary index keys, which is a totally different issue.
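As an illustration of this arrangement, the sketch below lets the database assign the primary key on insertion and keeps message-id as an ordinary, non-unique secondary index. It assumes SQLite through Python's standard `sqlite3` module; the schema and the names `open_surrogate_store`, `store_message` and `find_by_message_id` are hypothetical, not taken from Mailman.

```python
import sqlite3


def open_surrogate_store():
    """Create an in-memory store whose primary key is assigned by the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # never null, always unique
        " message_id TEXT,"                       # may be null or duplicated
        " body TEXT NOT NULL)"
    )
    # Secondary, non-unique index so lookups by message-id stay cheap.
    conn.execute("CREATE INDEX idx_messages_message_id ON messages (message_id)")
    return conn


def store_message(conn, message_id, body):
    """Insert a message and return the primary key the database assigned."""
    with conn:
        cur = conn.execute(
            "INSERT INTO messages (message_id, body) VALUES (?, ?)",
            (message_id, body),
        )
    return cur.lastrowid


def find_by_message_id(conn, message_id):
    """Return every (id, body) row carrying this message-id, oldest first."""
    cur = conn.execute(
        "SELECT id, body FROM messages WHERE message_id = ? ORDER BY id",
        (message_id,),
    )
    return cur.fetchall()
```

Note that the key only becomes known once the insert returns, so a caller that needs the key has to wait for the store operation to complete.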
Let's at least be on the same page.
Agreed.
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
On Wed, 2003-10-29 at 13:30, J C Lawrence wrote:
- Message IDs are not guaranteed globally unique, but the collision rate can be manageable/acceptable in a large number of deployment cases.
Ah, which reminds me, elaborating on my strawman, the answers to "when should Mailman rewrite Message-ID on posts" should be: Never, Only to resolve duplicates, Always.
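The three answers could be modelled as a per-list setting along the following lines. This is only a sketch using Python's standard `email` package: the names `RewritePolicy` and `apply_rewrite_policy`, the `seen_ids` set standing in for the message store, and the `lists.example.org` domain are all hypothetical, not existing Mailman code.

```python
import enum
from email.message import EmailMessage
from email.utils import make_msgid


class RewritePolicy(enum.Enum):
    NEVER = "never"
    ON_DUPLICATE = "only to resolve duplicates"
    ALWAYS = "always"


def apply_rewrite_policy(msg, policy, seen_ids, domain="lists.example.org"):
    """Rewrite msg's Message-ID according to policy.

    seen_ids is a set of Message-IDs already accepted; it is updated in
    place. Returns the Message-ID the message ends up with (None if the
    header is absent and the policy does not generate one).
    """
    header = msg["Message-ID"]
    current = None if header is None else str(header)

    rewrite = policy is RewritePolicy.ALWAYS or (
        policy is RewritePolicy.ON_DUPLICATE and current in seen_ids
    )
    if rewrite:
        new_id = make_msgid(domain=domain)
        while new_id in seen_ids:  # extremely unlikely, but cheap to guard
            new_id = make_msgid(domain=domain)
        del msg["Message-ID"]
        msg["Message-ID"] = new_id
        current = new_id

    if current is not None:
        seen_ids.add(current)
    return current
```

Under `NEVER` a duplicate is passed through unchanged, so whatever sits behind the store still has to decide what to do with the collision.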
-Barry