[Spambayes] sharing split database

bill parducci bill at parducci.net
Tue May 20 12:04:03 EDT 2003


Tim Peters wrote:
> Note that people who shop online have all sorts of identifying info in their
> email, including account numbers, passwords, email addresses, phone numbers,
> mailing addresses, billing addresses, shipping addresses, birth dates,
> Social Security number, mother's maiden name, and even credit card numbers
> and expiration dates echoed back by clueless online merchants.  Besides
> poke-and-hope attacks, someone who has access to a shared database could
> easily learn a lot by computing deltas across incremental training; the new
> tokens that show up are likely to be correlated.  Anyone thinking of sharing
> a database has to be acutely aware of the risks to privacy.
> 
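
to make the 'deltas across incremental training' point concrete, here is a minimal sketch. it assumes an attacker can read the full token set of the shared database at two points in time; the helper name and the sample tokens are made up for illustration and are not spambayes code.

```python
def new_tokens(before, after):
    """Return tokens present in the later snapshot but not the earlier one."""
    return set(after) - set(before)


# Hypothetical snapshots of the shared token set, taken before and after
# one user trained on a single order-confirmation message.
snapshot_before = {"viagra", "mortgage", "python"}
snapshot_after = snapshot_before | {"acct", "8841-2207", "elm", "street"}

# Everything in the delta arrived in the same training step, so the
# tokens are likely to belong together.
delta = new_tokens(snapshot_before, snapshot_after)
```

the smaller the training increment between two snapshots, the more tightly the tokens in the delta are tied to one message.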

anything beyond 'poke & hope' would require general query capabilities. 
if weighting is maintained locally, any call to the shared repository 
would be of the form 'index = stoken(token)'. if the token doesn't exist 
in the shared db, it is inserted and a new index is returned. a 
determined hacker could figure out whether a token already exists by 
comparing index numbers *if the system used serial indexing* (which 
could be foiled by using a hashed index), but beyond that, only the 
existence of the token could be derived.
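
a rough sketch of the two indexing schemes, to show where the leak comes from. the class names, the probe helper and the keyed-hash choice are all assumptions for illustration; only the 'index = stoken(token)' call shape comes from the description above.

```python
import hashlib
import hmac
import secrets


class SerialTokenStore:
    """Hypothetical shared store that hands out serial index numbers."""

    def __init__(self):
        self._index = {}

    def stoken(self, token):
        # Insert the token if it is missing, then return its index.
        if token not in self._index:
            self._index[token] = len(self._index)
        return self._index[token]


class HashedTokenStore:
    """Hypothetical shared store whose index is a keyed hash of the token."""

    def __init__(self, secret):
        self._secret = secret
        self._known = set()

    def stoken(self, token):
        # The index depends only on the token and the server-side secret,
        # so it looks the same whether or not the token was already stored.
        index = hmac.new(self._secret, token.encode("utf-8"),
                         hashlib.sha256).hexdigest()
        self._known.add(index)
        return index


def token_existed(store, token):
    """Poke & hope against a serial store: was `token` already present?

    Note the probe itself inserts the token as a side effect.
    """
    # A fresh random token always gets the newest serial number.
    marker = store.stoken(secrets.token_hex(16))
    # An index below the marker means the token was inserted earlier.
    return store.stoken(token) < marker
```

against the serial store, token_existed() answers the existence question with two calls. against the hashed store, the returned index carries no ordering, so the same comparison tells the caller nothing.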

is that valuable? sure, but if you have enough information to poke & 
hope meaningfully in a large community, then you likely have enough 
information to take action directly.

that said, such design constraints make the shared db idea questionable 
WRT size savings. (as pointed out earlier, but it took me a while to 
fully grasp :o)

b



