[GSOC 2014] Approach towards the Full anonymization project

Rajeev S writes:
As mentioned, here is my approach towards the full anonymization project.
AFAICS, as far as you have described it, your approach will provide the outcomes you describe.
However, I don't understand the use case here. Most approaches use a single secret ID for each user. This is not just a matter of convenience for the developer, but a requirement in some cases. That is, the list members, although anonymous "in the real world", build trust relationships with each other in the list environment. "Full anonymity" is in any case difficult to achieve, as word choice, topic choice, grammatical construction, time of writing, etc. fall into patterns over time.
Also, given your model of address-per-post, I'm again unclear on the use case for off-list communication via the list server.
Finally, there are several sets of details you don't mention here that need to be discussed in your plan (even if you don't propose to implement them now, you need to ensure that you don't make it difficult to implement them!) First, there's a question of how the proposed off-list messaging is going to be handled. Those pseudo-random addresses are going to need to be made valid addresses to the host's MTA. That's MTA-dependent, and also will likely have security implications. To be sure that people don't inadvertently reveal their identities just by hitting "R", those messages should be anonymized as well, so that any such replies have to go through the server too. Of course it's going to be impossible to prevent people from exchanging email addresses in the body of the text, but in that case it's really not your problem any more.
The second is cleaning up the rest of a post. The incoming trace headers (Received and friends) typically identify the sender quite precisely. Quite likely you'll want to nuke everything that isn't required by the RFCs, in fact. You probably also should try to do something about .sigs, Message-ID (strictly only a SHOULD in RFC 5322, but effectively required), Date (which is required, and which often gives timezone information) and other automatically added text.
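To make the header cleanup concrete, here is a minimal sketch using only the Python standard library. It is not Mailman's Handler API: the function name scrub_headers, the KEEP whitelist, and the example domain are all assumptions for illustration. It drops every header not on the whitelist, then regenerates From, Date (in GMT, so no timezone leaks) and Message-ID on the server side.

```python
from email import message_from_string
from email.utils import formatdate, make_msgid

# Assumed whitelist: headers kept as-is. Everything else is dropped.
KEEP = {"subject", "mime-version", "content-type", "content-transfer-encoding"}


def scrub_headers(raw, alias, list_domain):
    """Return a parsed message with identifying headers removed or regenerated."""
    msg = message_from_string(raw)
    # Copy the names first: deleting while iterating msg.keys() is unsafe.
    for name in set(msg.keys()):
        if name.lower() not in KEEP:
            del msg[name]  # removes every occurrence, e.g. all Received lines
    msg["From"] = alias
    # Server-side Date in GMT hides the poster's clock and timezone.
    msg["Date"] = formatdate(usegmt=True)
    # Fresh Message-ID so the poster's host name does not leak.
    msg["Message-ID"] = make_msgid(domain=list_domain)
    return msg
```

Note that this whitelist also drops In-Reply-To and References, which breaks threading. Keeping them would require rewriting the IDs consistently, and .sig removal needs body processing that this sketch does not attempt.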
Third, what about authentication for incoming posts? Do you care if people spoof addresses? I'm not sure this has any meaning in the one-shot address environment you propose, but that again is going to depend on the use case.
Fourth, you need to think about security for the encryption key and EmailMapper table, as well as any archives (you need to clean up archived posts before they go to the archive -- this is probably just a matter of where your Handler goes in the pipeline).
So you need to store the addresses forever. How big might these tables grow? Could that be a problem?
Did you consider using the "seed" as a "salt" instead? I.e., regenerating the seed each time, adjoining it to the address, and encrypting the combination? That would allow you not to store a database of addresses. Of course if the encryption were compromised, all the old posts could be identified, whereas in your scheme the EmailMapper table also needs to be compromised to get addresses.
Regards, and good luck with your proposal!
Stephen J. Turnbull