[GSOC 2014] Approach towards the Full anonymization project
Rajeev S writes:
As mentioned, here is my approach towards the full anonymization project.
AFAICS, the scheme as you've described it will provide the outcomes you list.
However, I don't understand the use case here. Most approaches use a single secret ID for each user. This is not just a matter of convenience for the developer, but a requirement in some cases. That is, the list members, although anonymous "in the real world", build trust relationships with each other in the list environment. "Full anonymity" is in any case difficult to achieve, as word choice, topic choice, grammatical construction, time of writing, etc., fall into patterns over time.
Also, given your model of an address per post, I'm again unclear on the use case for off-list communication via the list server.
Finally, there are several sets of details you don't mention here but that need to be discussed in your plan (even if you don't propose to implement them now, you need to ensure that you don't make them difficult to implement!). First, there's the question of how the proposed off-list messaging is going to be handled. Those pseudo-random addresses are going to need to be made valid addresses for the host's MTA. That's MTA-dependent, and will also likely have security implications. To be sure that people don't inadvertently reveal their identities just by hitting "R", those messages should be anonymized as well, so that any such replies have to go through the server too. Of course it's going to be impossible to prevent people from exchanging email addresses in the body of the text, but in that case it's really not your problem any more.
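A minimal sketch of relaying an off-list message through the server, assuming a hypothetical alias format `<list>-anon-<token>@<list domain>` and two hypothetical hooks: `resolve` (token to real address) and `alias_for` (real address to a fresh alias). Getting the MTA to accept these addresses is MTA-specific and not shown. The point illustrated is that the relayed copy carries the sender's alias in both From and Reply-To, so hitting "R" sends the reply back through the server.

```python
import re
from email.message import EmailMessage
from email.utils import parseaddr
from typing import Callable, Optional

# Hypothetical alias format: <list>-anon-<token>@<list domain>.
# The MTA must be configured (in an MTA-specific way) to hand these to the list server.
ALIAS_RE = re.compile(
    r"^(?P<list>[a-z0-9.-]+)-anon-(?P<token>[0-9a-z]+)@(?P<domain>[^@]+)$", re.I
)

# Headers that could identify the original sender; removed before relaying.
IDENTIFYING = ("Received", "Sender", "Return-Path", "Reply-To", "X-Originating-IP")


def relay_offlist(msg: EmailMessage,
                  resolve: Callable[[str], Optional[str]],
                  alias_for: Callable[[str], str]) -> EmailMessage:
    """Rewrite an off-list message in place so neither party sees the other's address.

    resolve(token) and alias_for(address) are hypothetical hooks into the mapping store.
    """
    _, rcpt = parseaddr(str(msg["To"]))
    match = ALIAS_RE.match(rcpt)
    if match is None:
        raise ValueError(f"not an anonymous alias: {rcpt!r}")
    real_rcpt = resolve(match.group("token"))
    if real_rcpt is None:
        raise LookupError("unknown or expired alias")

    _, real_sender = parseaddr(str(msg["From"]))
    sender_alias = alias_for(real_sender)  # a fresh alias that also routes via the server

    for name in IDENTIFYING + ("From", "To"):
        del msg[name]  # removes every occurrence; no error if the header is absent
    msg["From"] = sender_alias
    msg["Reply-To"] = sender_alias  # hitting "R" goes back through the server
    msg["To"] = real_rcpt
    return msg
```

Body text is passed through untouched, which matches the caveat above: addresses typed into the body are beyond the server's control.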
The second is cleaning up the rest of a post. The incoming trace headers typically identify the sender quite precisely. In fact, you'll quite likely want to nuke everything that isn't required by the RFCs. You should probably also do something about .sigs, Message-ID (which is required), Date (also required, and often revealing timezone information), and other automatically added text.
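A minimal sketch of that cleanup using Python's `email` package, assuming a hypothetical allowlist of headers and a `scrub` helper invented for illustration. It drops every header not on the list, regenerates Message-ID under the list's domain, replaces Date with the server's time in GMT, and cuts a plain-text .sig at the conventional "-- " separator. (Strictly, RFC 5322 requires only the Date and From fields, with Message-ID a SHOULD, but mail software broadly depends on it, so the sketch regenerates it rather than dropping it.)

```python
import re
from email.message import EmailMessage
from email.utils import formatdate, make_msgid

# Hypothetical allowlist: keep only what readers and threading need.
KEEP = {"from", "to", "subject", "date", "message-id", "in-reply-to",
        "references", "mime-version", "content-type", "content-transfer-encoding"}

# Conventional signature separator: a line consisting of "-- " (dash, dash, space).
SIG_SEP = re.compile(r"^-- $", re.M)


def scrub(msg: EmailMessage, alias: str, list_domain: str) -> EmailMessage:
    """Strip identifying headers and a trailing .sig from msg, in place."""
    for name in {key.lower() for key in msg.keys()}:
        if name not in KEEP:
            del msg[name]  # trace headers, X-Mailer, User-Agent, ...

    # Replace the headers that must stay rather than dropping them.
    for name in ("From", "Date", "Message-ID"):
        del msg[name]
    msg["From"] = alias
    msg["Date"] = formatdate(usegmt=True)  # server time in GMT, no sender timezone
    msg["Message-ID"] = make_msgid(domain=list_domain)  # no sender host name

    # Cut a plain-text .sig; multipart bodies would need a walk over the parts.
    if msg.get_content_type() == "text/plain":
        body = SIG_SEP.split(msg.get_content(), maxsplit=1)[0]
        msg.set_content(body)
    return msg
```

This only catches signatures that follow the "-- " convention; free-form sign-offs and quoted text still need other treatment, and the server's Date still reveals roughly when the post was processed.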
Third, what about authentication for incoming posts? Do you care if people spoof addresses? I'm not sure this has any meaning in the one-shot address environment you propose, but that again is going to depend on the use case.
Fourth, you need to think about security for the encryption key and EmailMapper table, as well as any archives (you need to clean up archived posts before they go to the archive -- this is probably just a matter of where your Handler goes in the pipeline).
- Introduce a new model EmailMapper with the attributes:
  - ForeignKey to Address / User
  - seed, a 40-bit hash, unique
  - nuses, the number of times this hash has been used, max 5 or 10
- The approach is to encrypt the seed nuses times, with an encryption algorithm like AES, each time the email ID is displayed.
- The email ID is displayed as <nuses><encrypted seed>.
- The email ID is decrypted nuses times to find the parent seed and thereby point to the exact email address.
- A new seed should be generated for the user after a fixed number of uses, say 5 or 10, as the repeated encryption routines can slow down the system.
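A minimal in-memory sketch of the scheme above, with one substitution stated plainly: AES has a 128-bit block (and is not in Python's standard library), so encrypting a 40-bit seed with it would not yield a 40-bit token. The sketch instead uses a 4-round Feistel network over 40 bits with HMAC-SHA256 round functions, a standard way to build a small-domain keyed permutation. The `EmailMapper` class, `display_address`, `resolve`, and the 12-character local-part layout (2 decimal digits of nuses, 10 hex digits of token) are hypothetical, and nothing here is vetted for production.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List, Optional

MAX_USES = 5              # the proposal suggests 5 or 10
HALF_BITS = 20            # a 40-bit seed is split into two 20-bit halves
MASK = (1 << HALF_BITS) - 1
ROUNDS = 4
HEX = set("0123456789abcdef")


def _round(key: bytes, rnd: int, half: int) -> int:
    """Feistel round function: HMAC-SHA256 of (round, half), truncated to 20 bits."""
    digest = hmac.new(key, bytes([rnd]) + half.to_bytes(3, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big") & MASK


def encrypt40(key: bytes, x: int) -> int:
    """Keyed permutation of 40-bit integers (stand-in for AES)."""
    left, right = x >> HALF_BITS, x & MASK
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << HALF_BITS) | right


def decrypt40(key: bytes, y: int) -> int:
    """Inverse of encrypt40: undo the rounds in reverse order."""
    left, right = y >> HALF_BITS, y & MASK
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, rnd, left), left
    return (left << HALF_BITS) | right


class EmailMapper:
    """In-memory stand-in for the proposed EmailMapper model (hypothetical API)."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.by_seed: Dict[int, str] = {}        # every seed ever issued -> address
        self.current: Dict[str, List[int]] = {}  # address -> [active seed, nuses]

    def _new_seed(self, address: str) -> None:
        seed = secrets.randbits(40)
        while seed in self.by_seed:              # seeds must be unique
            seed = secrets.randbits(40)
        self.by_seed[seed] = address
        self.current[address] = [seed, 0]

    def display_address(self, address: str) -> str:
        """Return the local part <nuses><encrypted seed> for the next post."""
        if address not in self.current or self.current[address][1] >= MAX_USES:
            self._new_seed(address)              # rotate after MAX_USES displays
        entry = self.current[address]
        entry[1] += 1
        token = entry[0]
        for _ in range(entry[1]):                # encrypt nuses times
            token = encrypt40(self.key, token)
        return f"{entry[1]:02d}{token:010x}"

    def resolve(self, local_part: str) -> Optional[str]:
        """Decrypt nuses times to recover the seed, then look up its address."""
        if len(local_part) != 12 or not set(local_part) <= HEX:
            return None
        nuses, token = int(local_part[:2]), int(local_part[2:], 16)
        if not 1 <= nuses <= MAX_USES:
            return None
        for _ in range(nuses):
            token = decrypt40(self.key, token)
        return self.by_seed.get(token)
```

Note that `by_seed` only grows: every retired seed must be kept so that previously displayed addresses still resolve.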
The outcomes:
- Every time the user sends a message, his From address changes.
- At the same time, each of these From addresses points to the same user.
- A sender can repeatedly use any stored address he has (e.g., in his mail contacts) to contact a user, as the address carries its own nuses.
So you need to store the addresses forever. How big might these tables grow? Could that be a problem?
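A rough way to size this, assuming seeds are drawn uniformly at random and rotated after n_max displays as proposed: with D displayed addresses in total, the table needs at least D / n_max rows. With k rows already present, a freshly drawn 40-bit seed collides with an existing one with probability P_draw, while P_any is the birthday-bound chance that some pair among k seeds collides if uniqueness were not enforced.

```latex
N_{\text{rows}} \;\ge\; \frac{D}{n_{\max}}, \qquad
P_{\text{draw}} = \frac{k}{2^{40}}, \qquad
P_{\text{any}} \approx 1 - \exp\!\left(-\frac{k(k-1)}{2 \cdot 2^{40}}\right)

k = 2^{20}: \quad P_{\text{draw}} = 2^{-20} \approx 9.5 \times 10^{-7}, \qquad
P_{\text{any}} \approx 1 - e^{-1/2} \approx 0.39
```

So retrying on collision stays cheap at a million rows, but the uniqueness check cannot be dropped: without it, a duplicate seed somewhere in the table becomes more likely than not at roughly 1.2 million rows.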
Did you consider using the "seed" as a "salt" instead? I.e., regenerating the seed each time, adjoining it to the address, and encrypting the combination? That would allow you not to store a database of addresses. Of course, if the encryption were compromised, all the old posts could be identified, whereas in your scheme the EmailMapper table also needs to be compromised to get the addresses.
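A minimal sketch of the salt idea using only the Python standard library. Since the stdlib has no AES, it swaps in a stream cipher built from HMAC-SHA256 in counter mode, plus an encrypt-then-MAC tag; the fresh random salt serves as the nonce, so every post gets a different token and resolving it needs no table. `make_token`, `resolve_token`, and the salt and tag sizes are hypothetical choices, and the construction is illustrative rather than vetted.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

SALT_LEN = 8    # fresh random salt per post
TAG_LEN = 10    # truncated HMAC tag; rejects forged or mistyped tokens


def _subkeys(master: bytes) -> Tuple[bytes, bytes]:
    """Derive independent encryption and MAC keys from one master key.

    Anyone holding the master key can decrypt every token ever issued.
    """
    enc = hmac.new(master, b"enc", hashlib.sha256).digest()
    mac = hmac.new(master, b"mac", hashlib.sha256).digest()
    return enc, mac


def _keystream(key: bytes, salt: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode, keyed per salt (the salt acts as a nonce)."""
    blocks = [hmac.new(key, salt + counter.to_bytes(4, "big"), hashlib.sha256).digest()
              for counter in range((length + 31) // 32)]
    return b"".join(blocks)[:length]


def make_token(master: bytes, address: str) -> str:
    enc_key, mac_key = _subkeys(master)
    salt = secrets.token_bytes(SALT_LEN)
    plain = address.encode("utf-8")
    cipher = bytes(p ^ k for p, k in zip(plain, _keystream(enc_key, salt, len(plain))))
    tag = hmac.new(mac_key, salt + cipher, hashlib.sha256).digest()[:TAG_LEN]
    # Base32 (lowercased) uses only letters and the digits 2-7, safe in a local part.
    return base64.b32encode(salt + cipher + tag).decode("ascii").rstrip("=").lower()


def resolve_token(master: bytes, token: str) -> Optional[str]:
    enc_key, mac_key = _subkeys(master)
    try:
        raw = base64.b32decode(token.upper() + "=" * (-len(token) % 8))
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    if len(raw) < SALT_LEN + TAG_LEN:
        return None
    salt, cipher, tag = raw[:SALT_LEN], raw[SALT_LEN:-TAG_LEN], raw[-TAG_LEN:]
    expected = hmac.new(mac_key, salt + cipher, hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        return None
    plain = bytes(c ^ k for c, k in zip(cipher, _keystream(enc_key, salt, len(cipher))))
    return plain.decode("utf-8")
```

One practical cost of this design: "alice@example.com" becomes a 56-character token (8 + 17 + 10 = 35 bytes), already close to the 64-octet local-part limit of RFC 5321 before any list prefix, so a real design would probably encrypt a short user ID rather than the full address.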
Regards, and good luck with your proposal!
Stephen J. Turnbull