We can use GIFs, but we can't use the LZW compression option. Since the GIFs we produce will be small, this is hardly a problem. At any rate, any of GIF, PNG or JPEG will do.
The advantage of using encryption rather than hashing is that the email renderer needs to know nothing about the system except what it is given in the querystring (and the key, of course). Using md5 hashes would be more secure, but would require more information to be shared between the pipermail cgi and the renderemail cgi; that is, a mapping of hashes to email addresses. Frankly, I can't see email harvesters going to the trouble of cracking encryption, no matter what kind.
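To make the trade-off concrete, here is a minimal Python sketch of the two approaches. Everything in it is an assumption for illustration: the function names are made up, and the repeating-key XOR is only a stand-in for a real cipher (it is not the rotor cipher mentioned in the quoted message below, and it is obfuscation rather than serious encryption).

```python
import hashlib

# Shared secret known to both the archive renderer and the image CGI (illustrative).
KEY = b"the quick brown fox jumped over the lazy dog"


def encrypt_token(address, key=KEY):
    """Reversible token: XOR each byte with a repeating key, then hex-encode."""
    data = address.encode("utf-8")
    mixed = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return mixed.hex()


def decrypt_token(token, key=KEY):
    """Recover the address from the token alone; only the key is needed."""
    mixed = bytes.fromhex(token)
    data = bytes(b ^ key[i % len(key)] for i, b in enumerate(mixed))
    return data.decode("utf-8")


def hash_token(address, table):
    """One-way token: the address can only be recovered through the shared table."""
    token = hashlib.md5(address.encode("utf-8")).hexdigest()
    table[token] = address  # this mapping must be visible to both CGIs
    return token
```

With the reversible token, the renderer calls `decrypt_token()` on the querystring and is done. With the hash, the renderer has to look the token up in `table`, which is the extra shared state described above.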
I can think of an alternative technique for preventing email harvesting, one that preserves the clickability of email addresses. It involves the use of JavaScript. Using it, an email link would look something like this:
<script language="javascript"> email = decode("92731602eba1aa4f506604f8c3671ed83ea9"); document.write("<a href='mailto:"+email+"'>"+email+"</a>"); </script>
decode() is some kind of decrypting function. Possibly something as simple as a substitution cipher.
The disadvantage of this technique is that email addresses wouldn't be accessible to people using a browser without JavaScript support, and browser variations would tend to make this less reliable than the image viewing technique. For example, I have found that Netscape is less than reliable when it comes to document.write().
One solution to the non-javascript browser issues would be to use the <noscript> tag to deliver the email address as an image.
<noscript> <img src="/render-email.py?92731602eba1aa4f506604f8c3671ed83ea9"> </noscript>
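On the server side, the archive renderer could emit both pieces together. The following Python sketch is hypothetical: `address_markup` is not an existing pipermail function, the token is assumed to be a hex string like the one above, and `decode()` is assumed to be defined elsewhere in the page.

```python
import string


def address_markup(token, renderer_url="/render-email.py"):
    """Build the script link plus the <noscript> image fallback for one address."""
    # Only accept hex tokens so nothing unexpected ends up inside the script or src.
    if not token or any(c not in string.hexdigits for c in token):
        raise ValueError("token must be a non-empty hex string")
    return (
        '<script language="javascript">'
        ' email = decode("%s");'
        ' document.write("<a href=\'mailto:"+email+"\'>"+email+"</a>");'
        ' </script>'
        ' <noscript> <img src="%s?%s"> </noscript>'
    ) % (token, renderer_url, token)
```

Browsers that run scripts get the clickable link; the rest fall back to the rendered image.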
One problem with all this is that pipermail renders HTML as it stores emails, rather than as they are viewed. The issue here is that the original emails are lost; if we encrypt (or hash) email addresses, then losing the key (or hash->email mapping) implies also losing the addresses. There are some comments in the pipermail/HyperMail source code about rendering to html on viewing rather than storage, but converting to this scheme would imply a backwards compatibility issue: how to import emails already rendered to html.
A backwards compatible solution would be to render to html twice: once on storage and again on viewing. The viewing renderer would be responsible for encrypting/obfuscating email addresses.
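A rough Python sketch of what the viewing-time pass might look like follows. It assumes the stored HTML contains plain `<a href="mailto:...">` links of the form shown in the quoted message below; I have not checked what pipermail actually writes out, so the regular expression, the function names and the XOR encoding are all placeholders.

```python
import re

# Shared secret (illustrative); the image CGI would need the same key.
KEY = b"the quick brown fox jumped over the lazy dog"

# Matches a whole mailto link and captures the address from the href.
MAILTO_LINK = re.compile(
    r'<a\s+href="mailto:([^"]+)"\s*>.*?</a>', re.IGNORECASE | re.DOTALL
)


def encode(address, key=KEY):
    """Placeholder reversible encoding: repeating-key XOR, hex-encoded."""
    data = address.encode("utf-8")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data)).hex()


def obfuscate_addresses(stored_html, renderer_url="/render-email.py"):
    """Second rendering pass: swap each mailto link for an <img> tag."""
    def to_image(match):
        return '<img src="%s?%s">' % (renderer_url, encode(match.group(1)))
    return MAILTO_LINK.sub(to_image, stored_html)
```

The stored archive stays untouched, so nothing is lost if the key changes. Addresses that appear as bare text rather than as links are not handled by this sketch.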
I'm going to join the mailman-developers list with a temporary email address, to get some more input on this issue.
-----Original Message-----
From: Barry A. Warsaw [mailto:barry@zope.com]
Sent: Wednesday, 30 January 2002 21:18
To: Damien Morton
Subject: Re: Mailman developer?
"DM" == Damien Morton <Damien.Morton@acm.org> writes:
DM> I'm under the impression that you are one of the main
DM> developers and/or maintainers of Mailman. I hope you don't
DM> mind me writing to you.
Nope.
DM> I notice that Mailman obfuscates email addresses to a
DM> certain extent, but replacing the @ symbol with %40 or
DM> &atmark; is hardly sufficient. An even vaguely intelligent
DM> email harvester will see through this.
True.
DM> The feature I'm proposing is to render out all email
DM> addresses in the archive as GIFs. I would have pipermail
DM> render out <img> tags whose src is an encrypted version of the
DM> email address. A companion CGI script would decrypt the email
DM> address and render it out as a GIF image using PIL or
DM> somesuch.
Of course, because Mailman is a GNU project, we can't use gifs, but pngs or jpegs would work just as well. IIRC, PIL can generate either of those formats.
| Instead of rendering this:
| <a href="mailto:your.email@address">your.email@address</a>
DM> You'd render this instead:
DM> <img src="/render-email.py?92731602eba1aa4f506604f8c3671ed83ea9">
DM> This example uses the simple rotor encryption that
DM> comes with python and the key is "the quick brown fox jumped
DM> over the lazy dog"
I usually use the md5 module to generate unique keys in such situations.
DM> I've been looking at pipermail, and it is
DM> _ugly_.
Tell me about it! About its only saving grace is that it's reasonably well integrated and it's all in Python. Other than that... ;)
There has been lots of discussion over the years about ditching that code and doing it right, so I'd suggest poring over the mailman-developers archives (yes, you'll miss being able to search it ;). It's a lot of work though, so it currently languishes for lack of a motivated champion.
DM> Nonetheless, I'm fairly sure I can add this functionality
DM> easily. I'm running w2k, however, and I see that Mailman isn't
DM> really meant for w2k.
Correct.
DM> As I would be working on the pipermail part of mailman only,
DM> it might be easier to get only that component up and running
DM> under windows. Not sure if there's a sample archive that comes
DM> with mailman, but... any suggestions welcome.
Nope, but you can of course grab the raw mbox for any archive, or for a month of messages. Note that that does point to another source of leaked addresses, one that won't be directly affected by your idea. However I could see hiding raw mbox access behind a cgi POST which should effectively stop today's harvesters.
DM> The downsides of this functionality are that it might
DM> incur a performance penalty and that it eliminates the
DM> clickable mailto: functionality presently there.
System-wide caching should alleviate the performance hit (modulo cgi overhead). The loss of clickable mailto: would be a drag, although I don't know how much it would be missed in practice.
DM> As far as functionality goes, I imagine that the bulk
DM> of any mailman bandwidth will be from spiders, and these are
DM> unlikely to traverse an image source link. Secondly, simple
DM> caching should be very easy to implement.
DM> As far as the clickable mailto functionality goes, I
DM> have two suggestions. The first is that the
DM> render-emails-as-images functionality could be a personal
DM> preference of the sender of the email, and the second is that
DM> that preference could be overridden by acquiring a cookie
DM> through some bot and spider proof mechanism. I like the 'type
DM> what you read in the image above' mechanism for detecting
DM> humans.
I wouldn't make it an option of the sender, but of the list (or maybe just of the site).
Anyway, it's a neat idea. Mailman-developers would be the best place to discuss it, but that does present a bit of a catch-22 for you. ;)
From a practical standpoint, MM2.1 will likely go to beta this weekend, meaning feature freeze. However I encourage you to follow through and work out some patches, if you'd be willing to assign copyright to them to the FSF eventually. Post any patches on Mailman's SF project page and that will let other interested parties download it and test it out, etc.
Cheers, -Barry