[Mailman-Developers] Interesting study -- spam on posted addresses...

Damien Morton dm-temp-310102@nyc.rr.com
Thu, 21 Feb 2002 08:28:13 -0500

> From: Dale Newfield [mailto:dale@newfield.org] 
> On Wed, 20 Feb 2002, Damien Morton wrote:
> > I still think the email-address-as-jpeg solution is prohibitively 
> > expensive to reverse; effectively impossible for machines, entirely 
> > easy for people.
> But it does have drawbacks.
> It only works with graphical browsers.

This is true. We are in the 21st century now. Expecting a graphical
client isn't such a huge leap of faith, unless we allow ourselves to be
guided by recidivist or luddite lynx users and their ilk.

> It can't be enlarged for people that have poor vision.

This is true, for the public archives.

> It can be reverse-engineered -- all they have to do is decode 
> a single font, then they're all simple to snag.

Assuming you use a single font.
Assuming you don't add some noise to the resulting image.
Assuming you don't do some geometric distortion to the resulting image.

To reverse engineer, a harvester would have to examine pretty much every
image it finds, OCR it with some fantastic military grade image
recognition software, and see if there's an email address buried in it.

As I said, "prohibitively expensive to reverse".

> In fact, as someone with lots of computer graphics 
> experience, I'd say it would be almost no harder to write 
> code to decode them than it would be to write code to generate them.

As someone with lots of computer graphics experience, you will probably
know that OCR is hard. It's even harder with a non-cooperating document,
hidden amongst many other documents.

> > Web Forms for contacting the admin cold. If the admin replies, you can
> > continue the conversation via email.
> Right, assuming the web form doesn't break.

In my experience, the most likely route to a web form breaking is if
the email address it sends to breaks.

> > Private and Public views of the archives.
> >
> > Private archives are restricted to list members and those that can 
> > pass a reverse turing test.
> People keep using this term, but I'm not sure what they mean, 
> or if I trust that they'd be so reliable...

Some examples of reverse turing tests. (http://www.captcha.net/)
http://drive.to/research (this one uses audio)

Any of those tests can be implemented in Python using PIL.

Between an audio test and a visual test, you've got the blind and the
deaf covered.

> > Public archives render all email addresses as jpegs.
> If they're automatically generated, it'd be easier to create 
> pngs or gifs, or lots of other formats than jpgs.  Think 
> about this, though--how do you actually generate the images 
> and serve them properly without either including the email 
> address in the html code anyway (so the img request specifies 
> what image to generate), or building a whole database mapping 
> arbitrary numbers to email addresses (so they can either be 
> generated on the fly or stored pre-generated).  Once you've 
> got that database, why not just have that database front a 
> web form instead of displaying the address?

I suggested JPEGs because they are computationally more expensive to
decode than other formats. Also: compression is lossy and adds a certain
amount of noise to an image.

Generating and serving the images would be done as follows:

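import md5
from os.path import exists

# the '.jpg' suffix is an assumption, following the JPEG suggestion above
filename = md5.new('list-specific-salt-string' +
                   'email@server.com').hexdigest() + '.jpg'
if not exists(filename):
  img = render_email('email@server.com')  # PIL-based renderer, not shown
  img.save(filename)  # write the image out so it can be served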

Then you replace every occurrence of 'email@server.com' with '<img
src="%s">' % filename
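
As a rough sketch only, the naming and replacement steps could look like
this in current Python, where hashlib takes the place of the old md5
module. The address pattern, the '.jpg' suffix and the function names
image_name and hide_addresses are all assumptions made for illustration;
rendering the images themselves is left out.

```python
import hashlib
import re

# Hypothetical salt; each list would keep its own secret value.
SALT = "list-specific-salt-string"

# Deliberately simple address pattern, good enough for a sketch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def image_name(address, salt=SALT):
    """Map an address to an opaque, stable image filename."""
    digest = hashlib.md5((salt + address).encode("utf-8")).hexdigest()
    return digest + ".jpg"  # suffix assumed, see above


def hide_addresses(html, salt=SALT):
    """Replace each address in html with an <img> tag.

    Returns the rewritten text and a dict mapping each image filename
    to the address that still has to be rendered into it.
    """
    needed = {}

    def substitute(match):
        address = match.group(0)
        name = image_name(address, salt)
        needed[name] = address  # the caller renders these images later
        return '<img src="%s">' % name

    return EMAIL_RE.sub(substitute, html), needed
```

Calling hide_addresses("Contact email@server.com for details.") returns
the text with the address swapped for an <img> tag, plus the list of
images still to be generated. Because the filename is a salted hash, the
address never appears in the HTML, and no separate database mapping
numbers to addresses is needed.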

Replacing the email addresses with a link to a webform would be another,
perfectly acceptable solution, assuming you can get over your own
objections to web forms.