[Mailman-Developers] Interesting study -- spam on posted addresses...

Damien Morton dm-temp-310102@nyc.rr.com
Thu, 21 Feb 2002 12:05:53 -0500

> From: Dale Newfield
> On Thu, 21 Feb 2002, Damien Morton wrote:
> > OCR is hard
> OCR is hard mostly because of the analog components (and the 
> variety of fonts that exist).  If you are generating the 
> image digitally (and with a limited set of fonts), most of 
> the OCR problems go away.

You're assuming a simplistic implementation: a single font, with no noise
or distortion. At any rate, it's certainly much harder than writing a Perl
regex, both in terms of brainpower and in terms of the computing power
required.
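To show how little a regex harvester needs, here is a minimal sketch in Python rather than Perl. The pattern is a deliberately loose assumption (local part, "@", dotted domain) and is not taken from any particular harvester; it also illustrates why even a trivial text rewrite defeats a pattern this naive.

```python
import re

# A deliberately loose, illustrative address pattern: local part, "@",
# then a dotted domain ending in an alphabetic TLD. Real RFC 2822
# addresses allow more than this.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(text):
    """Return every address-like string found in the text, in order."""
    return ADDRESS_RE.findall(text)


page = "Damien Morton dm-temp-310102@nyc.rr.com\nThu, 21 Feb 2002"
print(harvest(page))  # ['dm-temp-310102@nyc.rr.com']
```

One compiled pattern and one call scan an entire archive page; an OCR pipeline needs segmentation, glyph recognition and error correction before it gets that far.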

> > Some examples of reverse turing tests. (http://www.captcha.net/)
> It appears that each of those introduces non ADA compliant 
> aspects. The first and third can be defeated with a database 
> no larger than that needed to implement it, the third is 
> unlikely to work on many platforms (audio dependencies kept
> it from working for me), and the fourth I couldn't even 
> figure out as a human--not what we're looking for.

You're assuming a simplistic implementation: a database of words and
images. A sophisticated implementation would generate images from random
words with random distortions added, and sounds by overlaying random
words on random backgrounds.
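A minimal sketch of the "random words, random distortions" idea, assuming the challenge is generated fresh per request so there is no fixed database for an attacker to copy. The names (`Challenge`, `make_challenge`, `check`) and the distortion ranges are hypothetical; the sketch only produces the parameters an imaging or audio library would render and leaves the rendering out.

```python
import random
import string
from dataclasses import dataclass
from typing import List


@dataclass
class Challenge:
    answer: str            # random word the user must type back
    rotations: List[float]  # per-character rotation, in degrees
    offsets: List[int]      # per-character vertical jitter, in pixels
    noise_density: float    # fraction of background pixels to speckle


def make_challenge(rng, length=6):
    """Build a fresh challenge; ranges below are illustrative guesses."""
    word = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return Challenge(
        answer=word,
        rotations=[rng.uniform(-25.0, 25.0) for _ in word],
        offsets=[rng.randint(-4, 4) for _ in word],
        noise_density=rng.uniform(0.05, 0.15),
    )


def check(challenge, response):
    """Accept the response if it matches the word, ignoring case and spaces."""
    return response.strip().lower() == challenge.answer


rng = random.SystemRandom()  # OS entropy, so challenges are not predictable
c = make_challenge(rng)
```

With six random lowercase letters there are 26^6 (about 309 million) possible answers before any distortion is applied, which is what makes a lookup database impractical compared with a fixed word list.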

You're also ignoring the third test, which is list membership.
If you're not capable of passing the reverse Turing tests offered, you
can always join the list for unobscured access.

> > Between an audio test and a visual test, you've got the
> > blind and the deaf covered.
> And you've introduced lots of browser/platform dependencies
> that mean you can't use new low-bandwidth platforms, like WAP.

You're ignoring the third test offered, which is list membership: 'enter
your email address and password here'.
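A minimal sketch of the membership test, assuming a hypothetical roster that stores a salted PBKDF2 hash per member address. This is not how Mailman itself stores member passwords; the class name, method names and iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    # PBKDF2-HMAC-SHA256; 100,000 iterations is an illustrative choice.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class Roster:
    """Hypothetical list roster mapping member address -> (salt, hash)."""

    def __init__(self):
        self._members = {}

    def add(self, address, password):
        salt = os.urandom(16)
        self._members[address.lower()] = (salt, hash_password(password, salt))

    def may_view_private_archive(self, address, password):
        """True only for a subscribed address with the right password."""
        entry = self._members.get(address.lower())
        if entry is None:
            return False
        salt, stored = entry
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(stored, hash_password(password, salt))
```

Because this path needs only a form with two text fields, it works in any browser, including ones with no image or audio support.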

Between the three kinds of tests, anyone who wants at least the
functionality offered today can get it, no matter what platform they
are on.

Let me reiterate that what is being proposed here is the obscuration of
email addresses in the public archives; that is, the archives available
to the world for casual inspection.

Perhaps it might be fruitful to look at omitting the email addresses in
the public archives entirely. That would certainly be ADA compliant, and
would be usable by anyone with an HTML 1.0 capable browser.

As I see it, the questions are: 

Is it desirable to prevent the whole world from seeing email addresses in
Mailman archives?
If yes then
	should there be public and private archives, with the public archive obscuring addresses?
	if yes
		how should access to the private archives be controlled?
			list membership?
			reverse Turing tests?
		what should go into the public archives?
			obscured email?
				email as images?
				text based obfuscation?
			links to web form email?
			omit email addresses entirely?
	else if no
		should an obfuscation scheme be used at all?
		if yes
			what obfuscation scheme(s) should be used?
				obscured email?
					email as images?
					text based obfuscation?
				links to web form email?
				omit email addresses entirely?
		else if no
			talking in circles
else if no
	end of conversation
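The public-archive options in the outline above (text-based obfuscation, links to a web form, omitting addresses entirely) can be sketched as interchangeable rewrite functions. Everything here is hypothetical: the scheme names, the `/contact/<token>` URL and the token scheme are assumptions for illustration, not existing Mailman or Pipermail behaviour, and "email as images" is left out because it needs an imaging library.

```python
import hashlib
import re

# Same loose, illustrative pattern a naive harvester might use.
ADDRESS_RE = re.compile(r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def text_obfuscation(m):
    # "user@example.com" -> "user at example.com"
    return f"{m.group(1)} at {m.group(2)}"


def web_form_link(m):
    # Replace the address with an opaque token; a hypothetical /contact
    # form on the server would map the token back and relay the message.
    token = hashlib.sha256(m.group(0).lower().encode("utf-8")).hexdigest()[:12]
    return f"/contact/{token}"


def omit(m):
    return "[address removed]"


SCHEMES = {"obscure": text_obfuscation, "link": web_form_link, "omit": omit}


def render_public(text, scheme):
    """Rewrite every address in an archived message for the public archive."""
    return ADDRESS_RE.sub(SCHEMES[scheme], text)


msg = "Reply to dm@nyc.rr.com please"
print(render_public(msg, "obscure"))  # Reply to dm at nyc.rr.com please
print(render_public(msg, "omit"))     # Reply to [address removed] please
```

The trade-off shows up directly: "user at example.com" stops the naive pattern but is trivially reversed by a slightly smarter regex, while the link and omit schemes never put the address on the page at all.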