[Mailman-Users] [Mailman-cabal] GDPR

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Tue May 22 21:33:18 EDT 2018

Grant Taylor via Mailman-Users writes:
 > On 05/14/2018 06:33 AM, Andrew Hodgson wrote:

 > > Current advice from the GDPR people is we may have to delete the whole 
 > > thread.
 > What is their working definition of "thread"?

I would imagine that it is the subthread rooted at the first post
containing complainant's PII -- "Personally Identifying Information".

 > Why can't just the individual's message(s) be deleted?  Or better
 > redacted to not reflect them?

That is going to depend on the presence of PII in the messages.  If
*whole messages* are to be deleted, that would presumably involve
content that somehow identifies the person.  I would expect that we
don't have to delete whole bug reports on this list just because
somebody requests their PII be redacted.

What worries me more are the implications for blockchain, or more
precisely, DAG-based VCSes that use hashes for integrity checks, like
git: the identity of commits will change if authors and emails are
redacted, including if a commit log refers to PII of a bug reporter, as
they often do.  I guess you'd need to maintain an index of pointers
from old commit ids, or at least for branches and tags (we do have the
reflog in git).
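
To make the hash problem concrete, here is a rough sketch in Python of
how git derives a commit id, and of the kind of old-id to new-id index
I have in mind.  The names, addresses, and helper functions are all
made up for illustration, and it only handles a linear history; it is
not how any real history-rewriting tool works internally.

```python
import hashlib

# git's well-known id for the empty tree; used so the example needs no files.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(obj_type, body):
    """SHA-1 over '<type> <size>\\0' + body, which is how git names objects."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(commit, parents):
    """Serialize a commit the way git does and return its id."""
    lines = [f"tree {commit['tree']}"]
    lines += [f"parent {p}" for p in parents]
    lines += [
        f"author {commit['author']}",
        f"committer {commit['committer']}",
        "",
        commit["message"],
    ]
    return git_object_id("commit", "\n".join(lines).encode())


def chain_ids(commits):
    """Ids for a linear history, oldest first; each parent is the previous commit."""
    ids = []
    for commit in commits:
        ids.append(commit_id(commit, ids[-1:]))  # root commit has no parent
    return ids


def redact(commit, name, email):
    """Hypothetical redaction: scrub one person's name and address."""
    def scrub(text):
        return text.replace(name, "Redacted").replace(email, "redacted@example.invalid")
    return {
        **commit,
        "author": scrub(commit["author"]),
        "committer": scrub(commit["committer"]),
        "message": scrub(commit["message"]),
    }


alice = "Alice Example <alice@example.org> 1526000000 +0000"
bob = "Bob Example <bob@example.org> 1526000100 +0000"
history = [
    {"tree": EMPTY_TREE, "author": alice, "committer": alice,
     "message": "Initial commit\n"},
    {"tree": EMPTY_TREE, "author": bob, "committer": bob,
     "message": "Unrelated cleanup\n"},
]

old_ids = chain_ids(history)
new_ids = chain_ids([redact(c, "Alice Example", "alice@example.org") for c in history])

# The index of pointers from old commit ids to their rewritten counterparts.
old_to_new = dict(zip(old_ids, new_ids))
```

Note that the second commit contains none of Alice's PII, yet its id
changes anyway because its parent line changed.  That is why the index
has to cover every descendant of a redacted commit, not just the
redacted commits themselves.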

And heaven help you if you're a security conscious group like the
Linux kernel and use signed commits.  I guess the person who does the
redaction would sign the new commits, but that's pretty yucky -- that
person could do anything and nobody would know when it happened
because you have to delete the old commits and blobs that get redacted.

 > > Still under discussion, this is also complex because threads and
 > > subjects change, if we delete the whole thread there may be
 > > messages from the same author in other threads that don't have
 > > correct attribution etc.

As I understand the "right to be forgotten", it's *not* a right to
arbitrarily edit content stored by someone else, it's the right to
redact *all* PII in that content.  It's not just messages from a
person, it's headers containing their name and email address,
attribution lines for quoted material, quoted .sigs, etc etc.
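
As a minimal sketch of what "all PII in that content" means for an
archived message, the following Python scrubs every occurrence of a
person's identifiers wherever they appear in the raw text, not only in
their own posts.  The function name, the placeholder, and the sample
message are assumptions for illustration; a real tool would also have
to cope with obfuscated addresses ("user at example.org"),
MIME-encoded headers, and encoded bodies, none of which this handles.

```python
import re


def redact_pii(text, identifiers, placeholder="[redacted]"):
    """Replace each identifier (name, address, ...) anywhere in text."""
    if not identifiers:
        return text
    # Longest first, so a full name wins over any shorter overlapping entry.
    ordered = sorted(identifiers, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in ordered), re.IGNORECASE)
    return pattern.sub(placeholder, text)


# A reply by someone else that still carries the complainant's PII in the
# attribution line and in a quoted .sig.
raw = (
    "From: Bob Example <bob@example.org>\n"
    "Subject: Re: crash on startup\n"
    "\n"
    "Alice Example writes:\n"
    " > It crashes on startup.\n"
    " > -- \n"
    " > Alice Example <alice@example.org>\n"
    "\n"
    "Which version?\n"
)

cleaned = redact_pii(raw, ["Alice Example", "alice@example.org"])
print(cleaned)
```

Bob's header and text survive untouched; only the attribution line and
the quoted .sig are changed, which is the distinction between redacting
PII and arbitrarily editing someone else's content.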

 > I see six modes of access to the data:
 > 1)  List subscribers
 > 2)  List owners / administrators
 > 3)  Host system administrators
 > 4)  Administrators that are in the downstream SMTP / HTTP path and can 
 >     track things.
 > 5)  Backups.
 > 6)  Ongoing Discovery.

You're missing

0)  Randos accessing public archives.

For (0), the only logging would be IP addresses in the webserver.

 > I would expect that #1 requires authentication to MM for
 > subscribers to see data, and I expect that this is logged in some
 > (indirect) capacity.

No.  The accessing IPs will be in the webserver logs, but I don't
think there is any logging in either Mailman 2 or Mailman 3 of
authentication data.  All there would be is the implication that
authentication was successful if that data were accessed.  In Mailman
2 there's no PII data whatsoever except for email address and (maybe)
display name in the subscriber data.  I suppose you could put phone #s
and junk like that in the display name, but GDPR is more concerned
with the database fields that might store PII than the actual content.

 > I would expect that #2 would have access to the data as part of their 
 > role of owning / administering a mailing list.

However, in Mailman 2 the various list passwords are shared, and would
not identify individuals in cases with multiple moderators or list
owners.

 > I would also expect that #3 has the capability to access the data.  But 
 > I would also expect that #3 would not access the data in normal day to 
 > day operations.

Indeed.  The problem is identifying them if they do, since they can
just use normal filesystem operations from the shell, which are not
normally logged at all.  In Mailman 3, we can configure databases like
PostgreSQL, which I suppose can log access to the subscriber
databases, and which make it hard (but not impossible) to access data
via ordinary filesystem operations.

However, I think that the issue here is basically moot.  You keep host
access logs to check for suspicious IP addresses (attempting to) log
in, and otherwise (for #2 and #3) you just give the list of all the
people who can access that data in the normal course of their duties.
I don't think the issue with logging is pinning down a particular
access to specific data, but rather determining who *could* access
that data.  The relevant access might have been by a long-since fired
engineer who did a Snowden on your database.  How could you possibly

 > Are you saying that GDPR is going to complicate things related to
 > #3 and make it such that there is more of a union between #2 and
 > #3?  I.e. exclude 3rd party site hosters from being able to be #3?

I don't understand the "exclude third party site hosters".  The
GDPR requirement is not to *limit* access, it's to *log* access.

 > What is their working definition of "marketing"?

I'm pretty sure they're referring to CRM-type databases where you
track customer interactions over time, linked by PII, and build up a
profile.  One-off "for sale" posts wouldn't matter.  However, if this
were a common activity on the list, the *archives* might qualify as
such a database.

 > IMHO: History happened.  (Some) People will remember (some) details
 > (for a while).  Removing evidence of them does not mean that
 > history did not happen.

Sure, the point is to make it difficult for 3rd parties to discover
that history ex post.  I don't think the legislators envisioned people
invoking these rights frivolously or maliciously (though I do :-/).

 > Are #5 and #6 accounted for?

Backups would need to be redacted as well, I suppose.  I have no idea
what you mean by "ongoing discovery".

 > What about #4 downstream?

Not Mailman host's problem, assuming all subscribers have properly
been opted in and are allowed to opt out at will, as is normally the
case.  Distributing content downstream is the purpose of the software,
and subscribers are aware of that.  The only edge cases I can imagine
offhand are the one discussed elsewhere in the thread, where a
subscriber posts a third party's information without permission, and
possibly an open-post list where the poster doesn't realize that it's
open subscription/public archives/whatever.

 > Or something  like the NSA's PRISM program.

Not Mailman host's problem.

 > I feel like there should be a GDPR counterpart of reasonable level of
 > effort in good faith.

Sure, but you probably won't like what the courts consider reasonable.

 > I'm not quite sure what to do in a situation of a litigation hold
 > that suspends expunging of backups.

You lock up the backups offline unless and until the court asks for
them or you actually need to restore.  That reasonably addresses the
privacy issue itself, and you're covered by the "essential to business
purpose" clause for the duration of the court order.

 > I'm simply bringing up things that I think are potential concerns
 > that the powers that be probably need to consider, and have a pat
 > response to.
