Guys,
Thanks for all the discussion around this topic. I have been in further communication with the people working on GDPR with us. Background: I run Mailman lists for a couple of charities as a voluntary contribution. The charities have money at their disposal, and we want to reduce exposure both for me personally and for the charities involved.
These are just rough notes:
- Archive purge requests. We have discussed the same items as on the list to date. I am looking at doing a simple grep for the relevant person's details and redacting them. The main reason for doing this is that if we just remove the author's messages, their details will still sit in a thread of other messages, because our users typically don't trim quoted material. Current advice from the GDPR people is that we may have to delete the whole thread. This is still under discussion, and it is complex because threads and subjects change: if we delete the whole thread, there may be messages from the same author in other threads that don't have correct attribution, etc.
- Audit logs for data access. It is not clear who is accessing subscription data for the list, as there is just a single owner and moderator account. I am unsure whether the current logging in either MM2 or MM3 is "good enough" for this. MM3 may solve the issue of single accounts.
- Relevant people seem to be happy that running a discussion list not used for marketing purposes should exempt us from some of the marketing-type rules regarding data processing.
- People seem happy with the system default logs as long as we can audit access to the logs (which we are able to do, as there is little access to the boxes themselves).
- It is likely that I will have to move the lists to hosts the charities control themselves, with a separate host for each charity. This will increase costs, so we may need to look at an alternative such as a hosted list service, as I am not setting myself up as a list hosting business.
Again, all this is up for interpretation. The biggest issues for me at the moment are auditing access to the Mailman admin interface and the archive purge requests.
Andrew.
On 05/14/2018 06:33 AM, Andrew Hodgson wrote:
- Archive purge requests. We have discussed the same items as on the list to date. I am looking at doing a simple grep for the relevant person's details and redacting them. The main reason for doing this is that if we just remove the author's messages, their details will still sit in a thread of other messages, because our users typically don't trim quoted material.
ACK
This seems like the lowest common denominator.
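Andrew's "simple grep" idea amounts to a case-insensitive search-and-replace over the archive text. A minimal sketch, assuming the identifiers to remove (name, address and so on) have already been agreed with the requester; the function name `redact_text` and the `[REDACTED]` placeholder are illustrative choices, not anything Mailman provides:

```python
from __future__ import annotations

import re


def redact_text(text: str, identifiers: list[str],
                placeholder: str = "[REDACTED]") -> tuple[str, int]:
    """Replace every case-insensitive occurrence of any identifier.

    Returns the redacted text and the number of replacements made.
    """
    if not identifiers:
        return text, 0
    # Try longer identifiers first so "Jane Doe" is matched before a
    # shorter overlapping identifier such as "Jane".
    alternatives = sorted(identifiers, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, alternatives)), re.IGNORECASE)
    return pattern.subn(placeholder, text)


if __name__ == "__main__":
    sample = "On Monday, Jane Doe <jane@example.org> wrote:\n> me too\n"
    redacted, count = redact_text(sample, ["jane@example.org", "Jane Doe"])
    print(redacted, count)
```

Because it works on raw text, this catches quoted copies and attribution lines as well as the author's own posts, which is the point of the approach. In Mailman 2 the raw archive is an mbox file (typically `archives/private/<list>.mbox/<list>.mbox`) from which the HTML pages can be rebuilt with `bin/arch --wipe`; check the paths on your own installation first. Text inside base64 or quoted-printable MIME parts will slip past a plain substitution.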
Current advice from the GDPR people is that we may have to delete the whole thread.
What‽
What is their working definition of "thread"?
Consider this scenario: a LONG-running thread where the person exercising their right to be forgotten simply adds a "me too" or an insult at the very end.
Does that thread, which obviously had a lot of value to the thread participants need to be deleted?
Why can't just the individual's message(s) be deleted? Or, better, redacted so that they don't reflect them?
This is still under discussion, and it is complex because threads and subjects change: if we delete the whole thread, there may be messages from the same author in other threads that don't have correct attribution, etc.
What does GDPR have to say, if anything, about subscribers having their own archives, which will not be redacted in any way? — Is the mailing list owner / administrator in any way, shape, or form, responsible for expunging those records too?
- Audit logs for data access. It is not clear who is accessing subscription data for the list, as there is just a single owner and moderator account. I am unsure whether the current logging in either MM2 or MM3 is "good enough" for this. MM3 may solve the issue of single accounts.
I guess I don't understand the problem and / or make invalid assumptions about MM.
I see six modes of access to the data:
1. List subscribers
2. List owners / administrators
3. Host system administrators
4. Administrators that are in the downstream SMTP / HTTP path and can track things
5. Backups
6. Ongoing Discovery
I would expect that #1 requires authentication to MM for subscribers to see data, and I expect that this is logged in some (indirect) capacity.
I would expect that #2 would have access to the data as part of their role of owning / administering a mailing list.
I would also expect that #3 has the capability to access the data. But I would also expect that #3 would not access the data in normal day to day operations.
Are you saying that GDPR is going to complicate things related to #3 and make it such that there is more of a union between #2 and #3? I.e. exclude 3rd party site hosters from being able to be #3?
What say you / them about #4?
- Relevant people seem to be happy that running a discussion list not used for marketing purposes should exempt us from some of the marketing-type rules regarding data processing.
What is their working definition of "marketing"?
Does someone saying "Hey, I've got a hand-knitted blanket for sale, contact me directly if you're interested." count as marketing? What about a news list from a library saying "Bob is managing the sale of used computer equipment."? They both refer to items for sale and how to contact someone off list.
To be really ornery, what if Bob is the person exercising his right to be forgotten. — Can you simply redact his name & contact info? Can you replace it with someone else's? — Or do you need to delete the entire thread and send out a new message / thread?
IMHO: History happened. (Some) People will remember (some) details (for a while). Removing evidence of them does not mean that history did not happen.
- People seem happy with the system default logs as long as we can audit access to the logs (which we are able to do, as there is little access to the boxes themselves).
Please forgive me for questioning if all of your bases are covered.
Are #5 and #6 accounted for? What about #4 downstream? Or something like the NSA's PRISM program.
- It is likely that I will have to move the lists to hosts the charities control themselves, with a separate host for each charity. This will increase costs, so we may need to look at an alternative such as a hosted list service, as I am not setting myself up as a list hosting business.
I understand why you say this. But to me this is an unacceptable solution. It certainly will not scale.
I feel like there should be a GDPR counterpart of "reasonable level of effort in good faith", i.e. redacting things in existing files and stating that backups are expunged after X number of days. I'm perfectly fine responding to someone, a short time after their "amnesia" request, with "I've REDACTED you from live files, and old backups will automatically expunge…", yet knowing that I can't mark the request as completely resolved until after the backups do expunge.
I'm not quite sure what to do in a situation of a litigation hold that suspends expunging of backups.
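The "can't mark it resolved until the backups expunge" point is simple date arithmetic. A sketch, assuming a fixed backup retention period and that the last backup holding the unredacted data was taken on the day the live files were redacted; `fully_resolved_on` is a hypothetical helper, and a litigation hold is modelled simply as suspending any completion date:

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Optional


def fully_resolved_on(redacted_on: date, retention_days: int,
                      litigation_hold: bool = False) -> Optional[date]:
    """Earliest date on which no rotating backup should still hold the data.

    Assumes (hypothetically) that each backup is expunged exactly
    `retention_days` after it was taken, and that the newest backup
    containing the unredacted data was taken on `redacted_on`.
    """
    if retention_days < 0:
        raise ValueError("retention_days must be non-negative")
    if litigation_hold:
        # Rotation is suspended, so no completion date can be promised.
        return None
    return redacted_on + timedelta(days=retention_days)
```

For example, live files redacted on 25 May 2018 with a 30-day rotation give 24 June 2018 as the earliest date to close the request.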
¯\_(ツ)_/¯
Again, all this is up for interpretation. The biggest issues for me at the moment are auditing access to the Mailman admin interface and the archive purge requests.
I'm not trying to come across as argumentative. I'm sorry if I am. I'm simply bringing up things that I think are potential concerns that the powers that be probably need to consider, and have a pat response to.
-- Grant. . . . unix || die
Grant Taylor via Mailman-Users wrote: ... lots of good examples ... well done!
I too don't think any complainer should have the right to kill a thread, just cos he/she wrote something they later wish to retract. Killing a thread would be gross abuse of all other posters' rights, & would invite worse abuse: anyone could write to a thread knowing they could leverage it later to kill a whole thread.
My guess is GDPR (& later similar elsewhere) will probably have been drafted by, & interpreted by, mostly politicians & lawyers clueless about our sort of mail lists, who will not have thought through most of the nasty edge cases we could easily present. Most probably they won't know more than nasty anonymous low-grade abusive cases on commercial [anti-]social web chat forums.
(As a crude test I'd expect most drafters to be top posters, gratuitously breaking context, not our sort of list people. (I only know one lawyer professionally, & typically he top posts, & thinks tech-style bottom posters weird & that they should conform to his Normal standards; it never occurs to such `Normal' people that they are uneducated, & are contravening Internet procedures techs evolved for good reasons.))
So no faith in GDPR or similar being anything other than drafted by & interpreted by ignorant `Normal' people who will bring us nothing but trouble, & who will seek to waste time of unpaid admins.
Hence my intent is to reduce the threat of time wasters as much as pos.: to draft something that says all those who don't conform to our norms are breaching the domain's terms of unpaid service, & they lose all rights to waste our time. It won't be watertight, but if it reduces time wasters, it's sufficient.
Most unpaid volunteer admins aren't about to pay their own money to get lawyers to write watertight clauses to protect us from wasters, so I see no better option.
Cheers, Julian
Julian Stacey, Computer Consultant, Systems Engineer, BSD Linux Unix, Munich Brexit Referendum stole 3,700,000 votes, inc. 700,000 from British in EU. UK Govt. lied it's "democratic" in Article 50 letter to EU paragraph 3. Petition for votes: http://berklix.eu/queen/
Grant Taylor asked:
What does GDPR have to say, if anything, about subscribers having their own archives, which will not be redacted in any way?
IMHO they would mostly fall under §18 (Recital 18 of the GDPR), and GDPR wouldn't apply:
This Regulation does not apply to the processing of personal data by a natural person in the course of a purely personal or household activity and thus with no connection to a professional or commercial activity. Personal or household activities could include correspondence and the holding of addresses, or social networking and online activity undertaken within the context of such activities. However, this Regulation applies to controllers or processors which provide the means for processing personal data for such personal or household activities.
Of course, if a company was using the mailing list to process personal data, it should have been stated the whole time.
Being nitpicky. What about sysadmins subscribed to this list as part of their professional activity? (But otherwise interacting in the same way as a hobbyist.)
On 05/14/2018 05:02 PM, Ángel wrote:
Being nitpicky. What about sysadmins subscribed to this list as part of their professional activity? (But otherwise interacting in the same way as a hobbyist.)
How do hobbyists interact? Enquiring minds want to know.
-- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
On 05/14/2018 04:02 PM, Ángel wrote:
IMHO they would mostly fall under §18 (Recital 18 of the GDPR), and GDPR wouldn't apply:
Okay.
What happens if a subsequent data breach (malware / infection) causes said individual archives to become public information? }:-)
Of course, if a company was using the mailing list to process personal data, it should have been stated the whole time.
I halfway suspect this happens much more commonly than you might think.
I've seen info@ or sales@ or similar positional addresses act as front ends for mailing lists (of one form or another) that redistribute the email to multiple (usually) internal (usually) employees. I have never seen these types of expansion contacts disclosed as such.
Being nitpicky. What about sysadmins subscribed to this list as part of their professional activity?
I know that this happens. But I would argue that the SA should not subscribe themselves. Instead there should be an additional monitoring email address specifically for that purpose.
I'd really like to see an intelligent Mailing List Manager have the ability to subscribe an address like this that is used as a feedback loop. I.e. Did the MLM receive a copy of the message that it sent yesterday. I'd assume that it would be something like <$list>-fbl@<$list_domain> to avoid recursive loops.
That would allow the MLM to self monitor and escalate if there's a problem.
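Grant is describing a wished-for feature, so the following is only a sketch of the check he has in mind. It assumes the list software records the Message-ID and send time of each post, and that something reads the Message-IDs arriving at the hypothetical `<list>-fbl@` address; `missing_loopbacks` and the 24-hour grace period are illustrative:

```python
from __future__ import annotations

from datetime import datetime, timedelta


def missing_loopbacks(sent: dict[str, datetime], received: set[str],
                      now: datetime,
                      grace: timedelta = timedelta(hours=24)) -> list[str]:
    """Message-IDs sent more than `grace` ago that never reached the
    feedback-loop subscriber; a non-empty result is a reason to escalate."""
    overdue = (
        msg_id
        for msg_id, sent_at in sent.items()
        if msg_id not in received and now - sent_at > grace
    )
    return sorted(overdue)
```

Messages younger than the grace period are not reported, so ordinary delivery delays don't trigger an alert.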
-- Grant. . . . unix || die
Grant Taylor wrote:
On 05/14/2018 06:33 AM, Andrew Hodgson wrote:
[...]
- Audit logs for data access. It is not clear who is accessing subscription data for the list, as there is just a single owner and moderator account. I am unsure whether the current logging in either MM2 or MM3 is "good enough" for this. MM3 may solve the issue of single accounts.
I guess I don't understand the problem and / or make invalid assumptions about MM.
I see six modes of access to the data:
- List subscribers
- List owners / administrators
At the moment the list administrator and moderator accounts are accessed with no username and a single password. If that password is shared, I have no audit trail of who logged into the system. Also, the system currently doesn't log specific actions, for example admin A exported a load of addresses, admin B added 100 subscribers to the mailing list, etc.
Andrew.
On 05/15/2018 03:18 AM, Andrew Hodgson wrote:
At the moment the list administrator and moderator accounts are accessed with no username and a single password. If that password is shared, I have no audit trail of who logged into the system.
ACK
I like to run Mailman (et al) administration pages behind htaccess protection. Thus I have the username that authenticated to the web server to corroborate who's actually accessing things.
Also, the system currently doesn't log specific actions, for example admin A exported a load of addresses, admin B added 100 subscribers to the mailing list, etc.
Can you not tell what was done based on the web server logs and the requested URLs? I know that won't catch POST data, but it will give you more information than not looking at the web server logs.
Aside: I personally consider the web server to be part of the application framework. As such, I exercise and use it to (what I think is) my advantage.
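If the admin pages sit behind web-server authentication as Grant suggests, the access log already records who requested which admin URL. A sketch that pulls those entries out of an Apache-style Common Log Format file; the `/mailman/admin/` prefix matches a typical Mailman 2 install but is an assumption, and, as noted, POST bodies (which subscribers were added, for instance) are not in the log:

```python
from __future__ import annotations

import re
from typing import Iterable, NamedTuple

# host ident authuser [time] "METHOD path protocol" status ...
CLF = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


class AdminHit(NamedTuple):
    user: str     # "-" when the request was not authenticated
    host: str
    time: str
    method: str
    path: str
    status: int


def admin_hits(lines: Iterable[str],
               prefix: str = "/mailman/admin/") -> list[AdminHit]:
    """Return parsed log entries whose request path starts with `prefix`."""
    hits = []
    for line in lines:
        m = CLF.match(line)
        if m and m["path"].startswith(prefix):
            hits.append(AdminHit(m["user"], m["host"], m["time"],
                                 m["method"], m["path"], int(m["status"])))
    return hits
```

Lines that don't parse are skipped rather than raising, since real logs often contain the odd malformed request.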
-- Grant. . . . unix || die
Grant Taylor via Mailman-Users writes:
On 05/14/2018 06:33 AM, Andrew Hodgson wrote:
Current advice from the GDPR people is that we may have to delete the whole thread.
What is their working definition of "thread"?
I would imagine that it is the subthread rooted at the first post containing complainant's PII -- "Personally Identifying Information".
Why can't just the individual's message(s) be deleted? Or, better, redacted so that they don't reflect them?
That is going to depend on the presence of PII in the messages. If *whole messages* are to be deleted, that would presumably involve content that somehow identifies the person. I would expect that we don't have to delete whole bug reports on this list just because somebody requests their PII be redacted.
What worries me more is the implications for blockchain, or more precisely, DAG-based VCSes that use hashes for integrity check like git: the identity of commits will change if authors and emails are redacted, including if a commit log refers to PII of a bug reporter as they often do. I guess you'd need to maintain an index of pointers from old commit ids, or at least for branches and tags (we do have the reflog in git).
And heaven help you if you're a security conscious group like the Linux kernel and use signed commits. I guess the person who does the redaction would sign the new commits, but that's pretty yucky -- that person could do anything and nobody would know when it happened because you have to delete the old commits and blobs that get redacted.
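Stephen's point about commit identity follows from how git names objects: an object's id is the SHA-1 of a short header (object type and length) plus the object's content, and a commit's content includes its author line and its parent's id. A sketch of that hashing, showing that redacting the author of one commit changes its id and the id of every descendant, and building the old-to-new index Stephen mentions; the commit layout is simplified (empty tree, fixed timestamp) for illustration:

```python
from __future__ import annotations

import hashlib
from typing import Optional


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object id as git computes it: "<kind> <size>", a NUL, then body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = git_object_id("tree", b"")


def commit_body(parent: Optional[str], author: str, message: str) -> bytes:
    """Simplified commit object: empty tree, fixed timestamp."""
    lines = [f"tree {EMPTY_TREE}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    stamp = "1526284800 +0000"  # arbitrary fixed time for reproducibility
    lines += [f"author {author} {stamp}", f"committer {author} {stamp}",
              "", message]
    return ("\n".join(lines) + "\n").encode()


def chain_ids(authors: list[str]) -> list[str]:
    """Ids of a linear history with one commit per author."""
    ids: list[str] = []
    parent: Optional[str] = None
    for i, author in enumerate(authors):
        parent = git_object_id("commit", commit_body(parent, author, f"change {i}"))
        ids.append(parent)
    return ids


if __name__ == "__main__":
    original = chain_ids(["Jane Doe <jane@example.org>", "Bob <bob@example.org>"])
    redacted = chain_ids(["[REDACTED] <[REDACTED]>", "Bob <bob@example.org>"])
    old_to_new = dict(zip(original, redacted))  # pointer index for old ids
    print(old_to_new)
```

Bob's commit is untouched, yet its id changes too, because its parent line now names a different commit.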
This is still under discussion, and it is complex because threads and subjects change: if we delete the whole thread, there may be messages from the same author in other threads that don't have correct attribution, etc.
As I understand the "right to be forgotten", it's *not* a right to arbitrarily edit content stored by someone else, it's the right to redact *all* PII in that content. It's not just messages from a person, it's headers containing their name and email address, attribution lines for quoted material, quoted .sigs, etc etc.
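A header-aware variant of the grep approach, as a sketch: it parses each message with Python's standard `email` package, redacts the address headers, and applies the same substitution to a plain-text body so that attribution lines and quoted .sigs are caught too. The header list, placeholder and function name are illustrative assumptions; multipart messages, MIME-encoded bodies and RFC 2047 encoded names are deliberately left out and would need handling in practice:

```python
from __future__ import annotations

import email
import re
from email.message import Message

# Headers that commonly carry a name and/or address (illustrative list).
ADDRESS_HEADERS = ("From", "Sender", "Reply-To", "To", "Cc")


def redact_message(raw: str, identifiers: list[str],
                   placeholder: str = "[REDACTED]") -> Message:
    """Parse `raw`, then redact address headers and a single-part body."""
    msg = email.message_from_string(raw)
    if not identifiers:
        return msg
    alternatives = sorted(identifiers, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, alternatives)), re.IGNORECASE)
    for name in ADDRESS_HEADERS:
        value = msg.get(name)
        if value is not None:
            # replace_header keeps the header at its original position.
            msg.replace_header(name, pattern.sub(placeholder, str(value)))
    if not msg.is_multipart():
        # Only a single, undecoded payload is handled in this sketch.
        msg.set_payload(pattern.sub(placeholder, msg.get_payload()))
    return msg
```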
I see six modes of access to the data:
1. List subscribers
2. List owners / administrators
3. Host system administrators
4. Administrators that are in the downstream SMTP / HTTP path and can track things
5. Backups
6. Ongoing Discovery
You're missing
0. Randos accessing public archives.
For (0), the only logging would be IP addresses in the webserver.
I would expect that #1 requires authentication to MM for subscribers to see data, and I expect that this is logged in some (indirect) capacity.
No. The accessing IPs will be in the webserver logs, but I don't think there is any logging in either Mailman 2 or Mailman 3 of authentication data. All there would be is the implication that authentication was successful if that data were accessed. In Mailman 2 there's no PII data whatsoever except for email address and (maybe) display name in the subscriber data. I suppose you could put phone #s and junk like that in the display name, but GDPR is more concerned with the database fields that might store PII than the actual content.
I would expect that #2 would have access to the data as part of their role of owning / administering a mailing list.
However, in Mailman 2 the various list passwords are shared, and would not identify individuals in cases with multiple moderators or list owners.
I would also expect that #3 has the capability to access the data. But I would also expect that #3 would not access the data in normal day to day operations.
Indeed. The problem is identifying them if they do, since they can just use normal filesystem operations from the shell, which are not normally logged at all. In Mailman 3, we can configure databases like PostgreSQL, which I suppose can log access to the subscriber databases, and which make it hard (but not impossible) to access data via ordinary filesystem operations.
However, I think that the issue here is basically moot. You keep host access logs to check for suspicious IP addresses (attempting to) log in, and otherwise (for #2 and #3) you just give the list of all the people who can access that data in the normal course of their duties. I don't think the issue with logging is pinning down a particular access to specific data, but rather determining who *could* access that data. The relevant access might have been by a long-since fired engineer who did a Snowden on your database. How could you possibly know?
Are you saying that GDPR is going to complicate things related to #3 and make it such that there is more of a union between #2 and #3? I.e. exclude 3rd party site hosters from being able to be #3?
I don't understand the "exclude third party site hosters". The GDPR requirement is not to *limit* access, it's to *log* access.
What is their working definition of "marketing"?
I'm pretty sure they're referring to CRM-type databases where you track customer interactions over time, linked by PII, and build up a profile. One-off "for sale" posts wouldn't matter. However, if this were a common activity on the list, the *archives* might qualify as such a database.
IMHO: History happened. (Some) People will remember (some) details (for a while). Removing evidence of them does not mean that history did not happen.
Sure, the point is to make it difficult for 3rd parties to discover that history ex post. I don't think the legislators envisioned people invoking these rights frivolously or maliciously (though I do :-/).
Are #5 and #6 accounted for?
Backups would need to be redacted as well, I suppose. I have no idea what you mean by "ongoing discovery".
What about #4 downstream?
Not Mailman host's problem, assuming all subscribers have properly been opted in and are allowed to opt out at will, as is normally the case. Distributing content downstream is the purpose of the software, and subscribers are aware of that. The only edge cases I can imagine offhand are the one discussed elsewhere in the thread, where a subscriber posts a third party's information without permission, and possibly an open-post list where the poster doesn't realize that it's open subscription/public archives/whatever.
Or something like the NSA's PRISM program.
Not Mailman host's problem.
I feel like there should be a GDPR counterpart of reasonable level of effort in good faith.
Sure, but you probably won't like what the courts consider reasonable.
I'm not quite sure what to do in a situation of a litigation hold that suspends expunging of backups.
You lock up the backups offline unless and until the court asks for them or you actually need to restore. That reasonably addresses the privacy issue itself, and you're covered by the "essential to business purpose" clause for the duration of the court order.
I'm simply bringing up things that I think are potential concerns that the powers that be probably need to consider, and have a pat response to.
On 05/22/2018 07:33 PM, Stephen J. Turnbull wrote:
I would imagine that it is the subthread rooted at the first post containing complainant's PII -- "Personally Identifying Information".
I feel like that's a self referencing definition.
A "thread" is "a subthread rooted at the first post containing PII".
I agree that's where the focus should start. But I don't think it defines a thread in the way that I'm asking.
What is their working definition of "thread"?
Let's say:
    Bla
    +--- Re: Bla
    +--- Re: Bla
    |    +--- BlaBlaBla
    +--- Re: Bla
         +--- I hijacked this thread because I need help!!!
Let's say the PII was in message 3 and the person replying to it in message 4 removed the PII. Do messages 3 and 4 need to be removed (or otherwise modified)?
Let's say that message 1 had the PII, messages 2, 3, and 5 quoted it, but 4 did not and 6 is a hijacker that hit reply on the most convenient message (under his cursor) and removed all content. Do messages 4 and 6 need to be removed?
What is the "(sub)thread" that needs to be removed?
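Stephen's working definition (the subthread rooted at the first post containing the complainant's PII) is at least mechanical once each message's parent is known, e.g. from its In-Reply-To header. A sketch, assuming a mapping from message id to parent id has already been built; numbering the messages in the tree above 1 to 6 from top to bottom, as the questions do, the subthread rooted at message 3 is just messages 3 and 4:

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


def subthread(parents: dict[str, Optional[str]], root: str) -> list[str]:
    """Return `root` and all of its descendants in depth-first order.

    `parents` maps each message id to the id it replies to (None for a
    thread starter).
    """
    children: dict[str, list[str]] = defaultdict(list)
    for msg_id, parent in parents.items():
        if parent is not None:
            children[parent].append(msg_id)
    order, stack = [], [root]
    while stack:
        msg_id = stack.pop()
        order.append(msg_id)
        # Push in reverse so earlier replies are visited first.
        stack.extend(reversed(children[msg_id]))
    return order


# Grant's example tree: 2, 3 and 5 reply to 1; 4 replies to 3; 6 to 5.
THREAD = {"1": None, "2": "1", "3": "1", "4": "3", "5": "1", "6": "5"}
```

This only answers Grant's questions under Stephen's definition: whether message 4, which dropped the quoted PII, really has to go with message 3, or whether the hijacker's message 6 belongs to the thread at all, remains a policy decision rather than something the reply graph can settle.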
That is going to depend on the presence of PII in the messages. If *whole messages* are to be deleted, that would presumably involve content that somehow identifies the person. I would expect that we don't have to delete whole bug reports on this list just because somebody requests their PII be redacted.
I agree that it's possible to remove / redact PII without deleting the items containing the PII.
Think about it this way, spooks don't shred the entire sheet of paper, instead they take a black marker and redact just the pieces that need to be removed.
I'm afraid that the infinite wisdom of politicians will say that the entire paper needs to be shredded.
I think it also significantly depends on what needs to be redacted. Removing "supercalifragilisticexpialidocious" is a LOT different than removing "Grant Taylor" from the Mailman-Users archive. "supercalifragilisticexpialidocious" would be like reference to an event. "Grant Taylor" would be any mention of my (or an impostor's) name.
The former is likely MUCH simpler to do than the latter. The latter will also impact MANY more messages.
What worries me more is the implications for blockchain, or more precisely, DAG-based VCSes that use hashes for integrity check like git: the identity of commits will change if authors and emails are redacted, including if a commit log refers to PII of a bug reporter as they often do. I guess you'd need to maintain an index of pointers from old commit ids, or at least for branches and tags (we do have the reflog in git).
I don't want to try to work that out.
And heaven help you if you're a security conscious group like the Linux kernel and use signed commits. I guess the person who does the redaction would sign the new commits, but that's pretty yucky -- that person could do anything and nobody would know when it happened because you have to delete the old commits and blobs that get redacted.
Yep.
As I understand the "right to be forgotten", it's *not* a right to arbitrarily edit content stored by someone else, it's the right to redact *all* PII in that content.
Agreed.
In this case, I don't think that supercalifragilisticexpialidocious qualifies under GDPR's right to be forgotten. }:-)
It's not just messages from a person, it's headers containing their name and email address, attribution lines for quoted material, quoted .sigs, etc etc.
Agreed.
What about headers containing a Message-ID from an uncommon / single-user domain like mine? I'd say that anything that can be used to identify a group of fewer than 1000 people would probably need to be redacted. (I just chose 1000 arbitrarily, but it's a starting point.)
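Grant's rule of thumb for Message-ID domains can be checked against an archive. A sketch, assuming the (From, Message-ID) header pairs have already been extracted; the 1000 threshold is Grant's arbitrary starting point. An archive only shows the senders who happened to post, so the count is a lower bound and the check errs towards flagging a domain:

```python
from __future__ import annotations

from collections import defaultdict
from email.utils import parseaddr
from typing import Iterable


def small_msgid_domains(pairs: Iterable[tuple[str, str]],
                        threshold: int = 1000) -> set[str]:
    """Message-ID domains seen with fewer than `threshold` distinct senders."""
    senders: dict[str, set[str]] = defaultdict(set)
    for from_header, message_id in pairs:
        address = parseaddr(from_header)[1].lower()
        # Take the part after the last "@"; skip ids without one.
        _, at, domain = message_id.strip().strip("<>").rpartition("@")
        if at and domain:
            senders[domain.lower()].add(address)
    return {domain for domain, who in senders.items() if len(who) < threshold}
```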
You're missing
0. Randos accessing public archives.
What other modes have we collectively missed?
For (0), the only logging would be IP addresses in the webserver.
True.
No. The accessing IPs will be in the webserver logs, but I don't think there is any logging in either Mailman 2 or Mailman 3 of authentication data. All there would be is the implication that authentication was successful if that data were accessed.
Okay.
I wonder if there's any correlation between the IP that authenticated and the IP that accessed data.
In Mailman 2 there's no PII data whatsoever except for email address and (maybe) display name in the subscriber data.
I expect that either of those, the email address -or- the display name are enough to count as PII.
I believe it's fair to say that people expect gtaylor (at) tnetconsulting (dot) net to reference a single person. I also believe it's fair to say that most people expect most email addresses to be associated with one person. The only exceptions to the rule are things like positional addresses: sales@ or info@ or webmaster@.
I suppose you could put phone #s and junk like that in the display name, but GDPR is more concerned with the database fields that might store PII than the actual content.
- I'd consider the phone numbers in the display name to be a form of display name.
- *sigh* It sounds like GDPR is talking about specific fields that could contain PII, even if they don't, while ignoring other fields that erroneously do contain PII.
However, in Mailman 2 the various list passwords are shared, and would not identify individuals in cases with multiple moderators or list owners.
IMHO that's an operational misstep. I get that it does happen, but I think that it shouldn't. People tend to share the root password on Unix too, despite multiple other options where it's not needed.
Indeed. The problem is identifying them if they do, since they can just use normal filesystem operations from the shell, which are not normally logged at all.
Where I've worked, it was assumed that if you had an ID on the box and file system level permission to access things then you effectively had accessed it. — If you can't prove that they didn't access the data, then you assume that they did access the data.
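Grant's "could access means did access" rule, like Stephen's point about listing who *could* reach the data, can be approximated from classic Unix permission bits. A sketch, assuming plain owner/group/other permissions only (no ACLs, capabilities or SELinux-style controls, and ignoring whether the parent directories are traversable); in Mailman 2 the file in question might be a list's `config.pck`. Note the POSIX rule that the owner is judged only by the owner bits and group members only by the group bits:

```python
from __future__ import annotations

import stat


def potential_readers(mode: int, owner: str, group: str,
                      group_members: dict[str, set[str]],
                      all_users: set[str]) -> set[str]:
    """Users who could read a file with the given permission bits.

    `group_members[group]` must include users whose *primary* group is
    `group`, not just those listed as supplementary members.
    """
    members = group_members.get(group, set())
    readers = set()
    for user in all_users:
        if user == "root":
            readers.add(user)  # the superuser bypasses permission bits
        elif user == owner:
            if mode & stat.S_IRUSR:
                readers.add(user)
        elif user in members:
            if mode & stat.S_IRGRP:
                readers.add(user)
        elif mode & stat.S_IROTH:
            readers.add(user)
    return readers
```

On a live system the inputs could come from `os.stat(path).st_mode`, `pwd.getpwuid(st.st_uid).pw_name`, `grp.getgrgid(st.st_gid)` and `pwd.getpwall()`. The result is the list of people to disclose, not a record of who actually looked.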
In Mailman 3, we can configure databases like PostgreSQL, which I suppose can log access to the subscriber databases, and which make it hard (but not impossible) to access data via ordinary filesystem operations.
Having an RDBMS (et al) manage the files doesn't prevent file level access. I can very likely still copy the DB file(s) and do my own thing with them to extract the data.
This is where (and why) DB encryption comes into play. Though that doesn't help if a rogue admin has access to the decryption key by any method, which includes extracting it from memory. }:-)
However, I think that the issue here is basically moot. You keep host access logs to check for suspicious IP addresses (attempting to) log in, and otherwise (for #2 and #3) you just give the list of all the people who can access that data in the normal course of their duties.
Yep.
I don't think the issue with logging is pinning down a particular access to specific data, but rather determining who *could* access that data.
Yep. Yep.
The relevant access might have been by a long-since fired engineer who did a Snowden on your database. How could you possibly know?
Yep. Yep. Yep.
I don't understand the "exclude third party site hosters". The GDPR requirement is not to *limit* access, it's to *log* access.
I was trying to imply that companies would need to host their own list servers, meaning that they couldn't outsource them to 3rd party companies, who have their own host system administrators.
I'm pretty sure they're referring to CRM-type databases where you track customer interactions over time, linked by PII, and build up a profile. One-off "for sale" posts wouldn't matter. However, if this were a common activity on the list, the *archives* might qualify as such a database.
~chuckle~
How many grains of sand does it take to make a pile?
IMHO none. You just have to declare the pile's location.
Sure, the point is to make it difficult for 3rd parties to discover that history ex post.
Okay. I want to make sure I'm understanding you correctly. (Part of) GDPR is not about (just) knowing who has (had at the time) legitimate access to data, but additionally making it more difficult for other 3rd parties to gain access to the data in the future. By the fact that the data is removed from the corpus that the 3rd party is subsequently given access to.
I don't think the legislators envisioned people invoking these rights frivolously or maliciously (though I do :-/).
Agreed.
Backups would need to be redacted as well, I suppose.
Um... that also presents a severe technical problem, one that could impose large operational expenses. Suppose a company contracts to store their backup tapes offsite. This means that they would need to recall the tapes that need to be redacted, redact them, and send the tapes back to the offsite storage. This may involve an additional company that is simply the courier. Let's not forget about the offsite company's handling fees and the courier's fees, both ways, for each tape. Let's also throw in company policies that dictate that only X number of drives can be in transit or recalled at one time. That's a logistical nightmare that could take more than a trivial amount of time to complete, at untold cost. Ouch!
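The cost side of recalling offsite tapes is back-of-the-envelope arithmetic. A sketch with entirely hypothetical parameters (tape count, the X-at-a-time transit limit, per-tape handling fee, per-trip courier fee, days per batch), assuming every tape is recalled and returned in batches no larger than the limit:

```python
from __future__ import annotations

import math
from typing import NamedTuple


class RecallPlan(NamedTuple):
    batches: int
    cost: float
    days: int


def recall_plan(tapes: int, max_in_transit: int, handling_fee: float,
                courier_fee: float, days_per_batch: int) -> RecallPlan:
    """Estimate batches, cost and elapsed days to recall, redact and return tapes.

    handling_fee is charged per tape per direction; courier_fee per trip.
    """
    if max_in_transit < 1:
        raise ValueError("max_in_transit must be at least 1")
    batches = math.ceil(tapes / max_in_transit)
    handling = tapes * handling_fee * 2   # each tape goes out and comes back
    courier = batches * courier_fee * 2   # one trip each way per batch
    return RecallPlan(batches, handling + courier, batches * days_per_batch)
```

With, say, 52 tapes, at most 10 in transit, a $5 handling fee, a $100 courier trip and 3 days per batch, that is 6 batches, $1,720 and 18 days before counting anyone's time; the numbers are placeholders, not quotes.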
I have no idea what you mean by "ongoing discovery".
Ah.
Let's say that Wile E. Coyote decides to sue Acme because of their bad products. As soon as the lawsuit is initiated, chances are very good that Acme's lawyers will 1) tell them to destroy all records or 2) tell Acme's IT staff that they can no longer rotate out any backups that may contain data pertinent to the lawsuit. This is to facilitate the legal process of discovering evidence to be used in the case. (Either way, for or against, Mr. Coyote, doesn't matter.)
I frequently hear this referred to as one of two things: "Litigation Hold" or "(Electronic) Discovery". Discovery is the more common term and applies to more than just electronics.
Not Mailman host's problem, assuming all subscribers have properly been opted in and are allowed to opt out at will, as is normally the case.
What about that pesky time where the moderator hasn't approved the unsubscribe request. (I think I remember seeing that option in Mailman.)
Distributing content downstream is the purpose of the software, and subscribers are aware of that. The only edge cases I can imagine offhand are the one discussed elsewhere in the thread, where a subscriber posts a third party's information without permission, and possibly an open-post list where the poster doesn't realize that it's open subscription/public archives/whatever.
I think you misinterpreted what I was referring to. Or I'm misinterpreting your reply.
I'm talking about 3rd party spam filtering services that sit in the path downstream, between Mailman and the recipient's server. They collect logs / data all the time. Usually those logs and that data are what help them be better at their job of spam filtering.
Not Mailman host's problem.
Okay.
Sure, but you probably won't like what the courts consider reasonable.
"reasonable" is always subject to deliberation.
Lawyers get paid to tell a judge: "It will cost $Company $50,000 to recover the messages that $Plaintiff is requesting from $Defendant as part of their sunshine law request. Here's why:
1) We don't have a server that we can use, so we must buy a low-end machine. (Legit, when there is only one mail server and the business can't be without mail for days / weeks.)
2) We need another tape drive to do the restores.
3) It will take $X (wo)man-hours at $Y dollars per hour.
4) We, $Defendant's lawyers, must go through the emails at $YYYYY dollars per hour to make sure nothing is given out that's outside the scope of the sunshine law request.
5) You just expanded the scope of your discovery? Well, now we need to increase #1 and #2 to go through the last 5 years of things in the next three weeks. Also #3 and #4. }:-)
So … the total bill for your sunshine request comes to just over $50,000. Are you willing to pay that bill to get an answer to your question via a sunshine law request?
Aside: A sunshine law request is a request from a citizen to a governmental body for data that was arguably paid for by tax funding and gathered on behalf of citizens, so the citizen effectively owns the data in a roundabout way. I don't know how widespread that is.
You lock up the backups offline unless and until the court asks for them or you actually need to restore. That reasonably addresses the privacy issue itself, and you're covered by the "essential to business purpose" clause for the duration of the court order.
6) We have to buy additional tapes to replace the tapes that are on Lit' Hold.
7) We have to pay for more storage to accommodate #6. (Or we have to pay someone to house the tapes in a secure manner.)
I digress.
-- Grant. . . . unix || die
Grant Taylor via Mailman-Users writes:
What is their working definition of "thread"?
I don't know. I gave what I think is a reasonable definition, and I would argue that going to parents of that message is not required by GDPR, even if for some reason you need to remove whole posts.
I'm afraid that the infinite wisdom of politicians will say that the entire paper needs to be shredded.
We know what the politicians said. It's in the GDPR law. Forget politicians' stupidity. What matters now is (1) what courts will say, and (2) what courts will refuse to call frivolous (so that the party with the uglier lawyer wins at great expense to the party with the beautiful lawyer).
Appeals judges generally are pretty sensible in the U.S. and Japan, and usually they do understand the issues. I suppose it's similar in the EU.
What I'm concerned with is where PII can enter Mailman and be stored on the host. Whether the law reaches that or not is not really important here. We look at each place, decide how easy it is to (1) find all instances of a particular identifier, (2) determine whether and by whom it has been accessed, and (3) redact that identifier. Then we look at costs and start implementing the cheaper cases.
I think it also significantly depends on what needs to be redacted. Removing "supercalifragilisticexpialidocious" is a LOT different than removing "Grant Taylor" from the Mailman-Users archive.
It needs to be personally identifying, and pragmatically (1) above means either (a) it will be found in certain header fields which we can remove entirely or redact in full or part, or (b) a full-text search will find it. This means that descriptions like "the US politician known to lie 6 times a day" are out -- there are too many ways to express that. If GDPR requires finding and redacting that, the list will have to fold up shop. But I don't think it does: I think here PII refers to numbers, names, and addresses (as we usually understand those words!) that uniquely identify a person for purposes such as delivering goods, services and information, or as part of an authentication process for accessing services (e.g., financial or informational).
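Step (1) for the header-or-full-text case can be sketched against the cumulative LIST.mbox/LIST.mbox file with nothing but the standard library. This is a minimal sketch, not a Mailman tool: the set of "PII headers" is my assumption (neither GDPR nor Mailman defines it), and the search is a case-insensitive literal match, so it will not catch paraphrases like the politician example.

```python
import mailbox
import re
from email.header import decode_header, make_header

# Assumed (for illustration) set of header fields that normally carry PII.
PII_HEADERS = ("From", "To", "Cc", "Reply-To", "Sender")


def _header_text(value):
    """Decode RFC 2047 encoded-words so non-ASCII display names are searchable."""
    try:
        return str(make_header(decode_header(value)))
    except (LookupError, ValueError):  # unknown charset or malformed header
        return str(value)


def _body_texts(msg):
    """Yield the decoded text of every text/* part of a message."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # charset name Python doesn't know
            text = payload.decode("utf-8", errors="replace")
        yield text


def find_identifier(mbox_path, identifier):
    """Return (message index, location) pairs where identifier occurs."""
    pattern = re.compile(re.escape(identifier), re.IGNORECASE)
    hits = []
    for idx, msg in enumerate(mailbox.mbox(mbox_path)):
        for name in PII_HEADERS:
            for value in msg.get_all(name, []):
                if pattern.search(_header_text(value)):
                    hits.append((idx, name))
        if any(pattern.search(text) for text in _body_texts(msg)):
            hits.append((idx, "body"))
    return hits
```

The "body" hits are exactly the quoted-material problem from earlier in the thread: a reply from someone else that quotes the person still shows up.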
I wonder if there's any correlation between the IP that authenticated and the IP that accessed data.
Not in Mailman, although it could be done. HTTP is a stateless protocol, so to maintain a session you need to provide a token (typically a "cookie"). That token can be passed around in the user's network. It would be possible to include the IP in the data hashed to create the auth token, and validate that, but we don't.
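For illustration only, since Mailman does not do this: a sketch of binding a session token to the client IP by including the IP in the HMAC input. The secret, the "user|issued|mac" token layout and the one-hour lifetime are all hypothetical. The trade-off is that users whose address changes mid-session (mobile networks, some NAT or proxy setups) get logged out.

```python
import hashlib
import hmac
import time

SECRET = b"hypothetical-server-secret"  # would be random and kept private


def _mac(user, ip, issued):
    data = f"{user}|{ip}|{issued}".encode()
    return hmac.new(SECRET, data, hashlib.sha256).hexdigest()


def make_token(user, ip, issued=None):
    """Issue 'user|issued|mac' where the MAC also covers the client IP."""
    issued = int(time.time()) if issued is None else issued
    return f"{user}|{issued}|{_mac(user, ip, issued)}"


def check_token(token, ip, max_age=3600, now=None):
    """Return the user name if the token is valid for this IP, else None."""
    now = int(time.time()) if now is None else now
    try:
        user, issued_s, mac = token.rsplit("|", 2)
        issued = int(issued_s)
    except ValueError:  # wrong number of fields or non-numeric timestamp
        return None
    if not 0 <= now - issued <= max_age:
        return None
    # Constant-time comparison; the IP is not in the token, only in the MAC.
    if not hmac.compare_digest(mac.encode(), _mac(user, ip, issued).encode()):
        return None
    return user
```

Because the IP is never stored in the token itself, a replay from another address simply fails the MAC check, which would give the authenticated-IP/accessing-IP correlation asked about above.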
- *sigh* It sounds like GDPR is talking about specific fields that could contain PII, even if they don't, while ignoring other fields that erroneously do contain PII.
It's not GDPR. *I* wrote that. What I was trying to say is that there are fields like display name and email that are normally used for data that is PII, and so would be presumed to contain PII if populated in a database record.
However, in Mailman 2 the various list passwords are shared, and would not identify individuals in cases with multiple moderators or list owners.
IMHO that's an operational misstep.
It's a FACT, and it's not going to change in Mailman 2. We need to work with it, or perhaps European lists simply won't be able to use Mailman 2 with multiple admins if GDPR requires auth that identifies a single individual. (Mailman 3 does allow identifying a single individual, but I don't think we log auth attempts or successes yet.)
Part of GDPR is not (just) about knowing who has (or had at the time) legitimate access to data, but also about making it more difficult for other 3rd parties to gain access to the data in the future, by removing the data from the corpus that those 3rd parties are subsequently given access to.
I don't think "make it difficult to access data" is a requirement in GDPR. I think making reconstruction of history difficult is the *intent* of GDPR's "right to be forgotten", but that doesn't mean you need to conceal data (such as social network "handles") that is normally used to identify users in operation.
The access logging is about a different aspect of privacy, which is knowing who had access to that data.
AFAICS, the privacy policy itself is up to the host and/or the industry and its regulators. Wikis may have zero privacy in normal operation, but you still need to log accesses to people's profiles I suppose. Banking privacy is specified by banking laws, not GDPR, I suppose, but again GDPR mandates logging of accesses.
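As a rough idea of what per-individual access logging could record (this is not an existing Mailman 2 or Mailman 3 feature), a sketch that appends one JSON line per access. The action names and the "*" convention for whole-roster views are hypothetical; the point is that the actor is an individual login rather than a shared list password.

```python
import json
import time


def audit(log, actor, action, subject, now=None):
    """Append one JSON line: which individual did what to whose data."""
    entry = {
        "ts": int(time.time()) if now is None else now,
        "actor": actor,      # an individual login, not a shared list password
        "action": action,    # e.g. "view_roster", "view_member", "export"
        "subject": subject,  # whose data was touched ("*" for the whole roster)
    }
    log.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def accesses_to(lines, subject):
    """Entries that touched one subscriber's data, including whole-roster views."""
    entries = (json.loads(line) for line in lines if line.strip())
    return [e for e in entries if e["subject"] in (subject, "*")]
```

Answering "who has accessed my data?" then becomes a filter over the log, provided access to the log itself is audited too.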
I'm talking about 3rd-party spam filtering services that sit in the path downstream, between Mailman and the recipient's server. They collect logs and data all the time. Usually those logs and that data are what help them get better at spam filtering.
The Mailman admins don't have access to that data in this scenario, I assume. I don't really think the Mailman host is implicated there, even if they're the direct client of such a service. I suspect what the Mailman host needs to worry about most is interruption of service if the vendor gets put out of business for GDPR violation.
Steve
Hi all!
On Mon, 2018-05-14 at 12:33 +0000, Andrew Hodgson wrote: [...]
These are just rough notes:
- Archive purge requests. We have discussed the same items as on the list to date. I am looking at doing a simple grep for the relevant person's details and changing that. The main reason for doing this is that if we just remove the author's messages they will be in a thread of other messages and our users typically don't remove quoted material. Current advice from the GDPR people is we may have to delete the whole thread. Still under discussion, this is also
While at it, why not delete the entire archive just to be sure? SCNR ....
Seriously, these folks don't know what they imply.
And to be honest: if person X fullquotes and the email ends up in an archive, whose fault is it?
Obviously the archive's (or rather its owners'), no?
For the author's rights side to it: I answer an email (and happen to quote just the relevant parts of other emails) to a public mailinglist with a public archive. I don't think that the archive's admin or anyone else should have the right (let alone the duty) to *edit* or *change* *my* email in there - or even worse: *remove* it completely.
MfG, Bernd
PS: The whole "right to be forgotten" idea is absurd per se - think about private archives (and I don't think about 3-letter organizations only). Can't we define the public archive to be a *necessary* and *important* part of a public mailinglist and be done with it?! For almost everyone else, some "important reason" is good enough too.
Bernd Petrovitsch Email : bernd@petrovitsch.priv.at LUGA : http://www.luga.at
On 05/14/2018 04:11 PM, Bernd Petrovitsch wrote:
Seriously, these folks don't know what they imply.
Nope. Politicians (almost) never fully understand what's going on.
And to be honest: if person X fullquotes and the email ends up in an archive, whose fault is it?
Obviously the archive's (or rather its owners'), no?
I don't think so.
Who's at fault in this scenario: The person who overheard what I said (the archive) or me for saying it in a non-secure manner (the sender)?
Is there any legal method that I can use to compel a person to forget what they overheard me say?
For the author's rights side to it: I answer an email (and happen to quote just the relevant parts of other emails) to a public mailinglist with a public archive.
I don't think that the archive's admin or anyone else should have the right (let alone the duty) to edit or change my email in there - or even worse: remove it completely.
I disagree.
I believe that the admins / owners of the archive have the right to remove something from the archive (or prevent it from going into the archive in the first place).
I don't believe that admins / owners have the general right to modify what was said.
I do believe that the admins / owners have the right to modify what was said in very specific cases, like REDACTING something. As long as they do so in a manner that is clearly identifiable that something was REDACTED.
After all, it is their system, they administer / own it and can do whatever they want to with it.
They should go out of their way to not misrepresent what you said / did.
They could also claim that your message was modified before it got to them.
Enter rabbit hole.
PS: The whole "right to be forgotten" idea is absurd per se - think about private archives (and I don't think about 3-letter organizations only). Can't we define the public archive to be a necessary and important part of a public mailinglist and be done with it?! For almost everyone else, some "important reason" is good enough too.
I feel like the idea that you can compel someone to forget something is absurd.
I think you can compel businesses to no longer use your contact information. — Which is my naive understanding of part of what the spirit of GDPR is.
I can see a scenario where a company completely removes any and all traces of someone, then buys sales leads which contain said person, and ultimately contacts said person again. Is the company in violation of GDPR? They removed (and can prove *) the person's contact information and thus forgot about them.
Or should the company have retained just enough information to know that they should not contact the person again? I.e. a black list.
(* Don't talk to me about proving the negative. Assume a 3rd party oversight of some sort.)
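One way to keep "just enough" for a do-not-contact list without keeping a readable address is to store only a keyed hash of the normalized address. A sketch under assumptions: the pepper value is hypothetical, and lower-casing the whole address is a simplification (strictly, the local part can be case-sensitive). Whether such a hash still counts as personal data is a question for the lawyers; pseudonymised data is generally still treated as personal data under GDPR.

```python
import hashlib
import hmac

PEPPER = b"hypothetical-secret-stored-outside-the-database"


def fingerprint(address):
    """Keyed hash of a normalized address; the address itself is not kept."""
    normalized = address.strip().lower()  # assumption: case-insensitive match
    return hmac.new(PEPPER, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


class SuppressionList:
    """Answers 'may I contact X?' without holding a readable copy of X."""

    def __init__(self):
        self._hashes = set()

    def forget(self, address):
        self._hashes.add(fingerprint(address))

    def may_contact(self, address):
        return fingerprint(address) not in self._hashes
```

Bought-in leads would then be checked with may_contact() before any mailing.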
-- Grant. . . . unix || die
On Mon, 2018-05-14 at 16:54 -0600, Grant Taylor via Mailman-Users wrote: [...]
On 05/14/2018 04:11 PM, Bernd Petrovitsch wrote:
Seriously, these folks don't know what they imply.
Nope. Politicians (almost) never fully understand what's going on.
FWIW and IMHO, I think we are in violent agreement here.
[...]
Who's at fault in this scenario: The person who overheard what I said (the archive) or me for saying it in a non-secure manner (the sender)?
In old-school life: the sender (because s/he said it of her/his own free will) - I hope ;-). But the person who overheard it may tell the story to a third person. And then it's just hearsay - even if it's actually 100% correct (which is almost never the case). And that's where the real "forgetting" or "doubts" actually start ...
But in an "everything is written" world, that is massively different: in the old-school world, a "written proof" had quite a large value because it wasn't trivial to have such a thing. Nowadays, with almost all communication going over the Internet, it's normal that there is a "written proof", i.e. something recorded/logged/whatever.
I'm not diving into differences of "how some judge may value some so-called proof" in some given (somewhat Western) country, but most people - even in Spring 2018 - don't realize what's really going on and try to get back to the world of the 1960s (or so ;-) - well, "thinking before talking" was always a hard job ;-)
Is there any legal method that I can use to compel a person to forget what they overheard me say?
A court order may "force" you to not tell it to anyone but it can't make you forget it (or write it down and hide it somewhere safe).
So in general: No. And that's exactly the problem with the "right to be forgotten".
For the author's rights side to it: I answer an email (and happen to quote just the relevant parts of other emails) to a public mailinglist with a public archive.
I don't think that the archive's admin or anyone else should have the right (let alone the duty) to edit or change my email in there - or even worse: remove it completely.
I disagree.
I believe that the admins / owners of the archive have the right to remove something from the archive (or prevent it from going into the archive in the first place).
Of course. But only for (somewhat obvious) very good (including legal) reasons, like really hard legal issues such as - at least in .at and .de - Nazi stuff and/or (everywhere, I hope) certain forms of pr0n.
But for some claims of "please remove my email address?"? If that email address can be found (via Google) on hundreds of sites, the removal of one instance doesn't change anything. Ooops, and a chicken-egg problem ....
I don't believe that admins / owners have the general right to modify what was said.
ACK.
I do believe that the admins / owners have the right to modify what was said in very specific cases, like REDACTING something. As long as they
That question should be answered by a copyright/author's-rights lawyer.
do so in a manner that is clearly identifiable that something was REDACTED.
ACK.
After all, it is their system, they administer / own it and can do whatever they want to with it.
Yes, and everyone writes that in the mailinglist's charter (including that all mails go into a public archive and are never edited, censored, deleted, etc.). Just from that point of view, everyone sending mails to the mailinglist has implicitly agreed to the rules, including publication in a Google-indexed archive.
BTW: I cannot do *everything* I want with it because I cannot choose to plain simply ignore modification requests from a court.
They should go out of their way to not misrepresent what you said / did.
They could also claim that your message was modified before it got to them.
Everyone can claim a lot of things - the hard question is how to prove it ;-)
PS: The whole "right to be forgotten" idea is absurd per se - think about private archives (and I don't think about 3-letter organizations only). Can't we define the public archive to be a necessary and important part of a public mailinglist and be done with it?! For almost everyone else, some "important reason" is good enough too.
I feel like the idea that you can compel someone to forget something is absurd.
I think you can compel businesses to no longer use your contact information.
Any serious business won't send me any "newsletters" if I ask it not to, even without any legal backing (if only so that I continue to buy from it in the future and don't tell anyone that they ignore such simple things - and because it's "just the right thing to do"(TM)).
Which is my naive understanding of part of what the spirit of GDPR is.
Yup, but there are other companies or folks using or selling addresses and other personal data (if only for "scientific purposes"[0]).
I can see a scenario where a company completely removes any and all traces of someone, then buys sales leads which contain said person,
Selling and buying "sales leads" (which are actually contact addresses at best) or personal data (as covered by the spirit of the GDPR) as such should be forbidden (that would solve more problems and is easier to enforce). ATM the companies are free to do (almost - also depending on the local jurisdiction) anything with personal data, and the effort to handle misuse of it is shifted to the private person. It should be the other way around ....
and ultimately contact said person again.
Is the company in violation of GDPR? They did (and can prove *) that
No.
they removed the person's contact information and thus forgot about them.
Or should the company have retained just enough information to know that they should not contact the person again? I.e. a black list.
Yeah, that's an interesting issue (which happens to apply to any club with normal member management): a member leaves (for whatever reason) and - to minimize the data - expects that all data about him/her is (really) deleted. But if the same person comes back two years later, doesn't the club (or company) have the right to *know* that that person was already a member (and in which years)? And if a member is expelled, the club surely wants to remember that.
Of course, that completely invalidates any "request on forgetting" per se (and reduces it to "act like you don't know it").
A completely different approach (and solution ;-) to "mailinglist archive and the GDPR": *Is* an automatically generated mailinglist archive in HTML actually subject to the GDPR? It's not really structured and/or organized like, e.g., an SQL database.
MfG, Bernd (IANAL etc.)
[0]: IIRC, killing whales has been allowed only for scientific purposes for more than 30 years. Did that really change anything for the whales?
Bernd Petrovitsch Email : bernd@petrovitsch.priv.at LUGA : http://www.luga.at
On 05/17/2018 02:56 AM, Bernd Petrovitsch wrote:
FWIW and IMHO, I think we are in violent agreement here.
:-)
In old-school life: the sender (because s/he said it of her/his own free will) - I hope ;-). But the person who overheard it may tell the story to a third person. And then it's just hearsay - even if it's actually 100% correct (which is almost never the case). And that's where the real "forgetting" or "doubts" actually start ...
I agree that fan-out can be a problem. IMHO the root cause is the person that said it, the sender.
But in an "everything is written" world, that is massively different: in the old-school world, a "written proof" had quite a large value because it wasn't trivial to have such a thing. Nowadays, with almost all communication going over the Internet, it's normal that there is a "written proof", i.e. something recorded/logged/whatever.
That's an interesting point, but I'm not seeing who's at fault, the person who overheard what I said (the archive) or me for saying it in a non-secure manner (the sender)?
I'm not diving into differences of "how some judge may value some so-called proof" in some given (somewhat Western) country, but most people - even in Spring 2018 - don't realize what's really going on and try to get back to the world of the 1960s (or so ;-) - well, "thinking before talking" was always a hard job ;-)
True.
A court order may "force" you to not tell it to anyone but it can't make you forget it (or write it down and hide it somewhere safe).
Where force = order under some form of penalty, sure.
So in general: No. And that's exactly the problem with the "right to be forgotten".
:-)
Good ideas usually start to have problems when they are taken too far.
Of course. But only for (somewhat obvious) very good (including legal) reasons, like really hard legal issues such as - at least in .at and .de - Nazi stuff and/or (everywhere, I hope) certain forms of pr0n.
Even with those issues, the court can only order you, under some penalty, to not do something. They still can't cause you to unsee or forget something.
At least I'm not aware of any such technology yet. (My ignorance of such technology does not preclude it from existing.)
But for some claims of "please remove my email address?"? If that email address can be found (via Google) on hundreds of sites, the removal of one instance doesn't change anything. Ooops, and a chicken-egg problem ....
I think it does.
IMHO, multiple people doing the same wrong thing does not make the thing in question correct.
Case in point: is it wrong to ask someone specific to stop spamming me, considering that multiple other people could be spamming me?
Or, more along the lines of your example, saluting in a Nazi-esque manner? (I'm not saying I agree with anything therein, I'm just using it as an example.)
That question should be answered by a copyright/author's-rights lawyer.
Hum.
I would be interested in what their take is.
I suspect it's going to come down to misrepresentation. Either trying to falsely claim credit for someone else's work, or trying to attribute something to someone who didn't say it.
Short of significant persuasion to the contrary, I'm going to continue to believe that the admins / owners of a system have the right to modify what was said in very specific cases when it comes to what enters / passes through / is stored on their systems. IMHO this MUST be done in a manner that makes it clear that it was done.
Yes, and everyone writes that in the mailinglist's charter (including that all mails go into a public archive and are never edited, censored, deleted, etc.). Just from that point of view, everyone sending mails to the mailinglist has implicitly agreed to the rules, including publication in a Google-indexed archive.
I have some issues with that.
- Corporate policy, regional laws, technical capabilities, etc. can conflict.
- Agreeing to a EULA does not mean that you actually understand it. (I'm hearing that this is starting to be challenged in courts.)
- Indexability is independent of publicity.
BTW: I cannot do everything I want with it because I cannot choose to plain simply ignore modification requests from a court.
Hence regional laws above.
Everyone can claim a lot of things - the hard question is how to prove it ;-)
Yep.
Any serious business won't send me any "newsletters" if I ask it not to, even without any legal backing (if only so that I continue to buy from it in the future and don't tell anyone that they ignore such simple things - and because it's "just the right thing to do"(TM)).
Sadly, I've seen legitimate businesses fail and do exactly that: use contact details given specifically for the contracted service inappropriately, for marketing reasons.
Yup, but there are other companies or folks using or selling addresses and other personal data (if only for "scientific purposes"[0]).
I feel like those companies should be required to collect the data from somewhere other than what was used explicitly for contracted business.
Much like how HIPAA affords us the restriction that the information can only be used for healthcare treatment and the express processes associated therein (billing, insurance, etc.).
This does not extend to marketing or sales as that's not expressly healthcare / treatment.
Selling and buying "sales leads" (which are actually contact addresses at best) or personal data (as covered by the spirit of the GDPR) as such should be forbidden (that would solve more problems and is easier to enforce).
I'm going to disagree with you.
I've been around all sorts of people that won't give you their password if you ask them. But if you offer to give them an ice cream cone to buy their password, they will happily trade with you.
The point being, I think there is a valid business model to legitimately collect information on the stated premise that it will be provided (read: sold) to marketers.
As long as that's clearly indicated up front, and I'm compensated (for my eventual hassle), I might consider doing so. Especially if I have an easy way to tell the people that contact me in the future to bugger off. Who knows, I might actually find something useful in the noise.
ATM the companies are free to do (almost - also depending on the local jurisdiction) anything with personal data and the effort to handle misuse of it is shifted to the private person. It should be the other way around ....
Agreed.
I should be able to earmark that my contact information can ONLY be used for official business transactions and NOT for anything outside said explicit business transaction.
IMHO this should be something like a bit in the database that indicates whether the info is available for other uses (read: marketing). Perhaps there should be several: express contractual uses, general business uses, business partner uses, and general uses.
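The "bit in the database" idea maps naturally onto bit flags. A sketch using Python's enum.IntFlag; the four member names mirror the categories above and are hypothetical, and the integer value is what would be stored in a single database column per contact.

```python
from enum import IntFlag


class Use(IntFlag):
    """Hypothetical consent bits stored as one integer column per contact."""
    CONTRACTUAL = 1        # express contractual uses
    GENERAL_BUSINESS = 2   # general business uses
    PARTNERS = 4           # business partner uses
    GENERAL = 8            # anything else (read: marketing)


def allowed(consent, use):
    """True only if every bit in `use` is set in `consent` (an int or Use)."""
    return (Use(consent) & use) == use
```

A contact who only agreed to the express business relationship would be stored as 1, and any marketing code path would have to check allowed(row, Use.GENERAL) first.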
No.
:-)
Yeah, that's an interesting issue (which happens to apply to any club with normal member management): a member leaves (for whatever reason) and - to minimize the data - expects that all data about him/her is (really) deleted.
IMHO, expecting that it is deleted is asking too much in this day and age. Expecting to not be contacted again might be too much.
I think that depends on the terms of the separation. E.g., after someone doesn't renew a magazine subscription, it would likely be okay to offer renewal discounts at 3 / 6 / 9 / 12 / 18 months. Conversely, asking a former member who has been forcibly excommunicated (read: voted out by other members) for a donation during the next fundraiser is probably a bad idea.
But if the same person comes back two years later, doesn't the club (or company) have the right to know that that person was already a member (and in which years)? And if a member is expelled, the club surely wants to remember that.
I think that the company has the right to know that information.
Note: Knowing that does not translate to using said information for anything outside of the express business relationship.
I seem to keep coming back to the express business relationship.
Of course, that completely invalidates any "request on forgetting" per se (and reduces it to "act like you don't know it").
I think the spirit of requesting to be forgotten really translates to requesting to not be contacted in the future. At least for most (but not all) situations.
A completely different approach (and solution ;-) to "mailinglist archive and the GDPR": Is an automatically generated mailinglist archive in HTML actually subject to the GDPR? It's not really structured and/or organized like, e.g., an SQL database.
I think that any data collection / aggregation is likely going to be subject to GDPR, for better or worse, in some way.
I also feel like the structure of the data, or lack thereof, is somewhat immaterial, especially in this day and age, when people are touting storing data in an unstructured manner. Plus, extracting email addresses (and associated names) from a mail archive, HTML or not, is relatively easy. ;-)
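To show how little effort the extraction takes, a rough sketch over archive text such as a pipermail monthly .txt file. The address regex is deliberately loose and is my own, not anything Mailman uses. Mailman 2 can also obscure addresses in its archive (the ARCHIVER_OBSCURES_EMAILADDRS setting), and this pattern would miss obscured forms.

```python
import re
from email.utils import getaddresses

# Loose, hypothetical address pattern: local part, '@', dotted domain.
ADDR_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def harvest(text):
    """Sorted, de-duplicated, lower-cased addresses found anywhere in the text."""
    return sorted({m.group(0).lower() for m in ADDR_RE.finditer(text)})


def from_lines(text):
    """Map address -> display name for every 'From:' header line in the text."""
    found = {}
    for line in text.splitlines():
        if line.startswith("From:"):
            for name, addr in getaddresses([line[len("From:"):]]):
                if addr:
                    found[addr.lower()] = name
    return found
```

Note that harvest() also picks up addresses that only appear in quoted material, which is the same problem the redaction discussion keeps running into.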
-- Grant. . . . unix || die
Bernd Petrovitsch [bernd@petrovitsch.priv.at] wrote:
On Mon, 2018-05-14 at 12:33 +0000, Andrew Hodgson wrote: [...]
These are just rough notes:
- Archive purge requests. We have discussed the same items as on the list to date. I am looking at doing a simple grep for the relevant person's details and changing that. The main reason for doing this is that if we just remove the author's messages they will be in a thread of other messages and our users typically don't remove quoted material. Current advice from the GDPR people is we may have to delete the whole thread. Still under discussion, this is also
While at it, why not delete the entire archive just to be sure? SCNR
That is something we haven't ruled out just yet!
And to be honest: if person X fullquotes and the email ends up in an archive, whose fault is it?
The last archive removal request I had, a few weeks ago, stemmed from one of the subscribers posting a private message about an event that included the original poster's mobile number as well as contact details for the event. There was a large thread about this event, and everyone top-posted. The original author contacted us after learning that people had found the event invitation via our website, and was not happy. What do I redact or remove in this instance?
- The whole thread;
- Personal details about the original poster and the event who had not consented to having their email posted to the mailing list;
- Anything else?
In the end I removed the phone numbers, her personal address and the Eventbrite links from *all* messages, including some messages from other people who had re-posted the Eventbrite links as part of their conversation to help others. She wasn't very happy, but worse, the person who forwarded it to the mailing list refused to understand what they had really done and believed they had the right to send the post anywhere because they believed it was in the public domain.
Just an example of the type of stuff that I may get asked to remove in future.
Andrew.
On 05/15/2018 03:08 AM, Andrew Hodgson wrote:
What do I redact or remove in this instance?
- Personal details about the original poster and the event who had not consented to having their email posted to the mailing list;
I would likely have (presuming sufficient motivation):
- Gotten Mailman into a state where I could safely modify the archive.
- Run a script (likely sed) to REDACT the contents of the affected archive files: sed -i$ticketID 's/phone number/REDACTED/g;s/Eventbrite link/REDACTED/g;#etc' (the -i$ticketID keeps a backup of each edited file, with the ticket ID appended to its name).
- Restarted Mailman and possibly the web server serving the archive. (Or otherwise flushed caches.)
I quite like "REDACTED" as it shows that there was something, and that it was removed, but it does not show what that something was.
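The sed step can also be written as a small Python function, which makes it easier to mix literal strings with patterns such as phone numbers while keeping the visible "REDACTED" marker. A sketch; the UK-mobile-style regex in the usage example is a hypothetical pattern, and real numbers show up in many formats (spaces, +44, dashes), so expect to iterate.

```python
import re


def redact(text, literals=(), patterns=(), marker="REDACTED"):
    """Replace literal strings (case-insensitively) and regex matches with marker.

    The marker stays visible, so readers can tell that something was removed.
    """
    for literal in literals:
        # A function replacement avoids backslash surprises in the marker.
        text = re.sub(re.escape(literal), lambda m: marker, text, flags=re.IGNORECASE)
    for pattern in patterns:
        text = re.sub(pattern, lambda m: marker, text)
    return text


if __name__ == "__main__":
    # Hypothetical UK-mobile-style pattern; real numbers appear in many formats.
    phone = r"\b07\d{3}\s?\d{6}\b"
    print(redact("Call Jane on 07700 900123", patterns=[phone]))
```

Literals are handled before patterns so that a URL containing digits is removed whole rather than partially matched.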
In the end I removed the phone numbers, her personal address and the Eventbrite links from all messages, including some messages from other people who had re-posted the Eventbrite links as part of their conversation to help others.
Fair enough.
She wasn't very happy,
I doubt there was much more that you could have done. She's free to be upset. But she shouldn't be upset with you. You did her a favor that I don't think you were strictly compelled to do.
but worse is the person who forwarded it to the mailing list refused to understand what they had really done and believed they had the right to send the post anywhere as they believed it was in the public domain.
*sigh*
I don't know what to say there.
I feel like that's between her and the event owner / organizer.
Just an example of the type of stuff that I may get asked to remove in future.
IMHO that is not unexpected, if not somewhat typical.
-- Grant. . . . unix || die
Following with interest, although my Mailman lists are on DreamHost and I don't have root access, only list admin access.
RTBF concerns aside, I am wondering how to do a renewed opt-in, similar to what I see Mailchimp currently running. Any ideas?
--
Joly MacFie 218 565 9365 Skype:punkcast
On 5/15/18 11:51 AM, Grant Taylor via Mailman-Users wrote:
I would likely have (presuming sufficient motivation):
1) Get Mailman into a state where I can safely modify the archive.
2) Run a script (likely sed) to REDACT the contents. sed -i$ticketID 's/phone number/REDACTED/g;s/Eventbrite link/REDACTED/g;#etc'
3) Restart Mailman and possibly the web server serving the archive. (Or otherwise flush caches.)
I quite like "REDACTED" as it shows that there was something, and that it was removed, but it does not show what that something was.
I've been silent in this thread because it doesn't interest me that much, but I want to point out that redacting a pipermail archive is more difficult than it would first appear.
You not only have to redact the HTML pages, but also the .txt and .txt.gz files, and if there is sensitive information in the index pages (subject and sender info), you also have to redact that in the pipermail database. See the script at <https://www.msapiro.net/scripts/hdfix> and read its docstring for an idea.
Finally, you have to redact the cumulative LIST.mbox/LIST.mbox and maybe the attachments directory.
Actually, the easiest way is to just redact the cumulative LIST.mbox/LIST.mbox file and rebuild the archive with 'bin/arch --wipe' but that can have undesired side effects.
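A sketch of the file-level part of that job, assuming the usual Mailman 2 layout in which a list's pipermail archive lives in `archives/private/LISTNAME/` and the cumulative mbox in `archives/private/LISTNAME.mbox/LISTNAME.mbox`, and assuming Mailman is stopped while it runs. It redacts .html, .txt and .mbox files and decompresses and recompresses .txt.gz files. Files with other suffixes, such as the pipermail index database and binary attachments, are left alone (something like the hdfix script mentioned above is still needed for the database); text attachments that pipermail stored as .txt or .html files are redacted along with everything else. `redact_list_archive` is a hypothetical name.

```python
import gzip
from pathlib import Path

MARKER = b"REDACTED"
TEXT_SUFFIXES = (".html", ".txt", ".mbox")  # plain files redacted as-is


def redact_bytes(data: bytes, secrets: list[bytes]) -> bytes:
    """Replace every literal occurrence of each secret with MARKER."""
    for secret in secrets:
        data = data.replace(secret, MARKER)
    return data


def redact_list_archive(private_dir: str, listname: str, secrets: list[bytes]) -> list[Path]:
    """Redact one list's pipermail files and cumulative mbox; return changed paths.

    Only files ending in .html, .txt, .mbox or .txt.gz are touched.
    """
    base = Path(private_dir)
    changed: list[Path] = []
    for root in (base / listname, base / f"{listname}.mbox"):
        if not root.is_dir():
            continue
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            if path.name.endswith(".txt.gz"):
                # Compressed monthly copies: decompress, redact, recompress.
                original = gzip.decompress(path.read_bytes())
                redacted = redact_bytes(original, secrets)
                if redacted != original:
                    path.write_bytes(gzip.compress(redacted))
                    changed.append(path)
            elif path.name.endswith(TEXT_SUFFIXES):
                original = path.read_bytes()
                redacted = redact_bytes(original, secrets)
                if redacted != original:
                    path.write_bytes(redacted)
                    changed.append(path)
    return changed
```

For example, `redact_list_archive("/var/lib/mailman/archives/private", "mylist", [b"555-0100"])`; the archive path varies between installations.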
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Duly noted.
On 05/15/2018 07:04 PM, Mark Sapiro wrote:
Actually, the easiest way is to just redact the cumulative LIST.mbox/LIST.mbox file and rebuild the archive with 'bin/arch --wipe' but that can have undesired side effects.
Doesn't that run the risk of renumbering messages, thus breaking existing links to messages? Or at least disassociating them such that they link to the wrong message?
-- Grant. . . . unix || die
On 5/15/18 6:50 PM, Grant Taylor via Mailman-Users wrote:
That's one of the "undesired side effects", although if the list is less than 10 years old, you have never edited the mbox with an MUA that can reorder messages, and you only redact text without deleting messages, the risk is small.
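If you do go the `bin/arch --wipe` route, one cheap safeguard against the renumbering risk is to confirm that the redaction has not added or removed a message before replacing LIST.mbox. The sketch below assumes Python's standard `mailbox` module splits messages on `From ` lines roughly the way the archiver does; it only checks the message count, not that every message keeps its position, so it narrows the risk rather than eliminating it. `redact_mbox` is a hypothetical helper; run it with Mailman stopped and as the Mailman user so the replaced file keeps its owner, then rebuild with `bin/arch --wipe LISTNAME`.

```python
import mailbox
import os
import shutil
import tempfile
from pathlib import Path


def count_messages(path: Path) -> int:
    """Count messages as mailbox.mbox splits them (on lines starting 'From ')."""
    box = mailbox.mbox(str(path), create=False)
    try:
        return len(box)
    finally:
        box.close()


def redact_mbox(mbox_path: str, secrets: list[bytes], marker: bytes = b"REDACTED") -> int:
    """Redact LIST.mbox in place only if the message count stays the same.

    The redacted copy is written to a temporary file beside the original and
    swapped in with os.replace; on a count mismatch or any error the original
    is left untouched. Returns the (unchanged) message count.
    """
    src = Path(mbox_path)
    data = src.read_bytes()
    for secret in secrets:
        data = data.replace(secret, marker)
    fd, tmp_name = tempfile.mkstemp(dir=str(src.parent), prefix=".redact-")
    tmp = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        shutil.copymode(src, tmp)  # keep the original permission bits
        before, after = count_messages(src), count_messages(tmp)
        if before != after:
            raise RuntimeError(f"message count would change: {before} -> {after}")
        os.replace(tmp, src)  # atomic rename on the same file system
    except BaseException:
        tmp.unlink(missing_ok=True)  # never leave the temporary copy behind
        raise
    return after
```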
Other issues can arise too: if the list's scrub_nondigest setting is No now but was Yes at some time in the past, scrubbed attachments from the Yes period will be lost.
Also, if you have a list search (e.g. htdig integration) that can order hits by file system timestamp, this may be an issue because all the timestamps become the current time, although the same issue occurs when editing the HTML files directly. There is a script to fix that at <https://www.msapiro.net/scripts/update_archive_mtime>.
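For the case where you edit the HTML files directly, a different approach from the script above (which fixes timestamps afterwards) is to record each file's timestamps before editing and put them back with `os.utime` once the edit is written. A minimal sketch, assuming the search tool only ranks by modification time; it does not help after `bin/arch --wipe`, which regenerates every file. `redact_preserving_mtime` is a hypothetical name.

```python
import os
from pathlib import Path


def redact_preserving_mtime(path: str, secrets: list[bytes], marker: bytes = b"REDACTED") -> bool:
    """Redact a file in place, then restore its original access/modification times.

    Returns True if the file was changed.
    """
    p = Path(path)
    st = p.stat()  # capture timestamps before touching the file
    original = p.read_bytes()
    redacted = original
    for secret in secrets:
        redacted = redacted.replace(secret, marker)
    if redacted == original:
        return False
    p.write_bytes(redacted)
    os.utime(p, ns=(st.st_atime_ns, st.st_mtime_ns))  # put the old times back
    return True
```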
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Participants (9):
- Andrew Hodgson
- Bernd Petrovitsch
- Dimitri Maziuk
- Grant Taylor
- Joly MacFie
- Julian H. Stacey
- Mark Sapiro
- Stephen J. Turnbull
- Ángel