[Mailman-Users] GDPR

Mark Sapiro mark at msapiro.net
Tue May 15 21:04:06 EDT 2018


On 5/15/18 11:51 AM, Grant Taylor via Mailman-Users wrote:
> 
> I would likely have (presuming sufficient motivation):
> 
> 1)  Get mailman into a state that I can safely modify the archive.
> 2)  Run a script (likely sed) to REDACT the contents.
>       sed -i$ticketID 's/phone number/REDACTED/g;s/Eventbright Link/REDACTED/g;#etc'
> 3)  Restart Mailman and possibly the web server serving the archive.
>     (Or otherwise flush caches.)
> 
> I quite like "REDACTED" as it shows that there was something, and that
> it was removed, but it does not show what that something was.


I've been silent in this thread because it doesn't interest me that
much, but I want to point out that redacting a pipermail archive is more
difficult than it would first appear.

You not only have to redact the HTML pages, but also the .txt and
.txt.gz files, and if there is sensitive information in the index pages
(subject and sender info), you also have to redact that in the pipermail
database. See the script at <https://www.msapiro.net/scripts/hdfix> and
read its docstring for an idea.
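
As a rough illustration of the file types involved, here is a minimal
Python sketch that redacts the HTML pages and the .txt and .txt.gz files
under an archive directory. The function names, the patterns and the
directory layout are assumptions made for the example, not part of
Mailman or of hdfix. It does not touch the pipermail database, so index
entries (subject and sender) still need something like the hdfix
approach.

```python
import gzip
import re
from pathlib import Path


def redact_bytes(data, patterns, replacement=b"REDACTED"):
    """Return data with every match of every pattern replaced."""
    for pattern in patterns:
        data = pattern.sub(replacement, data)
    return data


def redact_archive(archive_dir, patterns):
    """Redact .html, .txt and .txt.gz files under archive_dir in place.

    patterns is a list of compiled bytes regexes (hypothetical examples
    are shown in the usage note). Returns the list of changed files.
    """
    changed = []
    for path in sorted(Path(archive_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.name.endswith(".txt.gz"):
            # Gzipped periodic text files: decompress, redact, recompress.
            original = gzip.decompress(path.read_bytes())
            redacted = redact_bytes(original, patterns)
            if redacted != original:
                path.write_bytes(gzip.compress(redacted))
                changed.append(path)
        elif path.name.endswith((".html", ".txt")):
            original = path.read_bytes()
            redacted = redact_bytes(original, patterns)
            if redacted != original:
                path.write_bytes(redacted)
                changed.append(path)
    return changed
```

It might be called as redact_archive("/path/to/archives/private/LIST",
[re.compile(rb"555-0100")]), where both the path and the pattern are
placeholders. Work on a copy of the archive first.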

Finally, you have to redact the cumulative LIST.mbox/LIST.mbox and maybe
the attachments directory.

Actually, the easiest way is to just redact the cumulative
LIST.mbox/LIST.mbox file and rebuild the archive with 'bin/arch --wipe'
but that can have undesired side effects.
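
A minimal sketch of the mbox-first route, assuming the sensitive text
appears literally on a single line (it will not match text inside
base64 or quoted-printable encoded parts, or text that wraps across
lines). The function name and the ".bak" backup are choices made for
this example only.

```python
import re
import shutil
from pathlib import Path


def redact_mbox(mbox_path, patterns, replacement=b"REDACTED"):
    """Redact pattern matches in an mbox file, line by line.

    A copy of the original is kept next to it with a ".bak" suffix.
    Lines starting with b"From " are written unchanged so the mbox
    message separators survive. Returns the number of lines changed.
    """
    mbox_path = Path(mbox_path)
    backup = mbox_path.with_name(mbox_path.name + ".bak")
    shutil.copy2(mbox_path, backup)
    changed = 0
    with open(backup, "rb") as src, open(mbox_path, "wb") as dst:
        for line in src:
            new_line = line
            if not line.startswith(b"From "):
                for pattern in patterns:
                    new_line = pattern.sub(replacement, new_line)
            if new_line != line:
                changed += 1
            dst.write(new_line)
    return changed
```

After checking the result, the archive would be rebuilt with
'bin/arch --wipe LIST' run from Mailman's installation directory. The
backup still contains the unredacted text, so it has to be removed once
the rebuild has been verified.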

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

