Accessing mailing list archives

Hi all,

I've been trying to figure out how to access the archives programmatically. I'm sure this is easy once you know, but googling various things hasn't worked. What I want to do is graph the number of messages about PEP 572 by time. (Or has someone already done that?)

I installed GNU Mailman, and downloaded the gzip'ed archives for a number of months and unzipped them, and I suspect that there's some way to get them all into a single database, but it hasn't jumped out at me. If I count the "Message-ID" lines, the "Subject:" lines, and the "\nFrom " lines in one of those text files, I get slightly different numbers for each.

Alternatively, they're maybe *already* in a database, and I just need API access to do the querying? Can someone help me out?

Bob

Hi Bob,

I wrote a basic script to compute the number of emails per PEP. It requires downloading the gzipped mbox files from the per-month archive web page, then gunzipping them:
https://github.com/vstinner/misc/blob/master/python/parse_mailman_mbox_peps....

Results: https://mail.python.org/pipermail/python-committers/2018-April/005310.html

Victor

On Monday, 30 July 2018, Bob Purvy <bpurvy@gmail.com> wrote:
> What I want to do is graph the number of messages about PEP 572 by time. (or has someone already done that?)
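For reference, the "download the monthly files, then gunzip them" step can also be done from Python. The following is only a sketch under stated assumptions, not Victor's script: the function name `combine_archives`, the directory argument, and the `*.gz` filename pattern are all hypothetical, and it assumes the monthly archives have already been downloaded by hand.

```python
import gzip
import shutil
from pathlib import Path


def combine_archives(archive_dir, output_path):
    """Decompress every *.gz file in archive_dir and append it to output_path.

    Returns the number of archives that were appended. The name and
    signature are illustrative only (hypothetical helper).
    """
    # Sorted by filename, which is NOT chronological for names such as
    # "2018-July.txt.gz"; message order does not matter for counting.
    archives = sorted(Path(archive_dir).glob("*.gz"))
    with open(output_path, "wb") as out:
        for archive in archives:
            with gzip.open(archive, "rb") as src:
                shutil.copyfileobj(src, out)
            # Blank line so the next archive's "From " separator starts cleanly.
            out.write(b"\n")
    return len(archives)
```

The result is one large mbox-style text file. The address munging in the headers is left untouched here.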

Would it be possible to normalize by the number of mailing list members and also by "active" members? The latter would be tricky to define.

On Mon, Jul 30, 2018 at 3:29 PM Victor Stinner <vstinner@redhat.com> wrote:
> I wrote a basic script to compute the number of emails per PEP.

Feel free to modify the script to make your own statistics ;-)

Victor

2018-08-01 0:57 GMT+02:00 Michael Selik <mike@selik.org>:
> Would it be possible to normalize by the number of mailing list members and also by "active" members? The latter would be tricky to define.

On 30Jul2018 13:40, Bob Purvy <bpurvy@gmail.com> wrote:
> I've been trying to figure out how to access the archives programmatically. I'm sure this is easy once you know, but googling various things hasn't worked. What I want to do is graph the number of messages about PEP 572 by time. (or has someone already done that?)
>
> I installed GNU Mailman, and downloaded the gzip'ed archives for a number of months and unzipped them, and I suspect that there's some way to get them all into a single database, but it hasn't jumped out at me. If I count the "Message-ID" lines, the "Subject:" lines, and the "\nFrom " lines in one of those text files, I get slightly different numbers for each.
>
> Alternatively, they're maybe *already* in a database, and I just need API access to do the querying? Can someone help me out?
Like Victor, I download mailing list archives. Between pulling them in and also subscribing, ideally I get a complete history in my "python" mail folder. Likewise for other lists.

The mailman archives are UNIX mbox files, compressed, with a bit of header munging (to make address harvesting harder). You can concatenate them and uncompress and reverse the munging like this:

    cat *.gz | gunzip | fix-mail-dates --mbox | un-at-

where fix-mail-dates is here:
https://bitbucket.org/cameron_simpson/css/src/tip/bin/fix-mail-dates
and un-at- is here:
https://bitbucket.org/cameron_simpson/css/src/tip/bin/un-at-
and the output is a nice UNIX mbox file.

You can load that into most mail readers or parse it with Python's email modules (in the stdlib). It should be easy enough to scan such a thing and count header contents etc. Ignore the "From " line content, prefer the "From:" header. (Separate messages on "From " of course, just don't grab email addresses from it.)

Cheers,
Cameron Simpson <cs@cskk.id.au>
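As an illustration of the "parse it with Python's email modules" suggestion, here is a minimal sketch that counts messages per month whose Subject mentions PEP 572, using the stdlib `mailbox` module. It is not the script referenced earlier in the thread. The function name `count_pep_messages_by_month`, the regular expression, and the choice to match on the Subject header only are assumptions made for the example. It assumes the input is a single uncompressed mbox file.

```python
import mailbox
import re
from collections import Counter
from email.utils import parsedate_to_datetime

# Assumed pattern: "PEP 572", "PEP-572", "pep572", "PEP 0572", and so on.
PEP_RE = re.compile(r"\bPEP[\s-]*0*572\b", re.IGNORECASE)


def count_pep_messages_by_month(mbox_path, pattern=PEP_RE):
    """Return {"YYYY-MM": count} for messages whose Subject matches pattern.

    Hypothetical helper: months come from each message's Date header,
    in the sender's own UTC offset.
    """
    counts = Counter()
    box = mailbox.mbox(mbox_path)  # splits messages on "From " separator lines
    try:
        for message in box:
            subject = str(message.get("Subject", ""))
            if not pattern.search(subject):
                continue
            try:
                sent = parsedate_to_datetime(str(message.get("Date", "")))
            except (TypeError, ValueError):
                continue  # Date header missing or unparseable: skip the message
            counts[sent.strftime("%Y-%m")] += 1
    finally:
        box.close()
    return dict(sorted(counts.items()))
```

Counting parsed messages this way avoids the mismatch between raw "Subject:", "Message-ID" and "From " line counts, since each message contributes at most once. Subjects that use MIME encoded-words are not decoded in this sketch, and a sender could be read from `message["From"]` rather than from the separator line.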
participants (4)
- Bob Purvy
- Cameron Simpson
- Michael Selik
- Victor Stinner