A service to crawl +1s and URLs out of mailman archives

In the context of building a PEP or similar, I don't know how many times I've trawled looking for:

* Docs links
* Source links
* Patch links
* THREAD POST LINKS
* Consensus

A tool to crawl structured and natural language data from the forums could be very useful for preparing PEPs.

On Mon, Dec 01, 2014 at 09:52:41AM -0600, Wes Turner wrote:
Yes it would be. Do you have any idea how to write such a tool? Do you think such a tool would be of enough interest to enough people that it should be distributed in the Python standard library?

I think that this would make a great project on PyPI, especially since it may take a long, long time for it to develop enough intelligence to be able to do the job you're suggesting. Finding links to documentation and source code is fairly straightforward, but building in the intelligence to find "consensus" is a non-trivial application of natural language processing and an impressive feat of artificial intelligence. It certainly doesn't sound like something that somebody could write over a weekend and add to the 3.5 standard library; it's more like an on-going project that will see continual development for many years.

-- Steven

On Mon, Dec 1, 2014 at 12:51 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Such a module would undoubtedly rely upon external libraries like: requests, celery, beautifulsoup, and NLTK. And whatever is necessary to poll mailman without asyncio (e.g. channels, websockets). This is sort of in scope for python-ideas, as a general observation that *linked* development artifacts are traceable, reproducible, and task focused on: docs, code, and tests (the build).
https://github.com/wrdrd/docs/blob/master/wrdrd/tools/crawl.py

Issues:

* Too many HTTP requests
* Inefficient
* A real live queue could be helpful

Thank you for your feedback!
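As a rough illustration of the "too many HTTP requests" and "real live queue" points, here is a minimal sketch of a queue-driven crawl that requests each URL at most once. It is not taken from the linked crawl.py; the `crawl` function, the `fetch` callback and the `max_pages` limit are all hypothetical names chosen for this example.

```python
import re
from collections import deque
from urllib.parse import urldefrag, urljoin

# Deliberately naive href matcher; a real crawler would use an HTML parser.
HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl that fetches every URL at most once.

    ``fetch`` is a caller-supplied function mapping a URL to HTML text
    (hypothetical; e.g. a thin wrapper around an HTTP client).
    """
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued or fetched
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        for href in HREF_RE.findall(html):
            # Resolve relative links and drop #fragments so the same page
            # is not requested twice under different spellings.
            link, _fragment = urldefrag(urljoin(url, href))
            # Stay inside the archive rooted at start_url.
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and makes it easy to exercise against canned pages.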

On Mon, Dec 1, 2014 at 3:36 PM, Barry Warsaw <barry@python.org> wrote:
Good call, thanks.

Are the Python mailman instances upgraded to Mailman 3, with the Django GUI?

Is there any way to recognize a Lightweight Markup Language doctype declaration, in email?

```restructuredtext
...
```

```python
#!/usr/bin/env python
```

On 2 December 2014 at 20:53, Wes Turner <wes.turner@gmail.com> wrote:
We'd like them to be, but there's no ETA yet (for either the final MM3 release or the subsequent upgrade of the python.org mailing lists). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Dec 02, 2014, at 04:53 AM, Wes Turner wrote:
Are the Python mailman instances upgraded to Mailman 3, with the Django GUI?
No, but we know there are people using it in production. I expect we'll get another beta before the end of the year and then I think it's time to start planning an experimental deployment on pdo. We have a few experimental lists on mpo that we can convert and start playing with.
Is there any way to recognize a Lightweight Markup Language doctype declaration, in email?
Maybe a Content-Type? But can you elaborate on what you're thinking about?

Cheers, -Barry
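To make the Content-Type suggestion concrete, here is a minimal sketch using only the standard library `email` package. The mapping from content types to markup names is an assumption made for this example (`text/x-rst` in particular is an unregistered type that senders would have to agree on), and `detect_markup` is a hypothetical helper name.

```python
from email import message_from_string

# Assumed mapping, for illustration only. text/x-rst is an unregistered
# type; a list would have to adopt it as a convention.
MARKUP_BY_CONTENT_TYPE = {
    "text/x-rst": "restructuredtext",
    "text/markdown": "markdown",
}


def detect_markup(raw_message):
    """Return the markup name declared by the first matching MIME part.

    Returns None when no part declares a known lightweight markup type
    (which includes ordinary text/plain mail).
    """
    msg = message_from_string(raw_message)
    for part in msg.walk():
        # get_content_type() is lowercased and defaults to text/plain.
        markup = MARKUP_BY_CONTENT_TYPE.get(part.get_content_type())
        if markup:
            return markup
    return None
```

Because `walk()` visits every part of a multipart message, the same helper would cover a message that carries a plain-text alternative next to a marked-up one.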

Is there any way to recognize a Lightweight Markup Language doctype declaration, in email?
Maybe a Content-Type? But can you elaborate on what you're thinking about?
```restructuredtext
Often, mailing list and issue text gets *worked into* docs.
```

ReST in email may ultimately be a bit noisy and ultimately an unproductive feature.

On Tue, Dec 2, 2014 at 1:32 PM, Barry Warsaw <barry@python.org> wrote:

cc'd here from https://westurner.github.io/wiki/ideas#open-source-mailing-list-extractor

Open Source Mailing List Extractor
----------------------------------

Use cases:

* https://mail.python.org/pipermail/python-ideas/2014-December/030228.html
* incentivization of actionable crossreferences
* PEP research
* "is this actionable?"
* "are we voting?"

* Crawl/parse/extract links and +1 from given thread(s)
* Detect a few standard link types:

  * Issue
  * Src
  * Doc
  * Ref
  * x-link

* +1s with expandable snippets? (like ``grep -C``)
* There could be configurable per-list link heuristics:

  * http[s]
  * Issue: https://bugs.python.org/issue(\d+)
  * Src: https://hg.python.org/<repo>/<path>
  * Src: https://github.com/<org>/<project>/<path>
  * Src: https://bitbucket.org/<org>/<project>/<path>
  * Patch/Attachment: http[s]://bugs.python.org/(file[\d]+)/<filename(.diff)>
  * Doc: https://docs.python.org/<ver>/<path>
  * Wiki: https://wiki.python.org/moin/<path>
  * Homepage: https://www.python.org/<path>
  * PyPI pkg: https://pypi.python.org/pypi/<path>
  * Warehouse pkg: https://warehouse.python.org/project/<path>
  * Wikipedia: https://[lang].wikipedia.org/wiki/<page> --> (dbpedia:<page>)
  * Build: http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%203.4/builds/77...
  * ... JSON-LD RDF

* This could - most efficiently - be added to mailman (e.g. in Postorius and/or HyperKitty)

  * http://mailman-bundler.readthedocs.org/en/latest/
  * http://pythonhosted.org//mailman/
  * https://mail.python.org/mailman/listinfo/mailman-developers

... looking forward to Mailman 3.

On Thu, Dec 18, 2014 at 10:47 AM, Wes Turner <wes.turner@gmail.com> wrote:
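The per-list link heuristics above could be sketched roughly as an ordered table of regular expressions. This is only an illustration: the pattern table covers a subset of the link types listed, and the names `LINK_PATTERNS`, `URL_RE` and `classify_links` are hypothetical.

```python
import re

# Ordered (kind, pattern) pairs derived from a subset of the heuristics
# listed above; a real tool would make this configurable per list.
LINK_PATTERNS = [
    ("issue", re.compile(r"https?://bugs\.python\.org/issue(\d+)")),
    ("patch", re.compile(r"https?://bugs\.python\.org/file\d+/\S+")),
    ("src", re.compile(
        r"https?://(?:hg\.python\.org|github\.com|bitbucket\.org)/\S+")),
    ("doc", re.compile(r"https?://docs\.python\.org/\S+")),
    ("wiki", re.compile(r"https?://wiki\.python\.org/moin/\S+")),
    ("pypi", re.compile(r"https?://pypi\.python\.org/pypi/\S+")),
]

# Anything that looks like an http(s) URL, stopping at whitespace/brackets.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def classify_links(text):
    """Return a list of (kind, url) pairs in order of appearance."""
    found = []
    for url in URL_RE.findall(text):
        url = url.rstrip(".,;:)")       # trim trailing sentence punctuation
        for kind, pattern in LINK_PATTERNS:
            if pattern.match(url):
                found.append((kind, url))
                break
        else:
            found.append(("other", url))  # no heuristic matched
    return found
```

For example, a message body mentioning a tracker issue and a docs page would yield one `"issue"` entry and one `"doc"` entry, with everything unrecognized falling into `"other"`.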

On 19 Dec 2014 02:53, "Wes Turner" <wes.turner@gmail.com> wrote:
Indeed - I'm also looking forward to that on the Fedora & Red Hat side of things, in addition to the Python mailing lists :)

I was actually thinking about this kind of feature in a HyperKitty context yesterday, after Marc-Andre Lemburg pointed out a new service called Loomio (www.loomio.org). That's essentially a web discussion forum that allows the use of Apache style voting to come to a consensus. It calls the four levels Agree, Abstain, Disagree, Block, and presents a summary of "current position statements" alongside the discussion.

With a little bit of lightweight markup to reduce false positives, I could see such a feature being a valuable addition to HyperKitty.

Cheers, Nick.
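A tally of Apache-style votes could be sketched along these lines. The "lightweight markup" assumed here is simply that a vote only counts when `+1`, `+0`, `-0` or `-1` starts a non-quoted line, and that a sender's latest vote replaces earlier ones; both rules, and the `tally_votes` name, are assumptions for this example rather than anything HyperKitty or Loomio defines.

```python
import re
from collections import Counter

# A vote token at the start of a line, not followed by another digit
# (so "+10" is not read as "+1").
VOTE_RE = re.compile(r"^\s*([+-][01])(?!\d)")


def tally_votes(messages):
    """Count the latest vote per sender.

    ``messages`` is an iterable of (sender, body) pairs in chronological
    order. Lines quoted with ">" are skipped so that replies do not
    re-count the votes they quote.
    """
    latest = {}
    for sender, body in messages:
        for line in body.splitlines():
            if line.lstrip().startswith(">"):
                continue                 # quoted text from an earlier post
            match = VOTE_RE.match(line)
            if match:
                latest[sender] = match.group(1)
    return Counter(latest.values())
```

Requiring the token at the start of a line trades recall for fewer false positives; a "+1" buried mid-sentence is ignored under this rule.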

On Thu, Dec 18, 2014 at 6:23 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
What a great idea!
I took a course in "Collaboration" offered through a local university in which one of our applied objectives was to collaboratively write a free ebook (now on LuLu) about collaboration; it was a great way to study for the final, in regards to Collaborative Engineering. "6 Patterns of Collaboration": Generate, Reduce, Clarify, Organize, Evaluate, Build Consensus. In band with email and mailing lists would be great. Recently, I made the mistake of trying to approximate the https://github.com/twitter/twitter-text JS code which matches @username and #\w (#RTLunicodew\). Syntax highlighting could be useful. TL;DR I really like the "Agree, Abstain, Disagree, Block" distinction; it would be great to somehow integrate with email, somehow.

TL;DR I really like the "Agree, Abstain, Disagree, Block" distinction; it would be great to somehow integrate with email, somehow.
I can't help but wonder whether this could be implemented by extending OpenAnnotation [1][2][3] (JSON-LD RDF) for **any URI**, like the Hypothesis [4] OpenAnnotation web service and Annotator JS / browser extension. [5]

[1] http://openannotation.org/spec/core/
[2] http://openannotation.org/spec/core/specific.html#Selectors
[3] http://www.w3.org/annotation/
[4] https://github.com/hypothesis/h (Pyramid)
[5] https://github.com/hypothesis/annotator (OKFN)

Alas, mailing list posts do not have easily-gettable URIs (like Reddit comment permalinks).

On Fri, Dec 19, 2014 at 3:50 AM, Wes Turner <wes.turner@gmail.com> wrote:

On 19 December 2014 at 20:39, Wes Turner <wes.turner@gmail.com> wrote:
We're getting fairly well off-topic for python-ideas now (since anything we do in this area will likely only be by way of upgrading to Mailman 3 and HyperKitty), but I do highly recommend getting in touch with the HyperKitty devs on their mailing list. It takes advantage of the new archiving design in Mailman 3 to add the archive server permalink for the post to the footer before it gets sent to the list members. Cheers, Nick. P.S. Barry's write-up of the Mailman 3 architecture is also well worth a read in general: http://www.aosabook.org/en/mailman.html -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Dec 19, 2014, at 04:39 AM, Wes Turner wrote:
Alas, mailing list posts do not have easily-gettable URIs (like Reddit comment permalinks).
Actually, we have a proposal for that, and yes it's implemented in MM3: http://wiki.list.org/display/DEV/Stable+URLs Cheers, -Barry

P.S. Barry's write-up of the Mailman 3 architecture is also well worth a read in general: http://www.aosabook.org/en/mailman.html
Thanks! On Fri, Dec 19, 2014 at 10:25 AM, Barry Warsaw <barry@python.org> wrote:
#Stable URL calculation :

Subject: An important message
Date: Wed, 04 Jul 2007 16:49:58 +0900
Message-ID: <87myycy5eh.fsf@uwakimon.sk.tsukuba.ac.jp>
X-Message-ID-Hash: JJIGKPKB6CVDX6B2CUG4IHAJRIQIOUTP
List-Archive: http://mail.python.org/archives/mailman-developers
Archived-At: http://mail.python.org/archives/mailman-developers/JJIGKPKB6CVDX6B2CUG4IHAJR...
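The thread does not spell out how the hash is derived. The sketch below assumes the scheme I understand the Stable URLs proposal to describe: the base32 encoding of the SHA-1 digest of the Message-ID with its angle brackets removed, appended to the List-Archive URL. Treat that as an assumption to check against the linked page; `message_id_hash` and `archived_at` are hypothetical helper names.

```python
import base64
import hashlib


def message_id_hash(message_id):
    """Base32-encoded SHA-1 of the Message-ID without angle brackets.

    Assumed scheme; verify against the Stable URLs proposal before use.
    """
    if message_id.startswith("<") and message_id.endswith(">"):
        message_id = message_id[1:-1]
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()
    # A 20-byte SHA-1 digest encodes to exactly 32 base32 characters,
    # so no "=" padding appears.
    return base64.b32encode(digest).decode("ascii")


def archived_at(list_archive, message_id):
    """Join the List-Archive base URL and the message hash."""
    return "{}/{}".format(list_archive.rstrip("/"),
                          message_id_hash(message_id))
```

Running `message_id_hash` on the Message-ID in the example headers and comparing the result with the X-Message-ID-Hash value shown there is a quick way to confirm or refute the assumed scheme.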

participants (4): Barry Warsaw, Nick Coghlan, Steven D'Aprano, Wes Turner