[Python-ideas] A service to crawl +1s and URLs out of mailman archives
Steven D'Aprano
steve at pearwood.info
Mon Dec 1 19:51:58 CET 2014
On Mon, Dec 01, 2014 at 09:52:41AM -0600, Wes Turner wrote:
> In context to building a PEP or similar, I don't know how many times I've
> trawled looking for:
>
> * Docs links
> * Source links
> * Patch links
> * THREAD POST LINKS
> * Consensus
>
> A tool to crawl structured and natural language data from the forums could
> be very useful for preparing PEPs.
Yes it would be. Do you have any idea how to write such a tool?
Do you think such a tool would be of enough interest to enough people
that it should be distributed in the Python standard library?
I think that this would make a great project on PyPI, especially since
it may take a long, long time for it to develop enough intelligence to
be able to do the job you're suggesting. Finding links to documentation
and source code is fairly straightforward, but building in the
intelligence to find "consensus" is a non-trivial application of natural
language processing and an impressive feat of artificial intelligence.
It certainly doesn't sound like something that somebody could write over
a weekend and add to the 3.5 standard library; it's more like an
ongoing project that will see continual development for many years.
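
To illustrate the straightforward half of the problem (pulling links and
+1/-1 votes out of a message), here is a minimal sketch. It assumes the
message body is already available as plain text, that quoted lines start
with ">", and that a vote sits at the start of a line. The function name
and both regular expressions are hypothetical, chosen only for this
illustration; it makes no attempt at judging consensus.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration; real archives are messier.
URL_RE = re.compile(r"https?://[^\s<>]+")
VOTE_RE = re.compile(r"^[ \t]*([+-][01])\b", re.MULTILINE)


def extract_links_and_votes(body):
    """Return (urls, votes) found in the unquoted lines of a message body."""
    # Assumption: quoted text starts with '>'; skip it so that replies
    # do not re-count links and votes from earlier posts.
    own_lines = [
        line for line in body.splitlines()
        if not line.lstrip().startswith(">")
    ]
    text = "\n".join(own_lines)
    # Trim trailing punctuation that usually belongs to the sentence.
    urls = [url.rstrip(".,;:)]") for url in URL_RE.findall(text)]
    votes = Counter(VOTE_RE.findall(text))
    return urls, votes
```

Calling extract_links_and_votes() on each message in a thread and summing
the Counter objects would give a crude vote tally, which is still a long
way from measuring consensus.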
--
Steven