[core-workflow] web API to get a list of all modules in stdlib

anatoly techtonik techtonik at gmail.com
Mon Mar 23 13:06:23 CET 2015


Hi,

I am doing an exercise as part of an agile UX data mining
team, and I need to get a list of Python modules:

https://stackoverflow.com/questions/6463918/how-can-i-get-a-list-of-all-the-python-standard-library-modules

But this gives only the modules that were compiled into a
specific interpreter, and I need a list of the modules that
are de facto part of the standard library.
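
To make the limitation concrete, here is a minimal sketch of one way to
enumerate modules from inside a running interpreter. The function name
local_stdlib_modules is made up for illustration. Whatever it returns
describes only the interpreter that runs it (its version, platform and
build options), which is exactly why it cannot serve as a reference list.

```python
import pkgutil
import sys
import sysconfig


def local_stdlib_modules():
    """Return top-level module names visible to *this* interpreter only."""
    # Modules compiled directly into the interpreter binary.
    names = set(sys.builtin_module_names)

    # Pure-Python modules and packages found in this install's stdlib directory.
    # Note: extension modules kept in other directories are not covered here.
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    for info in pkgutil.iter_modules([stdlib_dir]):
        names.add(info[1])  # each entry is (finder, name, ispkg)

    return sorted(names)


if __name__ == "__main__":
    modules = local_stdlib_modules()
    print(len(modules), "modules visible to Python", sys.version.split()[0])
```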

I also need this for all Python versions, and I need to be
able to fetch it as CSV, JSON or an HTML table over the web,
so that the results of my work can be validated and the
experiment repeated as necessary.


I see this data as a necessary step toward organizing work
around an "externally evolving standard library", so the way
to query it should be sustainable and obvious.

It might be possible to generate something from docs, like:

https://docs.python.org/2.7.2/dataset/modules.json

This way you get static information without the ability to
version or refresh it (still good to have anyway, to
compare the docs against other sources).

Or it may be a dedicated URL:

https://api.python.org/2.7.2/stdlib/modules/

The result is HTML by default.
?format=csv   - result is csv
?format=yaml
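
A client for such an endpoint could build its URLs as in the sketch
below. The host api.python.org and the "format" parameter are only the
proposal from this message, not an existing service, and the helper name
modules_url is hypothetical. The code builds URLs and makes no request.

```python
from urllib.parse import urlencode

# Proposed in this message; an assumption, not a service that exists.
BASE = "https://api.python.org"


def modules_url(version, fmt=None):
    """Build the proposed stdlib module-list URL for one Python version."""
    url = "{}/{}/stdlib/modules/".format(BASE, version)
    if fmt is not None:
        # Without a format parameter the proposal defaults to HTML.
        url += "?" + urlencode({"format": fmt})
    return url


if __name__ == "__main__":
    print(modules_url("2.7.2"))
    print(modules_url("2.7.2", "csv"))
```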

I need in particular:
 - module name
 - files that comprise module sources
 - supported operating systems
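
One possible shape for such a record, and its JSON and CSV renderings,
is sketched below. The field names (name, files, os_supported), the ";"
separator inside CSV cells and the sample values are all assumptions made
for illustration; they are not taken from any existing dataset.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ModuleRecord:
    name: str                # module name
    files: List[str]         # source files that make up the module
    os_supported: List[str]  # operating systems the module is available on


def to_json(records):
    """Serialize the records as a JSON array of objects."""
    return json.dumps([asdict(r) for r in records], indent=2)


def to_csv(records):
    """Serialize the records as CSV; list fields are joined with ';'."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "files", "os_supported"])
    for r in records:
        writer.writerow([r.name, ";".join(r.files), ";".join(r.os_supported)])
    return buf.getvalue()


# Illustrative sample values only, not authoritative data.
EXAMPLE = [
    ModuleRecord("os", ["Lib/os.py"], ["all"]),
    ModuleRecord("winreg", ["PC/winreg.c"], ["Windows"]),
]

if __name__ == "__main__":
    print(to_json(EXAMPLE))
    print(to_csv(EXAMPLE))
```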

So, basically, I need official support for this:

https://bitbucket.org/techtonik/python-stdlib/src/092af75da07cb264070115fb9a970e27b1e57f72/stdlib.json?at=default

I am asking because I don't have the means to maintain this
myself and am tired of trying to work out how it could be
maintained from outside.

If I have this mapping, I can make a diagram of how many
patches per module are sitting on the tracker, and
that may open a can of worms for many other fishy stats
that will be attractive for people to work on.

Actually, the code that sorts patches by module is
already in that repository. It is also unlicensed, to
keep it free from the restrictions that copyright law places
on distributed development, so neither you nor I need
to sign a CLA to develop it further.
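
For illustration only (this is not the code from that repository), the
per-module patch count could be sketched as follows. It assumes two inputs
that would have to come from elsewhere: a mapping from module name to its
source files, and a mapping from patch id to the files the patch touches.
The function name patches_per_module and the sample data are made up.

```python
from collections import Counter


def patches_per_module(module_files, patches):
    """Count how many patches touch each module.

    module_files: dict mapping module name -> list of source file paths
    patches: dict mapping patch id -> list of file paths the patch modifies
    """
    # Invert the mapping so each file path points back to its module.
    file_to_module = {}
    for module, files in module_files.items():
        for path in files:
            file_to_module[path] = module

    counts = Counter()
    for touched in patches.values():
        # A patch counts once per module, however many of its files it touches.
        modules = {file_to_module[p] for p in touched if p in file_to_module}
        counts.update(modules)
    return counts


if __name__ == "__main__":
    # Made-up sample data.
    module_files = {
        "os": ["Lib/os.py"],
        "json": ["Lib/json/__init__.py", "Lib/json/decoder.py"],
    }
    patches = {
        "patch-a": ["Lib/os.py"],
        "patch-b": ["Lib/json/__init__.py", "Lib/json/decoder.py"],
        "patch-c": ["Lib/os.py", "README"],
    }
    for module, n in patches_per_module(module_files, patches).most_common():
        print(module, n)
```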


So, where is the first-class info about the module structure of the stdlib?

Where should this info be fetched from when accessed automatically over the web?

How should it be kept up to date for all Python versions?
-- 
anatoly t.

