
Hi everyone,
I'm new to this list but I've been reading some threads in the archive.
Around February, someone brought up the idea of indexing the modules in PyPI packages. I've been working on something similar for quite a while.
PyPIContents is an index of PyPI packages that lists their modules and command-line scripts in JSON format, like this:
{ ...
"1337": { "cmdline": [], "modules": [ "1337", "1337.1337" ], "version": "1.0.0" },
...
}
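To make the format concrete, here is a minimal sketch of reading one entry. It assumes the downloaded file is a single JSON object keyed by PyPI project name, as the `"1337": {...}` entry suggests. `describe` is a hypothetical helper, and the sample is the entry above rather than the real 60 MB file.

```python
import json

# The entry shown above; assumes the full file is one JSON object that maps
# PyPI project names to entries of this shape.
SAMPLE = '''
{
    "1337": {"cmdline": [], "modules": ["1337", "1337.1337"], "version": "1.0.0"}
}
'''


def describe(index, project):
    """Return a one-line summary of a project's entry in the index."""
    entry = index[project]
    modules = ", ".join(entry["modules"]) or "(none)"
    scripts = ", ".join(entry["cmdline"]) or "(none)"
    return f"{project} {entry['version']}: modules={modules}; scripts={scripts}"


index = json.loads(SAMPLE)
print(describe(index, "1337"))
# prints: 1337 1.0.0: modules=1337, 1337.1337; scripts=(none)
```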
You can check it out here:
https://github.com/LuisAlejandro/pypicontents
And some use cases:
https://github.com/LuisAlejandro/pypicontents#use-cases
The actual index lives here; it's around 60 MB:
https://raw.githubusercontent.com/LuisAlejandro/pypicontents/contents/pypi.j...
It is updated daily with the help of Travis:
https://github.com/LuisAlejandro/pypicontents/blob/contents/.travis.yml
Anyway, I hope it's useful, and I'll be around for any comments or questions.
Cheers!
Luis Alejandro Martínez Faneyth
Blog: http://huntingbears.com.ve
Github: http://github.com/LuisAlejandro
Twitter: http://twitter.com/LuisAlejandro
CODE IS POETRY

Hi Luis,
Awesome, thanks for this :-). It was me posting before about indexing PyPI.

I'm intrigued: how do you keep it up to date using Travis? When I looked into this, I was pretty sure you need to download every package to index it. Do you have some way to only download the new releases? Or is Travis able to download every package every day? Or have you found another way round it?

Does the index only include the latest version of each package, or does it also include older versions? The wifi on the train I'm on at the moment isn't fast enough to download 60 MB to find out. ;-)

Does your indexing tool prefer to use wheels or sdists? Is it capable of using either for packages which don't have both available? Do you do anything to cope with modules which may be included for one platform but not another?

I'm excited to see someone actually doing this!
Thomas
On Sat, May 20, 2017, at 03:01 AM, Luis Alejandro Martínez Faneyth wrote:
Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig

Hi Thomas,
2017-05-20 13:23 GMT-04:00 Thomas Kluyver thomas@kluyver.me.uk:
> Hi Luis,
> Awesome, thanks for this :-). It was me posting before about indexing PyPI.
> I'm intrigued: how do you keep it up to date using Travis? When I looked into this, I was pretty sure you need to download every package to index it. Do you have some way to only download the new releases? Or is Travis able to download every package every day? Or have you found another way round it?
I divided the index processing alphabetically, so that each letter is processed in a separate Travis job. I also set memory and time limits to avoid abusing Travis. On the first run, each job has to download every package until it reaches the job's maximum time limit of 40 minutes. On later runs, the script only processes packages that have been updated since the previous run.
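The scheme described above (one job per initial letter, a per-job time budget, and reprocessing only packages updated since the last run) might look roughly like the sketch below. This is not PyPIContents' actual code: `shard_for`, `packages_to_process` and `run_job` are hypothetical names, the `updated` name-to-timestamp mapping is an assumption about how updates are tracked, and putting non-letter names into a `"0"` shard is likewise an assumption.

```python
import string
import time


def shard_for(name):
    """Map a project name to its shard: the lowercase first letter, or '0'
    (an assumed catch-all) for names not starting with an ASCII letter."""
    first = name[:1].lower()
    return first if first in string.ascii_lowercase else "0"


def packages_to_process(updated, last_run, shard):
    """Names in `shard` whose last update is newer than `last_run`.

    `updated` maps project name -> last-update timestamp (seconds since epoch).
    Passing last_run=0 on the very first run makes every package qualify.
    """
    return sorted(
        name for name, stamp in updated.items()
        if shard_for(name) == shard and stamp > last_run
    )


def run_job(names, process, budget_seconds=40 * 60):
    """Process names until the job's time budget (40 minutes by default) is used up.

    Returns the names that were not reached, so a later run can pick them up.
    """
    deadline = time.monotonic() + budget_seconds
    for i, name in enumerate(names):
        if time.monotonic() >= deadline:
            return names[i:]
        process(name)
    return []


updated = {"requests": 200, "Django": 50, "rich": 300, "1337": 10}
print(packages_to_process(updated, last_run=100, shard="r"))  # ['requests', 'rich']
```

Returning the unreached names matters for the first run: a package that was cut off by the time limit but not updated afterwards would otherwise never be indexed.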
> Does the index only include the latest version of each package, or does it also include older versions? The wifi on the train I'm on at the moment isn't fast enough to download 60 MB to find out. ;-)
It only includes the latest version of each package.
> Does your indexing tool prefer to use wheels or sdists? Is it capable of using either for packages which don't have both available? Do you do anything to cope with modules which may be included for one platform but not another?
It supports the ['.whl', '.egg', '.zip', '.tgz', '.tar.gz', '.tar.bz2'] formats, and it extracts the data from any of them that is available.
I wasn't aware that some modules may be present on one platform but not on another. I guess there's room for improvement.
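Handling the listed formats comes down to two standard-library readers: `.whl`, `.egg` and `.zip` files are zip archives, while `.tgz`, `.tar.gz` and `.tar.bz2` are tarballs. The sketch below only lists an archive's members by extension; `archive_members` is a hypothetical helper, and it says nothing about which archive PyPIContents picks when several are available.

```python
import io
import tarfile
import zipfile

# Grouped by the library that can read them; wheels and eggs are zip files.
ZIP_EXTS = (".whl", ".egg", ".zip")
TAR_EXTS = (".tgz", ".tar.gz", ".tar.bz2")


def archive_members(filename, data):
    """Return the member names of a package archive, given its raw bytes."""
    lower = filename.lower()
    if lower.endswith(ZIP_EXTS):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.namelist()
    if lower.endswith(TAR_EXTS):
        # mode "r:*" lets tarfile detect gzip or bzip2 compression itself
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
            return tf.getnames()
    raise ValueError(f"unsupported archive format: {filename}")
```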
> I'm excited to see someone actually doing this!
Thank you. I made this because I wanted an app that guesses a project's Python dependencies from its code by scanning its module imports and then looking them up in the index. That app is called Pip Sala Bim, and you can check it out here:
https://github.com/LuisAlejandro/pipsalabim
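The first half of that approach, finding which modules a piece of code imports, can be sketched with the standard `ast` module. This is only an illustration, not Pip Sala Bim's implementation. `imported_top_level_modules` is a hypothetical name, and the resulting names would still need filtering against the standard library before being looked up in the index.

```python
import ast


def imported_top_level_modules(source):
    """Return the sorted top-level module names imported by `source`.

    Relative imports (e.g. `from . import utils`) are skipped, because they
    refer to the project itself rather than to another distribution.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
    return sorted(found)


code = "import os\nimport requests.adapters\nfrom yaml import safe_load\nfrom . import utils\n"
print(imported_top_level_modules(code))  # ['os', 'requests', 'yaml']
```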

On Sat, May 20, 2017, at 07:29 PM, Luis Alejandro Martínez Faneyth wrote:
> It supports ['.whl', '.egg', '.zip', '.tgz', '.tar.gz', '.tar.bz2'] formats, and it extracts the data using any available.
Nice! If several of those formats are present, does it get the data from just one of them? Or does it get data from all of them and combine it somehow?
> I wasn't aware of the fact that some modules may be on one platform and not in another. I guess there's room for improvement.
It probably doesn't matter in most cases, but since setup.py runs arbitrary code, it can install different modules in different situations, or even select modules at random if you really want to confuse tools like this. ;-) This is why my own indexing efforts focused on wheels: you can be sure of exactly what a wheel contains. My wheel-indexing tool, 'wheeldex', is here, in case there's any code or ideas you can use: https://github.com/takluyver/wheeldex
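Here is a sketch of why wheels are easier to index: a wheel's file layout mirrors what gets installed into site-packages, so its modules can be read from the zip directory without running any code. This is an illustration under that assumption, not wheeldex's actual code. `wheel_modules` is a hypothetical name, and compiled extensions (`.so`/`.pyd`) and namespace packages are deliberately ignored.

```python
import io
import zipfile


def wheel_modules(wheel_bytes):
    """List importable module names in a wheel, using only its file list.

    Metadata (*.dist-info) and *.data directories are skipped, and only
    pure-Python .py files are considered in this sketch.
    """
    modules = set()
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        for name in zf.namelist():
            top = name.split("/")[0]
            if top.endswith((".dist-info", ".data")) or not name.endswith(".py"):
                continue
            parts = name[:-3].split("/")
            if parts[-1] == "__init__":
                parts = parts[:-1]  # pkg/__init__.py is the package itself
            if parts:
                modules.add(".".join(parts))
    return sorted(modules)
```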
> Thank you. I made this because I wanted to have an app that guessed python dependencies from code by scanning module imports and then looking up the Index. That app is called Pip Sala Bim and you can check it out here:
> https://github.com/LuisAlejandro/pipsalabim
Neat, that's precisely one of the use cases I was thinking of for an index. The other thing I'm interested in is providing an interface to install modules by their import name rather than their PyPI name; I think your index should work for that as well. I'll dig into the code of both PyPIContents and Pip Sala Bim soon.

Thanks,
Thomas
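Installing by import name needs the index turned inside out: from import name to the PyPI projects that provide it. A minimal sketch, assuming the `{project: {"modules": [...], ...}}` shape shown at the top of the thread. `invert_index` is a hypothetical helper and the project names in the example are made up. Since several projects can ship the same top-level name, each lookup yields a list of candidates rather than a single answer.

```python
from collections import defaultdict


def invert_index(index):
    """Map each top-level import name to the sorted PyPI projects providing it."""
    providers = defaultdict(set)
    for project, entry in index.items():
        for module in entry.get("modules", []):
            providers[module.split(".")[0]].add(project)
    return {name: sorted(projects) for name, projects in providers.items()}


# Made-up entries, for illustration only.
example = {
    "foo-tools": {"cmdline": ["foo"], "modules": ["foo", "foo.cli"], "version": "1.0"},
    "foo-fork": {"cmdline": [], "modules": ["foo"], "version": "0.3"},
    "barlib": {"cmdline": [], "modules": ["bar"], "version": "2.1"},
}
print(invert_index(example)["foo"])  # ['foo-fork', 'foo-tools']
```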

This looks very nice. The readme states that monkeypatching is used to extract this info from the `setup()` call. Is `setup.cfg` also taken into account?
What would it take to split off this part into a separate module/package, so that we have one function that takes a source directory and returns the arguments passed to that `setup()` call? I would be very interested in extracting not just the available modules but also the dependencies.
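One way to get at what is asked here is to run a setup.py against a stub `setup()` that records its keyword arguments, which would expose `install_requires` as well as `packages`/`py_modules`. The sketch below is a hypothetical helper (`capture_setup_kwargs`), not the code PyPIContents uses. It takes the setup.py source rather than a directory, does not read `setup.cfg`, and still executes arbitrary code, so it shares the caveats raised earlier in the thread.

```python
import sys
import types


def capture_setup_kwargs(setup_py_source):
    """Execute setup.py source with a recording stub setup(); return its kwargs.

    Stub `setuptools`, `distutils` and `distutils.core` modules are placed in
    sys.modules while the code runs, so `from setuptools import setup` (or
    the distutils equivalent) picks up the stub. The previous entries are
    restored afterwards.
    """
    captured = {}

    def fake_setup(**kwargs):
        captured.update(kwargs)

    stubs = {name: types.ModuleType(name)
             for name in ("setuptools", "distutils", "distutils.core")}
    stubs["setuptools"].setup = fake_setup
    stubs["distutils.core"].setup = fake_setup
    stubs["distutils"].core = stubs["distutils.core"]

    saved = {name: sys.modules.get(name) for name in stubs}
    sys.modules.update(stubs)
    try:
        code = compile(setup_py_source, "setup.py", "exec")
        exec(code, {"__name__": "__main__"})
    finally:
        for name, module in saved.items():
            if module is None:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module
    return captured
```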
On Sat, May 20, 2017 at 4:01 AM, Luis Alejandro Martínez Faneyth <luis@huntingbears.com.ve> wrote:
participants (3)
- Freddy Rietdijk
- Luis Alejandro Martínez Faneyth
- Thomas Kluyver