I had some fun parsing and plotting the data (very simple, just the top packages for now). See here:
https://github.com/lkraider/requirements-dataset/blob/master/index.ipynb

Let me know if you would accept a pull request so others can use that as a starting point.

Regards,
--
Paul Eipper

On Wed, Mar 8, 2017 at 1:36 PM, Nick Timkovich <prometheus235@gmail.com> wrote:
Looks like a fun chunk of data. What query did you use? Could you add a README to the repo with a short description, in case others want to iterate on it (maybe look into setup.py files?)
Nick
On Tue, Mar 7, 2017 at 5:06 AM, Jannis Gebauer <ja.geb@me.com> wrote:
Hi,
I ran a couple of queries against GitHub’s public BigQuery dataset [0] last week. I’m interested in requirements files in particular, so I ran a query extracting all available requirements files.
Since queries against this dataset are rather expensive ($7 on all repos), I thought I’d share the raw data here [1]. The data contains the repo name, the requirements file path and the contents of the file. Each line is a JSON blob; read it with:
    import json

    with open('data.json') as f:
        for line in f:
            # each line holds one JSON blob describing a requirements file
            data = json.loads(line)
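As a possible starting point, here is a minimal sketch that counts how many requirements files mention each package. It assumes the file contents are stored under a key named "content"; that key name is a guess, so check the actual blobs and adjust CONTENT_KEY. The helper names (package_names, top_packages) are made up for this example, and the line parsing is deliberately rough: it skips pip options and URL-style requirements rather than handling every requirements-file syntax.

```python
import json
import re
from collections import Counter

# Assumption: each JSON blob stores the file text under this key.
# The real key name in data.json may differ.
CONTENT_KEY = "content"

# Leading package name of a requirement line, e.g. "Django" in "Django>=1.10".
NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def package_names(requirements_text):
    """Yield lowercased package names from the text of one requirements file."""
    for raw in requirements_text.splitlines():
        if "://" in raw:
            continue  # skip URL/VCS requirements, which this rough parser ignores
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        match = NAME_RE.match(line)
        if match:
            yield match.group(0).lower()


def top_packages(lines, n=10):
    """Return the n packages that appear in the most files.

    `lines` is any iterable of JSON strings, one blob per line.
    """
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        blob = json.loads(line)
        # Use a set so each package counts once per file.
        counts.update(set(package_names(blob.get(CONTENT_KEY) or "")))
    return counts.most_common(n)
```

Since an open file is an iterable of lines, it can be passed in directly:

    with open('data.json') as f:
        print(top_packages(f))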
Maybe that’s of interest to some of you.
If you have any ideas on what to do with the data, please let me know.
—
Jannis Gebauer
[0]: https://cloud.google.com/bigquery/public-data/github
[1]: https://github.com/jayfk/requirements-dataset
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig