Hi,
I ran a couple of queries against GitHub's public BigQuery dataset [0] last week. I'm interested in requirements files in particular, so I ran a query extracting all available requirements files.
Since queries against this dataset are rather expensive ($7 to scan all repos), I thought I'd share the raw data here [1]. The data contains the repo name, the path to the requirements file, and the contents of the file. Each line is a separate JSON blob; read it with:
import json

with open('data.json') as f:
    for line in f:
        data = json.loads(line)
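For example, here is a minimal sketch of a reader that streams the file line by line; the field names (`repo_name`, `path`, `content`) are assumptions about the JSON keys and may need adjusting to whatever the actual export uses:

```python
import json

def iter_requirements(path='data.json'):
    """Yield one dict per line of a JSON-lines file.

    Key names like 'repo_name', 'path', and 'content' are hypothetical;
    check a sample line from the actual data and adjust accordingly.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)

# Example usage: count records and peek at one field.
# for record in iter_requirements():
#     print(record.get('repo_name'))
```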
Maybe that's of interest to some of you.
If you have any ideas on what to do with the data, please let me know.
—
Jannis Gebauer