[Distutils] Data on requirement files on GitHub
Wes Turner
wes.turner at gmail.com
Thu Mar 9 22:57:14 EST 2017
https://en.wikipedia.org/wiki/BigQuery
BigQuery Dashboards
- http://bigqueri.es/c/github-archive
- https://redash.io/data-sources/google-bigquery
- https://github.com/getredash/redash
- https://github.com/getredash/redash/blob/master/requirements.txt
- https://github.com/getredash/redash/blob/master/Dockerfile
- https://github.com/docker/docker/blob/master/builder/dockerfile/parser/parser.go
- https://github.com/DBuildService/dockerfile-parse/issues
- https://github.com/getredash/redash/blob/master/docker-compose.yml
Software Configuration Management / Dependency Management applications for
BigQuery:
- https://opensource.googleblog.com/2017/03/operation-rosehub.html
- "Googlers used BigQuery and GitHub to patch thousands of vulnerable projects"
https://www.reddit.com/r/bigquery/comments/5x0x5z/googlers_used_bigquery_and_github_to_patch/
BigQuery Python Libraries
google-cloud-bigquery
- | Src: https://github.com/GoogleCloudPlatform/google-cloud-python
- | Pypi: https://pypi.python.org/pypi/google-cloud-bigquery
- | Docs: https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-resources-python
google-api-python-client
- | Src: https://github.com/google/google-api-python-client
- | Pypi: https://pypi.python.org/pypi/google-api-python-client
- pandas.io.gbq uses google-api-python-client:
- Docs: http://pandas.pydata.org/pandas-docs/stable/io.html#google-bigquery-experimental
- read_gbq() http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.gbq.read_gbq.html#pandas.io.gbq.read_gbq
- to_gbq() http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.gbq.to_gbq.html#pandas-io-gbq-to-gbq
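For anyone who wants to re-run this kind of extraction from pandas, here is a rough sketch built on read_gbq(). The table and column names (github_repos.files, github_repos.contents, repo_name, path, content) are my assumption about the public GitHub dataset's schema, and build_requirements_query / fetch_requirements are hypothetical helper names, not part of any library. It also assumes a pandas version that still ships read_gbq() with the `dialect` argument, plus a billing-enabled project ID.

```python
# Assumed schema of the public GitHub dataset; verify before running.
QUERY_TEMPLATE = (
    "SELECT f.repo_name, f.path, c.content\n"
    "FROM `bigquery-public-data.github_repos.files` AS f\n"
    "JOIN `bigquery-public-data.github_repos.contents` AS c\n"
    "  ON f.id = c.id\n"
    "WHERE f.path LIKE '%requirements.txt'\n"
    "LIMIT {limit:d}"
)


def build_requirements_query(limit=100):
    """Return a standard-SQL query selecting requirements.txt files."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return QUERY_TEMPLATE.format(limit=limit)


def fetch_requirements(project_id, limit=100):
    """Run the query through pandas and return a DataFrame.

    Needs Google credentials and a project to bill the query to.
    """
    import pandas as pd  # imported lazily so the query builder has no dependencies

    return pd.read_gbq(
        build_requirements_query(limit),
        project_id=project_id,
        dialect="standard",
    )
```

Note that LIMIT only caps the rows returned; BigQuery still bills for the bytes scanned, so this does not make the query cheaper.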
Open Source Big Data Components for things like BigQuery:
Apache Drill
- | Wikipedia: https://en.wikipedia.org/wiki/Apache_Drill
- Apache Drill is similar to Google Dremel (which powers Google BigQuery)
- https://pypi.python.org/pypi/drillpy
Apache Beam
- | Wikipedia: https://en.wikipedia.org/wiki/Apache_Beam
- | Src: https://github.com/apache/beam
- | Docs: https://beam.apache.org/documentation/sdks/python/
- | Docs: https://beam.apache.org/get-started/quickstart-py/
- | Docs: https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples
- The Google Cloud Dataflow SDK is now part of Apache Beam
- https://cloud.google.com/dataflow/model/bigquery-io
Parsing (and MAINTAINING) Pip Requirements.txt Files:
- | Src: https://github.com/pypa/pip/tree/master/pip/req
- https://github.com/pypa/pip/issues/3884#issuecomment-236454008
- https://github.com/pypa/pip/issues/1479
- -> Pipfile, Pipfile.lock (``pipenv install pkgname --dev``)
- https://github.com/pyupio/safety-db#tools
- https://pyup.io/
- https://libraries.io/github/librariesio/pydeps
- https://github.com/librariesio/pydeps
- https://libraries.io/
- Pipfile, Pipfile.lock
- | PyPI: https://pypi.python.org/pypi/pipenv
- | PyPI: https://pypi.python.org/pypi/requirements-parser
- | PyPI: https://pypi.python.org/pypi/pipfile
- | Src: https://github.com/kennethreitz/pipenv
- These save to the Pipfile:
- ``pipenv install pkgname``
- ``pipenv install pkgname --dev``
- https://github.com/kennethreitz/pipenv/blob/master/pipenv/utils.py
- pip reqs.txt <--> Pipfile
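To make the "pip reqs.txt <--> Pipfile" direction concrete, here is a minimal sketch of the requirements.txt -> Pipfile half using only the standard library. It is not how pip or pipenv do it: parse_requirements and to_pipfile are hypothetical names, the regex only understands plain "name" and "name<op>version" lines, and it deliberately skips options (-r, -e, --index-url), URLs, extras, and environment markers.

```python
import re

# Simplified: a project name, then an optional specifier starting with an operator.
_REQ = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*([<>=!~][^#;]*)?")


def parse_requirements(text):
    """Yield (name, specifier) pairs from requirements.txt content."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-") or "://" in line:
            continue  # skip blanks, pip options, and URL requirements
        match = _REQ.match(line)
        if match:
            spec = (match.group(2) or "").strip()
            yield match.group(1), spec or "*"  # "*" means any version


def _section(title, text):
    lines = ["[{}]".format(title)]
    for name, spec in parse_requirements(text):
        # TOML bare keys cannot contain dots, so quote names like zope.interface.
        key = '"{}"'.format(name) if "." in name else name
        lines.append('{} = "{}"'.format(key, spec))
    return "\n".join(lines)


def to_pipfile(requirements, dev_requirements=""):
    """Render [packages] and [dev-packages] sections as Pipfile-style TOML."""
    return (
        _section("packages", requirements)
        + "\n\n"
        + _section("dev-packages", dev_requirements)
        + "\n"
    )
```

For example, to_pipfile("requests>=2.0,<3\nflask\n", "pytest==3.0.6\n") puts requests and flask under [packages] and pytest under [dev-packages]. For anything beyond a quick look at the data, a real parser such as requirements-parser is the safer choice.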
... Thought I'd get these together; hopefully they're useful.
Cool Jupyter notebook!
( https://github.com/lkraider/requirements-dataset/blob/master/index.ipynb )
On Tue, Mar 7, 2017 at 5:06 AM, Jannis Gebauer <ja.geb at me.com> wrote:
> Hi,
>
> I ran a couple of queries against GitHub's public BigQuery dataset [0]
> last week. I’m interested in requirement files in particular, so I ran a
> query extracting all available requirement files.
>
> Since queries against this dataset are rather expensive ($7 on all repos),
> I thought I’d share the raw data here [1]. The data contains the repo name,
> the requirements file path and the contents of the file. Every line
> represents a JSON blob, read it with:
>
> import json
>
> with open('data.json') as f:
>     for line in f:  # one JSON object per line
>         data = json.loads(line)
>
> Maybe that’s of interest to some of you.
>
> If you have any ideas on what to do with the data, please let me know.
>
> —
>
> Jannis Gebauer
>
>
>
> [0]: https://cloud.google.com/bigquery/public-data/github
> [1]: https://github.com/jayfk/requirements-dataset
>
> _______________________________________________
> Distutils-SIG maillist - Distutils-SIG at python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>
>
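One idea for the data: count how many requirement files mention each package. The sketch below reads the one-JSON-object-per-line format described in the quoted message. The key name "content" is a guess on my part (the message only says each record holds the repo name, the file path, and the file contents), and count_packages is a hypothetical helper. Name extraction is deliberately crude: it cuts each line at the first version operator, extras bracket, marker, or whitespace.

```python
import json
import re
from collections import Counter


def count_packages(lines, content_key="content"):
    """Count, per package name, how many requirement files list it.

    `lines` is any iterable of JSON strings, e.g. an open file.
    `content_key` is an assumed field name; adjust it to the real data.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        seen = set()  # count each package once per file
        for req in (record.get(content_key) or "").splitlines():
            req = req.split("#", 1)[0].strip()  # drop comments
            if not req or req.startswith("-") or "://" in req:
                continue  # skip blanks, pip options, and URL requirements
            name = re.split(r"[<>=!~;\[\s]", req, maxsplit=1)[0].lower()
            if name:
                seen.add(name)
        counts.update(seen)
    return counts
```

Usage would look like `with open('data.json') as f: counts = count_packages(f)`, followed by `counts.most_common(20)` for the most frequently listed packages.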