[Distutils] Data on requirement files on GitHub

Wes Turner wes.turner at gmail.com
Thu Mar 9 22:57:14 EST 2017


https://en.wikipedia.org/wiki/BigQuery

BigQuery Dashboards

- http://bigqueri.es/c/github-archive
- https://redash.io/data-sources/google-bigquery
  - https://github.com/getredash/redash
  - https://github.com/getredash/redash/blob/master/requirements.txt
  - https://github.com/getredash/redash/blob/master/Dockerfile
    - https://github.com/docker/docker/blob/master/builder/dockerfile/parser/parser.go
      - https://github.com/DBuildService/dockerfile-parse/issues
  - https://github.com/getredash/redash/blob/master/docker-compose.yml

Software Configuration Management / Dependency Management applications for
BigQuery:


- https://opensource.googleblog.com/2017/03/operation-rosehub.html
  - "Googlers used BigQuery and GitHub to patch thousands of vulnerable projects"
  - https://www.reddit.com/r/bigquery/comments/5x0x5z/googlers_used_bigquery_and_github_to_patch/


BigQuery Python Libraries

google-cloud-bigquery

- | Src: https://github.com/GoogleCloudPlatform/google-cloud-python
- | Pypi: https://pypi.python.org/pypi/google-cloud-bigquery
- | Docs: https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-resources-python

google-api-python-client

- | Src: https://github.com/google/google-api-python-client
- | Pypi: https://pypi.python.org/pypi/google-api-python-client
- pandas.io.gbq uses google-api-python-client:
  - | Docs: http://pandas.pydata.org/pandas-docs/stable/io.html#google-bigquery-experimental
  - read_gbq()
    - http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.gbq.read_gbq.html#pandas.io.gbq.read_gbq
  - to_gbq()
    - http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.gbq.to_gbq.html#pandas-io-gbq-to-gbq
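As a hedged sketch of the kind of query Jannis describes below: this only builds the (legacy-style) SQL string against the public GitHub dataset, since actually running it needs BigQuery credentials and billing. The function name and the commented read_gbq call are illustrative, not from the original post.

```python
# Sketch: build a query for requirements.txt files in the public GitHub
# dataset. Only string construction is shown; running it requires
# credentials and a billed project.

def github_requirements_query(filename="requirements.txt"):
    """Return a legacy-SQL BigQuery query string for matching file paths."""
    return (
        "SELECT repo_name, path "
        "FROM [bigquery-public-data:github_repos.files] "
        "WHERE path CONTAINS '{}'".format(filename)
    )

query = github_requirements_query()
print(query)

# To actually run it via pandas (project id is a placeholder):
#   import pandas as pd
#   df = pd.io.gbq.read_gbq(query, project_id='your-project-id')
```
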


Open source big data components similar to (or that integrate with) BigQuery:

Apache Drill

- | Wikipedia: https://en.wikipedia.org/wiki/Apache_Drill
- Apache Drill is similar to Google Dremel (which powers Google BigQuery)
- https://pypi.python.org/pypi/drillpy

Apache Beam

- | Wikipedia: https://en.wikipedia.org/wiki/Apache_Beam
- | Src: https://github.com/apache/beam
- | Docs: https://beam.apache.org/documentation/sdks/python/
- | Docs: https://beam.apache.org/get-started/quickstart-py/
- | Docs: https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples
- The Google Cloud Dataflow SDK is now part of Apache Beam (Dataflow runs Beam pipelines)
- https://cloud.google.com/dataflow/model/bigquery-io


Parsing (and MAINTAINING) Pip Requirements.txt Files:

- | Src: https://github.com/pypa/pip/tree/master/pip/req
  - https://github.com/pypa/pip/issues/3884#issuecomment-236454008
    - https://github.com/pypa/pip/issues/1479
      - -> Pipfile, Pipfile.lock (``pipenv install pkgname --dev``)

- https://github.com/pyupio/safety-db#tools
  - https://pyup.io/
- https://libraries.io/github/librariesio/pydeps
  - https://github.com/librariesio/pydeps
  - https://libraries.io/

- Pipfile, Pipfile.lock
  - | PyPI: https://pypi.python.org/pypi/pipenv
  - | PyPI: https://pypi.python.org/pypi/requirements-parser
  - | PyPI: https://pypi.python.org/pypi/pipfile
  - | Src: https://github.com/kennethreitz/pipenv
  - These save to the Pipfile:
    - ``pipenv install pkgname``
    - ``pipenv install pkgname --dev``
  - https://github.com/kennethreitz/pipenv/blob/master/pipenv/utils.py
    - pip reqs.txt <--> Pipfile
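For context, a deliberately simplified sketch of what a requirements-line parser has to handle. This regex ignores extras, environment markers, VCS URLs, and ``-r`` includes; for real work use pip's own ``pip/req`` code or requirements-parser linked above. All names here are illustrative.

```python
import re

# Simplified sketch: split "name==1.2" style lines into (name, specifier).
# Real requirements.txt syntax is far richer (extras, markers, -r includes,
# editable installs); this only covers the plain name-plus-specifier case.
REQ_RE = re.compile(r'^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*([<>=!~]=?.*)?$')

def parse_requirement(line):
    line = line.split('#', 1)[0].strip()  # drop trailing comments
    if not line or line.startswith('-'):  # skip options like -r / -e
        return None
    m = REQ_RE.match(line)
    if not m:
        return None
    name, spec = m.groups()
    return name, (spec or '').strip()

for raw in ["requests==2.13.0", "Django>=1.10", "# comment", "-r base.txt"]:
    print(parse_requirement(raw))
```
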

... Thought I'd get these together; hopefully they're useful.

Cool Jupyter notebook!
( https://github.com/lkraider/requirements-dataset/blob/master/index.ipynb )
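Jannis's line-delimited JSON format (quoted below) reads naturally one record at a time. A self-contained sketch with made-up sample records; the real dataset's field names may differ:

```python
import json
import tempfile

# Made-up sample records mimicking the dataset's shape (repo name, file
# path, file contents); these are illustrative, not from the real dump.
records = [
    {"repo_name": "getredash/redash", "path": "requirements.txt",
     "content": "Flask==0.12\n"},
    {"repo_name": "example/project", "path": "requirements/dev.txt",
     "content": "pytest\n"},
]

# Write one JSON blob per line, then read it back as the quoted message
# suggests; iterating the file directly avoids loading it all at once.
with tempfile.NamedTemporaryFile(mode='w', suffix='.json',
                                 delete=False) as f:
    for rec in records:
        f.write(json.dumps(rec) + '\n')
    path = f.name

parsed = []
with open(path) as f:
    for line in f:
        parsed.append(json.loads(line))

print(len(parsed))  # 2
```
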

On Tue, Mar 7, 2017 at 5:06 AM, Jannis Gebauer <ja.geb at me.com> wrote:

> Hi,
>
> I ran a couple of queries against GitHubs public big query dataset [0]
> last week. I’m interested in requirement files in particular, so I ran a
> query extracting all available requirement files.
>
> Since queries against this dataset are rather expensive ($7 on all repos),
> I thought I’d share the raw data here [1]. The data contains the repo name,
> the requirements file path and the contents of the file. Every line
> represents a JSON blob, read it with:
>
> with open('data.json') as f:
>     for line in f.readlines():
>         data = json.loads(line)
>
> Maybe that’s of interest to some of you.
>
> If you have any ideas on what to do with the data, please let me know.
>
>
> Jannis Gebauer
>
>
>
> [0]: https://cloud.google.com/bigquery/public-data/github
> [1]: https://github.com/jayfk/requirements-dataset
>
> _______________________________________________
> Distutils-SIG maillist  -  Distutils-SIG at python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>
>

