How do I determine how much of the computation is pushed down to the data, instead of pulling all the data and running the computation on a single local node? ... https://en.wikipedia.org/wiki/Bulk_synchronous_parallel (MapReduce, ...)
- http://pandas.pydata.org/pandas-docs/stable/io.html#sql-queries
- http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_query....
- https://github.com/yhat/pandasql/
- http://docs.ibis-project.org/sql.html#common-column-expressions
- https://github.com/cloudera/ibis/blob/master/ibis/sql/alchemy.py
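A minimal sketch of the distinction, assuming an in-memory SQLite database stands in for the remote data store (the `sales` table and its columns are hypothetical): the same per-region total is computed once by pulling every row and aggregating locally, and once by pushing the `GROUP BY` into the database. Counting the rows that cross the database/client boundary is one rough way to see how much work was pushed down.

```python
import sqlite3
from collections import defaultdict


def make_demo_db():
    """Build a tiny in-memory table (hypothetical schema) to stand in for remote data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("south", 5.0), ("north", 2.5), ("east", 7.0), ("south", 1.0)],
    )
    return conn


def pull_then_aggregate(conn):
    """Pull every row to the client and aggregate locally."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals), len(rows)  # (result, rows transferred)


def push_down_aggregate(conn):
    """Push the aggregation into the database; only one row per group comes back."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
    return dict(rows), len(rows)  # (result, rows transferred)


if __name__ == "__main__":
    conn = make_demo_db()
    for fn in (pull_then_aggregate, push_down_aggregate):
        totals, transferred = fn(conn)
        print(f"{fn.__name__}: {transferred} rows transferred -> {totals}")
```

With `pandas.read_sql_query(sql, conn)` the split is explicit in the same way: whatever is in the SQL string runs in the database, and anything done afterwards to the returned DataFrame runs locally.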
On Tuesday, November 1, 2016, Pavel Velikhov <email@example.com> wrote:
We have released PythonQL, a query language extension to Python: we have extended Python’s comprehensions with a full-fledged query language that draws on the most useful features of SQL, XQuery and JSONiq. Take a look at the project at http://www.pythonql.org and let us know what you think!
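For context, here is the kind of query that extended comprehensions aim to express, written in plain Python 3 with only the standard library. This is not PythonQL syntax (see the project site for that); the `employees` data and field names are invented for illustration. It shows how a SQL-style filter, group-by and aggregate has to be spelled with ordinary comprehensions today.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# Hypothetical sample data.
employees = [
    {"name": "Ann", "dept": "eng", "salary": 120},
    {"name": "Bob", "dept": "eng", "salary": 100},
    {"name": "Cid", "dept": "ops", "salary": 90},
    {"name": "Dee", "dept": "ops", "salary": 70},
    {"name": "Eve", "dept": "sales", "salary": 60},
]

# Roughly: SELECT dept, AVG(salary), COUNT(*) FROM employees
#          WHERE salary >= 70 GROUP BY dept ORDER BY dept
# groupby only merges adjacent items, so filter and sort by the key first.
filtered = sorted((e for e in employees if e["salary"] >= 70), key=itemgetter("dept"))

result = [
    {"dept": dept, "avg_salary": mean(e["salary"] for e in members), "n": len(members)}
    for dept, grp in groupby(filtered, key=itemgetter("dept"))
    # Materialize each group once so it can be both averaged and counted.
    for members in [list(grp)]
]

if __name__ == "__main__":
    for row in result:
        print(row)
```

The sort-then-`groupby` step and the `for members in [list(grp)]` binding trick are exactly the kind of boilerplate that a dedicated group-by clause in a comprehension would remove.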
PythonQL currently works as follows: you mark PythonQL files with a special encoding, and the system runs a preprocessor on all such files. Support for an interactive interpreter and for Jupyter is planned.
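PythonQL's actual preprocessor is not shown here. As a hedged sketch of the general "special encoding" technique, the code below registers a custom codec whose decoder rewrites source text before Python compiles it. The codec name `demo_preproc` and the toy `unless COND:` statement (rewritten to `if not (COND):`) are hypothetical and have nothing to do with PythonQL's real syntax.

```python
import codecs
import re

# Hypothetical toy syntax: a line "unless COND:" becomes "if not (COND):".
_UNLESS = re.compile(r"^([ \t]*)unless[ \t]+(.+?):[ \t]*$", re.MULTILINE)


def preprocess(source: str) -> str:
    """Rewrite the toy extension syntax into plain Python."""
    return _UNLESS.sub(r"\1if not (\2):", source)


def _decode(data, errors="strict"):
    # Decode the raw bytes as UTF-8, then run the preprocessor on the text.
    text, consumed = codecs.utf_8_decode(bytes(data), errors, True)
    return preprocess(text), consumed


class _IncrementalDecoder(codecs.BufferedIncrementalDecoder):
    # Text streams decode incrementally; buffer everything until the
    # final chunk so the rewrite always sees complete lines.
    def _buffer_decode(self, data, errors, final):
        if not final:
            return "", 0
        return _decode(data, errors)


def _search(name):
    # Codec search function: only answers for our hypothetical codec name.
    if name != "demo_preproc":
        return None
    return codecs.CodecInfo(
        name="demo_preproc",
        encode=codecs.utf_8_encode,
        decode=_decode,
        incrementaldecoder=_IncrementalDecoder,
    )


codecs.register(_search)

if __name__ == "__main__":
    src = b"x = 3\nunless x > 5:\n    print('x is small')\n"
    code = src.decode("demo_preproc")
    print(code)
    exec(code)
```

In principle, once such a codec is registered early enough (for example from a `.pth` file), a source file declaring `# -*- coding: demo_preproc -*-` would be decoded through it on import; this is the general trick that encoding-based preprocessors rely on, though the details can vary between Python versions.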