Right now we don’t push anything down yet; we fetch everything into Python’s runtime. Going forward, though, the plan is to push as much computation as possible to the database (most of the time the database will do a better job than our engine).
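To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module (not PythonQL itself). The `orders` table, its data and both helper functions are hypothetical. The first function fetches every row and filters/aggregates in Python, which is the current "fetch everything" approach. The second pushes the filter and the aggregation into the database, so only the result rows reach Python.

```python
import sqlite3

# Hypothetical sample table, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 7.5), ("carol", 20.0)],
)

def totals_in_python(conn, min_amount):
    # Fetch every row, then filter and aggregate in the Python runtime.
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        if amount >= min_amount:
            totals[customer] = totals.get(customer, 0.0) + amount
    return totals

def totals_pushed_down(conn, min_amount):
    # Push the filter (WHERE) and the aggregation (GROUP BY/SUM) into the
    # database, so only one row per customer is returned to Python.
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "WHERE amount >= ? GROUP BY customer",
        (min_amount,),
    )
    return dict(rows)

print(totals_in_python(conn, 6.0))
print(totals_pushed_down(conn, 6.0))
```

Both functions compute the same totals; on a large table the second one moves far less data out of the database.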
If we run on top of PySpark/Hadoop, I think we should be able to translate 100% of PythonQL into these jobs.
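As an illustration of what such a translation targets, the sketch below decomposes a grouped query ("sum of value per key, where value > 1") into map, shuffle and reduce phases in plain Python. It does not use the PySpark API; the records and function names are hypothetical. In PySpark the same job would roughly be an RDD `filter` followed by `reduceByKey`.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical (key, value) records standing in for a distributed dataset.
records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

def map_phase(record):
    # Runs independently per record, so it could be spread across workers.
    key, value = record
    if value > 1:  # the query's filter is applied during the map step
        yield key, value

def shuffle(pairs):
    # Group mapped values by key (a cluster framework does this for you).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each group's values with an associative operator (here +).
    return {key: reduce(lambda x, y: x + y, values) for key, values in groups.items()}

mapped = (pair for record in records for pair in map_phase(record))
result = reduce_phase(shuffle(mapped))
print(result)
```

Because the filter runs in the map phase and the sum is associative, each phase can be parallelized, which is what makes this kind of query a good fit for Spark or Hadoop.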
On 1 Nov 2016, at 19:42, Wes Turner <firstname.lastname@example.org> wrote:
How do I determine how much computation is pushed to the data (instead of pulling all the data and running the computation on one local node)? ... https://en.wikipedia.org/wiki/Bulk_synchronous_parallel (MapReduce,)
- http://pandas.pydata.org/pandas-docs/stable/io.html#sql-queries
- http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_query.html
- https://github.com/yhat/pandasql/
- http://docs.ibis-project.org/sql.html#common-column-expressions
- https://github.com/cloudera/ibis/blob/master/ibis/sql/alchemy.py
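One practical way to see how much computation is pushed down is to log the SQL that actually reaches the database. Here is a minimal sketch with Python's built-in sqlite3 module, whose `Connection.set_trace_callback` is invoked with each statement SQLite executes. The `events` table and its data are hypothetical. Comparing the logged SQL with the Python-side code shows which filters and aggregates ran in the database. (SQLAlchemy-based tools offer something similar via `create_engine(..., echo=True)`.)

```python
import sqlite3

# Hypothetical sample table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT, n INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [("x", 1), ("y", 2), ("x", 3)])

# Record every SQL statement that SQLite actually executes from now on.
sent = []
conn.set_trace_callback(sent.append)

# The WHERE filter and the SUM aggregate both run inside the database.
total = conn.execute("SELECT SUM(n) FROM events WHERE kind = ?", ("x",)).fetchone()[0]

conn.set_trace_callback(None)  # stop tracing
for statement in sent:
    print(statement)
print(total)
```

If the log showed only `SELECT kind, n FROM events` while the filtering happened in Python, nothing would have been pushed down.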
On Tuesday, November 1, 2016, Pavel Velikhov <email@example.com> wrote:
Hi Folks,
We have released PythonQL, a query language extension to Python: we have extended Python’s comprehensions with a full-fledged query language, drawing on useful features of SQL, XQuery and JSONiq. Take a look at the project here: http://www.pythonql.org/ and let us know what you think!
Currently, PythonQL works as follows: you mark PythonQL files with a special encoding, and the system runs a preprocessor on all such files. Interactive interpreter and Jupyter support are planned.
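The general trick of using a source encoding as a preprocessing hook can be sketched with Python's `codecs` module. This is not PythonQL's code: the `toyql` codec name, the `preprocess` function and its `where`-to-`if` rewrite are hypothetical, and the regex is only a toy (a real preprocessor would parse the source). In a real setup the file would declare the codec in its coding line, the codec would have to be registered before the file is read (for example from a `.pth` file), and it would likely also need an incremental decoder, which this sketch omits.

```python
import codecs
import re

CODEC_NAME = "toyql"  # hypothetical encoding name, not PythonQL's

def preprocess(source: str) -> str:
    # Toy rewrite: turn a comprehension's `where` clause into Python's `if`.
    return re.sub(r"\bwhere\b", "if", source)

def decode(data, errors="strict"):
    # Decode the raw bytes as UTF-8, then run the preprocessor on the text.
    text, consumed = codecs.utf_8_decode(data, errors, True)
    return preprocess(text), consumed

def search(name):
    # Codec search function: only answers for our hypothetical encoding.
    if name == CODEC_NAME:
        utf8 = codecs.lookup("utf-8")
        return codecs.CodecInfo(name=CODEC_NAME, encode=utf8.encode, decode=decode)
    return None

codecs.register(search)

# Decoding with the custom codec yields ordinary Python source.
source = b"evens = [x for x in range(10) where x % 2 == 0]\n"
namespace = {}
exec(source.decode(CODEC_NAME), namespace)
print(namespace["evens"])
```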