So, iter(file).next() is slow?

Alex

On Mon, Feb 18, 2013 at 10:51 AM, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
2013/2/18 Eleytherios Stamatogiannakis <estama@gmail.com>
On 18/02/13 18:44, Maciej Fijalkowski wrote:
On Mon, Feb 18, 2013 at 6:20 PM, Eleytherios Stamatogiannakis <estama@gmail.com> wrote:
We have found another (very simple) madIS query where PyPy is around 250x slower than CPython:
CPython: 314msec PyPy: 1min 16sec
The query, if you would like to test it yourself, is the following:
select count(*) from (file 'some_big_text_file.txt' limit 100000);
To run it you'll need a big text file containing at least 100000 lines (we ran the above query on a very big XML file). You can also run the query with a lower limit (the behaviour will be the same), like this:
select count(*) from (file 'some_big_text_file.txt' limit 10000);
Make sure the file name does not end in csv, tsv, json, db or gz, because for those the "file" operator takes a different code path than the one for plain text files.
l.
_______________________________________________ pypy-dev mailing list pypy-dev@python.org http://mail.python.org/mailman/listinfo/pypy-dev
Hey
It would be incredibly convenient if you could turn it into a standalone benchmark (say, reading a large string from a file and decoding it as a whole or in pieces).
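A rough sketch of what such a standalone benchmark might look like, assuming Python 3 and UTF-8 input. The function names are made up for illustration and are not taken from madIS:

```python
import time


def decode_whole(data, encoding="utf-8"):
    """Decode the entire byte string in a single call."""
    return data.decode(encoding)


def decode_in_pieces(data, encoding="utf-8"):
    """Decode one line at a time and join the decoded pieces.

    Splitting on line breaks is safe for UTF-8, because the bytes for
    '\\n' and '\\r' never occur inside a multi-byte sequence.
    """
    return "".join(line.decode(encoding) for line in data.splitlines(True))


def best_time(func, data, repeat=3):
    """Return the fastest of `repeat` timed calls of func(data), in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(data)
        timings.append(time.perf_counter() - start)
    return min(timings)


def run_benchmark(path):
    """Read the file as bytes once, then time both decoding strategies."""
    with open(path, "rb") as f:
        data = f.read()
    for func in (decode_whole, decode_in_pieces):
        print("%-18s %.3f s" % (func.__name__, best_time(func, data)))
```

Calling run_benchmark('some_big_text_file.txt') under both interpreters would give two numbers to compare. It leaves SQLite and CFFI out entirely, so it can only say something about the pure Python part.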
As it involves SQLite, CFFI and Python, it is very hard to extract the full execution path that madIS goes through even in a simple query like this.
Nevertheless we extracted a part of the pure Python execution path, and PyPy is around 50% slower than CPython:
CPython: 21 sec PyPy: 33 sec
The full madIS execution path involves additional CFFI calls and callbacks (from SQLite) to pass the data to SQLite.
To run test.py:
test.py big_text_file
Most of the time is spent in file iteration. I added f = f.read().splitlines() and the query is almost instant.
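The comparison above (iterating over the file object versus reading everything and calling splitlines()) can be timed in isolation with something like the following. This is only a sketch, assuming Python 3 and binary mode to keep decoding out of the picture. The helper names are invented here and are not from test.py or madIS:

```python
import os
import tempfile
import time


def count_by_iteration(path):
    """Count lines by iterating over the file object, one line at a time."""
    count = 0
    with open(path, "rb") as f:
        for _ in f:
            count += 1
    return count


def count_by_splitlines(path):
    """Count lines by reading the whole file and splitting it in memory.

    Note: splitlines() also breaks on a bare '\\r', so the two counts only
    agree for files that use '\\n' (or '\\r\\n') line endings.
    """
    with open(path, "rb") as f:
        return len(f.read().splitlines())


def timed(func, path):
    """Return (result, elapsed seconds) for a single call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


def make_sample_file(num_lines=100000):
    """Write a throwaway text file with num_lines lines and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        for i in range(num_lines):
            f.write(b"line %d of the sample file\n" % i)
    return path


def compare(path):
    """Print the line count and elapsed time for both strategies."""
    for func in (count_by_iteration, count_by_splitlines):
        lines, seconds = timed(func, path)
        print("%-20s %d lines in %.3f s" % (func.__name__, lines, seconds))
```

Running compare(make_sample_file()) (or compare() on a real big text file) under CPython and PyPy should show whether file iteration alone accounts for the gap. Keep in mind that the read().splitlines() variant holds the whole file in memory.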
-- Amaury Forgeot d'Arc
-- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero