<div dir="ltr">So, iter(file).next() is slow?<div><br></div><div>Alex</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Feb 18, 2013 at 10:51 AM, Amaury Forgeot d'Arc <span dir="ltr">&lt;<a href="mailto:amauryfa@gmail.com" target="_blank">amauryfa@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote"><div><div class="h5">2013/2/18 Eleytherios Stamatogiannakis <span dir="ltr">&lt;<a href="mailto:estama@gmail.com" target="_blank">estama@gmail.com</a>&gt;</span><br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div><div>On 18/02/13 18:44, Maciej Fijalkowski wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

On Mon, Feb 18, 2013 at 6:20 PM, Eleytherios Stamatogiannakis<br>

&lt;<a href="mailto:estama@gmail.com" target="_blank">estama@gmail.com</a>&gt; wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

We have found another (very simple) madIS query where PyPy is around 250x<br>

slower than CPython:<br>

<br>

CPython: 314msec<br>

PyPy: 1min 16sec<br>

<br>

The query, if you would like to test it yourself, is the following:<br>

<br>

select  count(*)  from   (file  'some_big_text_file.txt' limit 100000);<br>

<br>

To run it you'll need a big text file containing at least 100000 lines<br>

(we ran the above query on a very big XML file). You can also run the<br>

query with a lower limit (the behaviour will be the same), for example:<br>

<br>

select  count(*)  from   (file  'some_big_text_file.txt' limit 10000);<br>

<br>

Make sure the file name does not end in csv, tsv, json, db or gz,<br>

because the "file" operator takes a different code path for those<br>

than for simple text files.<br>

<br>

l.<br>

<br>

<br>

_______________________________________________<br>

pypy-dev mailing list<br>

<a href="mailto:pypy-dev@python.org" target="_blank">pypy-dev@python.org</a><br>

<a href="http://mail.python.org/mailman/listinfo/pypy-dev" target="_blank">http://mail.python.org/mailman/listinfo/pypy-dev</a><br>

</blockquote>

<br>

Hey<br>

<br>

It would be incredibly convenient if you could change it to be a<br>

standalone benchmark (say, reading a large string from a file and<br>

decoding it as a whole or in pieces).<br>

<br>

</blockquote>
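<div>For reference, a standalone benchmark of the kind requested above might look roughly like the sketch below. It is not taken from madIS: the function names, the UTF-8 encoding and the generated sample file are all assumptions made for illustration. It only compares decoding a file's bytes in one call against decoding them line by line.</div>

```python
import os
import tempfile
import time


def decode_whole(path, encoding="utf-8"):
    """Read the file as bytes and decode everything in a single call."""
    with open(path, "rb") as f:
        return len(f.read().decode(encoding))


def decode_in_pieces(path, encoding="utf-8"):
    """Decode the same bytes one line at a time."""
    total = 0
    with open(path, "rb") as f:
        for raw_line in f:
            total += len(raw_line.decode(encoding))
    return total


def make_sample_file(lines=100000):
    """Write a throwaway UTF-8 text file (hypothetical test data)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        for i in range(lines):
            f.write(("line %d: some text \u03b1\u03b2\u03b3\n" % i).encode("utf-8"))
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        for func in (decode_whole, decode_in_pieces):
            start = time.perf_counter()
            chars = func(sample)
            elapsed = time.perf_counter() - start
            print("%s: %d chars in %.3f s" % (func.__name__, chars, elapsed))
    finally:
        os.remove(sample)
```

<div>Both functions return the number of decoded characters, so the two runs can be checked against each other before comparing timings under CPython and PyPy.</div>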

<br></div></div>

As it involves SQLite, CFFI and Python, it is very hard to extract the full execution path that madIS goes through even in a simple query like this.<br>

<br>

Nevertheless we extracted a part of the pure Python execution path, and PyPy is around 50% slower than CPython:<br>

<br>

CPython: 21 sec<br>

PyPy: 33 sec<br>

<br>

The full madIS execution path involves additional CFFI calls and callbacks (from SQLite) to pass the data to SQLite.<br>

<br>

To run test.py:<br>

<br>

test.py big_text_file<br></blockquote><div><br></div></div></div><div>Most of the time is spent in file iteration.</div><div>I added     </div><div>    f = f.read().splitlines()</div><div>and the query is almost instant.</div>
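<div>A rough way to check this outside madIS is to time both approaches on the same file. The sketch below is not the madIS code: the function names, the <code>limit</code> parameter (mirroring the limit in the query) and the generated sample file are assumptions made for illustration.</div>

```python
import itertools
import os
import tempfile
import time


def count_by_iteration(path, limit=None):
    """Count lines by iterating over the file object, one line at a time."""
    with open(path) as f:
        return sum(1 for _ in itertools.islice(f, limit))


def count_by_splitlines(path, limit=None):
    """Count lines after reading the whole file into memory first."""
    with open(path) as f:
        lines = f.read().splitlines()
    return len(lines[:limit])


def write_sample(lines):
    """Create a throwaway text file with the given number of lines."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(lines):
            f.write("line %d\n" % i)
    return path


if __name__ == "__main__":
    sample = write_sample(200000)
    try:
        for func in (count_by_iteration, count_by_splitlines):
            start = time.perf_counter()
            n = func(sample, limit=100000)
            elapsed = time.perf_counter() - start
            print("%s: %d lines in %.3f s" % (func.__name__, n, elapsed))
    finally:
        os.remove(sample)
```

<div>Note that read().splitlines() holds the whole file in memory and splits on a few more separators than file iteration does, so the two counts can differ on unusual input.</div>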

<div><br>

</div><div> </div></div><div class="HOEnZb"><div class="h5">-- <br>Amaury Forgeot d'Arc

</div></div><br>

<br></blockquote></div><br><br clear="all"><div><br></div>-- <br>"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)<br>"The people's good is the highest law." -- Cicero<br>


</div>