[pypy-dev] Performance, json and standard library

Maciej Fijalkowski fijall at gmail.com
Tue Sep 27 03:18:34 CEST 2011


On Mon, Sep 26, 2011 at 10:18 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
> On Mon, Sep 26, 2011 at 7:13 PM, Bob Ippolito <bob at redivi.com> wrote:
>> You should also try the master branch of simplejson; the
>> _pypy_speedups branch is not necessarily better (which is why it is
>> not master).
>
> You should also look at https://bugs.pypy.org/issue866 for various patches.

Wrong link; the correct one is https://bugs.pypy.org/issue868

>
>>
>> On Mon, Sep 26, 2011 at 2:39 PM, Matthew Kaniaris <mkaniaris at gmail.com> wrote:
>>> I did some testing to see where we stand on JSON.  PyPy is built from
>>> trunk, and the simplejson used with PyPy is the _pypy_speedups
>>> branch.  The speedups make PyPy about 2x faster on dumps than with the
>>> stdlib json module and slightly slower on loads, but PyPy with
>>> simplejson is still roughly 6x to 8x slower than CPython with
>>> simplejson on the 32kb file.  I'll try profiling the speedups branch to
>>> see if there is any low-hanging fruit left, but I doubt we will get
>>> another 50% improvement out of it.
>>>
>>> -kans
>>>
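The "5 loops, best of 1000" lines below are in the output format of the
stdlib timeit module (as with -n 5 -r 1000). The message does not include
the benchmark script, so the following is only a minimal sketch of how
comparable numbers could be collected. The names bench, format_time and
run are hypothetical, and the sample document is a stand-in for the
3.4kb and 32kb files, which are not shown in the thread.

```python
import json
import timeit


def bench(func, arg, number=5, repeat=1000):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: func(arg))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number


def format_time(seconds):
    """Format a duration roughly the way timeit prints it."""
    if seconds < 1e-3:
        return "%.3g usec" % (seconds * 1e6)
    if seconds < 1:
        return "%.3g msec" % (seconds * 1e3)
    return "%.3g sec" % seconds


def run(text, module=json, number=5, repeat=1000):
    """Time loads and dumps of one document with the given JSON module."""
    obj = module.loads(text)
    return {
        "loads": bench(module.loads, text, number, repeat),
        "dumps": bench(module.dumps, obj, number, repeat),
    }


if __name__ == "__main__":
    # Stand-in document; replace with the contents of a real test file.
    sample = json.dumps(
        {"items": [{"id": i, "name": "x" * 10, "ok": None} for i in range(50)]}
    )
    for name, secs in sorted(run(sample).items()):
        print("%s: 5 loops, best of 1000: %s per loop" % (name, format_time(secs)))
```

Passing module=simplejson to run (after importing it) would time
simplejson with the same harness, so the four configurations differ only
in the interpreter and the module.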
>>> results:
>>> python using json:
>>>
>>> /home/test/3.4kb.json
>>> loads: 5 loops, best of 1000: 953 usec per loop
>>>
>>> dumps: 5 loops, best of 1000: 706 usec per loop
>>>
>>> /home/test/32kb.json
>>> loads: 5 loops, best of 1000: 10.9 msec per loop
>>>
>>> dumps: 5 loops, best of 1000: 9.13 msec per loop
>>>
>>> -------------------------
>>> python using simplejson:
>>>
>>> /home/test/3.4kb.json
>>> loads: 5 loops, best of 1000: 41.2 usec per loop
>>>
>>> dumps: 5 loops, best of 1000: 56 usec per loop
>>>
>>> /home/test/32kb.json
>>> loads: 5 loops, best of 1000: 604 usec per loop
>>>
>>> dumps: 5 loops, best of 1000: 391 usec per loop
>>>
>>> -------------------------
>>> pypy using json:
>>>
>>> /home/test/3.4kb.json
>>> loads: 5 loops, best of 1000: 146 usec per loop
>>>
>>> dumps: 5 loops, best of 1000: 429 usec per loop
>>>
>>> /home/test/32kb.json
>>> loads: 5 loops, best of 1000: 2.93 msec per loop
>>>
>>> dumps: 5 loops, best of 1000: 7.16 msec per loop
>>>
>>> -------------------------
>>> pypy using simplejson:
>>>
>>> /home/test/3.4kb.json
>>> loads: 5 loops, best of 1000: 197 usec per loop
>>>
>>> dumps: 5 loops, best of 1000: 148 usec per loop
>>>
>>> /home/test/32kb.json
>>> loads: 5 loops, best of 1000: 3.47 msec per loop
>>>
>>> dumps: 5 loops, best of 1000: 3.2 msec per loop
>>>
>>>
>>>
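To make the comparison concrete, here is a small sketch that computes
the ratios from the timings quoted above. The numbers are copied from
the results and converted to microseconds; "python" in the results is
taken to mean CPython, as the prose suggests. The table layout and the
ratio helper are only illustrative.

```python
# Timings from the results above, in microseconds per loop.
# "cpython" stands for the runs labelled "python" (assumed to be CPython).
TIMINGS_USEC = {
    ("cpython", "json"): {
        "3.4kb": {"loads": 953, "dumps": 706},
        "32kb": {"loads": 10900, "dumps": 9130},
    },
    ("cpython", "simplejson"): {
        "3.4kb": {"loads": 41.2, "dumps": 56},
        "32kb": {"loads": 604, "dumps": 391},
    },
    ("pypy", "json"): {
        "3.4kb": {"loads": 146, "dumps": 429},
        "32kb": {"loads": 2930, "dumps": 7160},
    },
    ("pypy", "simplejson"): {
        "3.4kb": {"loads": 197, "dumps": 148},
        "32kb": {"loads": 3470, "dumps": 3200},
    },
}


def ratio(slow, fast, size, op):
    """How many times slower `slow` is than `fast` for one file and operation."""
    return TIMINGS_USEC[slow][size][op] / float(TIMINGS_USEC[fast][size][op])


if __name__ == "__main__":
    pairs = [
        (("pypy", "simplejson"), ("cpython", "simplejson")),
        (("pypy", "json"), ("pypy", "simplejson")),
    ]
    for slow, fast in pairs:
        for size in ("3.4kb", "32kb"):
            for op in ("loads", "dumps"):
                print("%s/%s vs %s/%s, %s %s: %.1fx" % (
                    slow[0], slow[1], fast[0], fast[1], size, op,
                    ratio(slow, fast, size, op)))
```

A ratio below 1 in the second pair means the stdlib json module was the
faster one for that case, which is what the loads timings show.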
>>> On Sun, Sep 25, 2011 at 1:53 PM, Alex Gaynor <alex.gaynor at gmail.com> wrote:
>>>>
>>>>
>>>> On Sun, Sep 25, 2011 at 1:49 PM, Bob Ippolito <bob at redivi.com> wrote:
>>>>>
>>>>> simplejson would be a good target for changes that would not be easy
>>>>> to implement on top of the stdlib json. I'd be happy to accept any
>>>>> contributions. When I tried at PyCon, I failed to make any big
>>>>> difference in performance (at least none that didn't regress
>>>>> performance for some people). The other things I'm missing are a good
>>>>> suite of documents to benchmark with, and a good tool to run the
>>>>> benchmarks so it's easy to see if incremental changes are better or
>>>>> worse.
>>>>>
>>>>> However, if RPython is required to make it faster, maybe implementing
>>>>> _json for the stdlib would actually be best.
>>>>>
>>>>> On Sun, Sep 25, 2011 at 10:30 AM, Zooko O'Whielacronx <zooko at zooko.com>
>>>>> wrote:
>>>>> > But don't people who need better json performance use simplejson
>>>>> > explicitly instead of using the standard library's json?
>>>>> >
>>>>> > Regards,
>>>>> >
>>>>> > Zooko
>>>>> > _______________________________________________
>>>>> > pypy-dev mailing list
>>>>> > pypy-dev at python.org
>>>>> > http://mail.python.org/mailman/listinfo/pypy-dev
>>>>> >
>>>>
>>>> For what it's worth, I think we can get there without needing to write any
>>>> RPython, through a combination of careful Python and more JIT
>>>> optimizations.  For example, I'd like to get the code input[i:i+4] == "NULL"
>>>> to eventually generate:
>>>>   read str length
>>>>   check length >= 4
>>>>   read 4 bytes out of input (single MOVL)
>>>>   integer compare to ('N' << 0) | ('U' << 8) | ('L' << 16) | ('L' << 24)
>>>> in total about 7 x86 instructions.  I think this is definitely possible!
>>>> Alex
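The steps above can be sketched in plain Python to show what the
comparison computes. This is only an illustration of the idea, not what
the JIT emits and not a faster way to write the check in Python. The
names pack4, NULL_WORD and matches_null are hypothetical, i is assumed
to be non-negative, and the literal is kept as "NULL" as written in the
message (JSON itself spells it null).

```python
import struct


def pack4(literal):
    """Pack a 4-byte literal into one little-endian 32-bit integer."""
    b0, b1, b2, b3 = bytearray(literal)
    return (b0 << 0) | (b1 << 8) | (b2 << 16) | (b3 << 24)


# Constant computed once, like the immediate operand of the compare.
NULL_WORD = pack4(b"NULL")


def matches_null(data, i):
    """Equivalent of data[i:i+4] == b"NULL", done as one integer compare."""
    # Read the length and check that 4 bytes are available at offset i.
    if len(data) - i < 4:
        return False
    # Read 4 bytes as one little-endian word (the "single MOVL" step).
    (word,) = struct.unpack_from("<I", data, i)
    # One integer comparison instead of a character-by-character compare.
    return word == NULL_WORD


if __name__ == "__main__":
    text = b"[NULL]"
    print(matches_null(text, 1), text[1:5] == b"NULL")
```

The little-endian packing matches the shifts in the message: the first
byte lands in the low 8 bits, which is how x86 loads a 32-bit word from
memory.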
>>>>
>>>> --
>>>> "I disapprove of what you say, but I will defend to the death your right to
>>>> say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
>>>> "The people's good is the highest law." -- Cicero
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
