On 13 April 2017 at 02:15, Victor Stinner <victor.stinner(a)gmail.com> wrote:
> 2017-04-12 10:52 GMT+02:00 Victor Stinner <victor.stinner(a)gmail.com>:
>> I'm running benchmarks with this option. Once the results are ready, I
>> will replace the old 2.7 results with the new ones.
>
> Done. speed.python.org now uses UCS-4 on Python 2.7. Is it better now?
Thanks!
> Previous JSON file:
> https://github.com/haypo/performance_results/raw/master/2017-03-31-cpython/…
>
> New JSON file:
> https://github.com/haypo/performance_results/raw/master/2017-04-12-cpython/…
>
> I see small performance differences, but they seem to be random noise
> rather than an effect of the UTF-16 => UCS-4 change.
Given that lack of divergence and the known Unicode correctness
problems in narrow builds, I guess it doesn't make much sense to
invest time in benchmarking both of them.
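For example, on a narrow build a character outside the Basic Multilingual
Plane is stored as a UTF-16 surrogate pair, so len(), indexing and slicing
see code units rather than characters:
>>> len(u"\U0001F600")
2   # narrow (UTF-16) build: one character, two code units
>>> len(u"\U0001F600")
1   # wide (UCS-4) build, or any Python >= 3.3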
Cheers,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
2017-04-12 10:52 GMT+02:00 Victor Stinner <victor.stinner(a)gmail.com>:
> I'm running benchmarks with this option. Once the results are ready, I
> will replace the old 2.7 results with the new ones.
Done. speed.python.org now uses UCS-4 on Python 2.7. Is it better now?
Previous JSON file:
https://github.com/haypo/performance_results/raw/master/2017-03-31-cpython/…
New JSON file:
https://github.com/haypo/performance_results/raw/master/2017-04-12-cpython/…
I see small performance differences, but they seem to be random noise
rather than an effect of the UTF-16 => UCS-4 change. Comparison between
the two files:
$ python3 -m perf compare_to \
    2017-03-31-cpython/2017-04-03_16-11-2.7-23d6eb656ec2.json.gz \
    2017-04-12-cpython/2017-04-10_17-27-2.7-e0cba5b45a5c.json.gz \
    --table -G --min-speed=5
+-----------------+-----------------------------------+-----------------------------------+
| Benchmark       | 2017-04-03_16-11-2.7-23d6eb656ec2 | 2017-04-10_17-27-2.7-e0cba5b45a5c |
+=================+===================================+===================================+
| chameleon       | 23.3 ms                           | 25.0 ms: 1.07x slower (+7%)       |
+-----------------+-----------------------------------+-----------------------------------+
| scimark_fft     | 623 ms                            | 673 ms: 1.08x slower (+8%)        |
+-----------------+-----------------------------------+-----------------------------------+
| fannkuch        | 806 ms                            | 889 ms: 1.10x slower (+10%)       |
+-----------------+-----------------------------------+-----------------------------------+
| scimark_sor     | 401 ms                            | 443 ms: 1.11x slower (+11%)       |
+-----------------+-----------------------------------+-----------------------------------+
| unpack_sequence | 138 ns                            | 154 ns: 1.12x slower (+12%)       |
+-----------------+-----------------------------------+-----------------------------------+
| regex_v8        | 53.4 ms                           | 60.0 ms: 1.12x slower (+12%)      |
+-----------------+-----------------------------------+-----------------------------------+
Not significant (61): (...)
Fortunately, perf stores information about the Unicode implementation in
its metadata. You can check the metadata using:
$ python3 -m perf metadata 2017-03-31_06-53-2.7-5aa913d72317.json.gz -b 2to3 | grep python_unicode
- python_unicode: UTF-16
$ python3 -m perf metadata 2017-04-10_17-27-2.7-e0cba5b45a5c.json.gz -b 2to3 | grep python_unicode
- python_unicode: UCS-4
Victor
2017-04-12 6:58 GMT+02:00 Nick Coghlan <ncoghlan(a)gmail.com>:
> speed.python.org has been updated to split out per-branch results for
> easier cross version comparisons, but looking at the performance repo
> suggests that the only 2.7 results currently reported are for the
> default UCS2 builds.
You're right, it's a bug; it wasn't deliberate.
I fixed "performance compile": it now pass --enable-unicode=ucs4 to
configure if the branch starts with "2.".
https://github.com/python/performance/commit/9af0c6e029db9d2a8475f11d5cc601…
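The logic is essentially just a branch-name check; a minimal sketch of
the idea (not the exact code from that commit):
def configure_args(branch):
    args = []
    if branch.startswith("2."):
        # match the wide (UCS-4) builds shipped by most Linux distributions
        args.append("--enable-unicode=ucs4")
    return args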
I'm running benchmarks with this option. Once the results are ready, I
will replace the old 2.7 results with the new ones.
Note: Python 2.7 configure should use UCS4 by default on Linux, but
that's a different topic ;-)
Victor
speed.python.org has been updated to split out per-branch results for
easier cross version comparisons, but looking at the performance repo
suggests that the only 2.7 results currently reported are for the
default UCS2 builds.
That isn't the way Linux distros typically ship Python: we/they
specify the "--enable-unicode=ucs4" option when calling configure in
order to get correct Unicode handling.
Not that long ago, `pyenv` also switched to using wide builds for
`manylinux1` wheel compatibility, and conda has similarly used wide
builds from the start for ABI compatibility with system Python
runtimes.
That means the current Python 2 benchmark results may be
unrepresentative for anyone using a typical Linux build of CPython:
the pay-off in reduced memory use and reduced data copying from Python
3's dynamic string representation is higher relative to Python 2 wide
builds than it is relative to narrow builds, and we'd expect that to
affect at least the benchmarks that manipulate text data.
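For reference, it's easy to check which flavour a given Python 2 binary
is, and to get a rough idea of the per-string storage cost (the exact
numbers depend on the build):
$ python2 -c 'import sys; print(sys.maxunicode)'
65535      # narrow (UTF-16) build; a wide (UCS-4) build prints 1114111
$ python2 -c 'import sys; print(sys.getsizeof(u"a" * 1000))'
# roughly 2 bytes per character on a narrow build, 4 on a wide build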
Perhaps it would make sense to benchmark two different variants of the
Python 2.7 branch, one with a wide build, and one with a narrow one?
Cheers,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
On 06.04.17 12:00, Victor Stinner wrote:
> I managed to compute benchmarks on CPython master over the period
> April 2014 - April 2017: we now have a timeline covering 3 years of
> CPython performance!
>
> https://speed.python.org/timeline/
Excellent! I have always wanted to see graphs like these.
But could you please show the years on the time scale? It would also be
nice to add thin horizontal lines for the current performance of the
maintained Python releases.
Hi,
I'm still analyzing past optimizations to guide future ones. I managed
to identify multiple significant optimizations over the last 3 years.
At least for me, some were unexpected, like "Use the test suite for
profile data", which made pidigits 1.16x faster.
Here is a report on my work over the last few weeks.
I managed to compute benchmarks on CPython master over the period
April 2014 - April 2017: we now have a timeline covering 3 years of
CPython performance!
https://speed.python.org/timeline/
I started to take notes on significant performance changes (speedups
and slowdowns) in this timeline:
http://pyperformance.readthedocs.io/cpython_results_2017.html
To identify the commit that introduced a significant performance
change, I wrote a Python script that runs a Git bisection: compile
CPython, run the benchmark, repeat.
https://github.com/haypo/misc/blob/master/misc/bisect_cpython_perf.py
It uses a configuration file which looks like:
---
[config]
work_dir = ~/prog/bench_python/bisect-pickle
src_dir = ~/prog/bench_python/master
old_commit = 133138a284be1985ebd9ec9014f1306b9a42
new_commit = 10427f44852b6e872034061421a8890902b8f
benchmark = ~/prog/bench_python/performance/performance/benchmarks/bm_pickle.py pickle
benchmark_opts = --inherit-environ=PYTHONPATH -p5 -v
configure_args =
---
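The bisection itself is the classic loop; roughly (a simplified sketch,
not the actual bisect_cpython_perf.py):
# commits: list of commits between old_commit and new_commit, oldest first.
# is_fast(commit): checks out the commit, builds CPython, runs the benchmark
# with perf and compares the result against a reference timing.
def bisect(commits, is_fast):
    lo, hi = 0, len(commits) - 1   # commits[lo] is fast, commits[hi] is slow
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fast(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]   # first "slow" commit: the suspected culprit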
I managed to identify many significant optimizations (TODO: validate
them on the speed-python server). Examples:
* PyMem_Malloc() now uses the fast pymalloc allocator
* Add a C implementation of collections.OrderedDict
* Use the test suite for profile data
* Speedup method calls 1.2x
* Added C implementation of functools.lru_cache()
* Optimized ElementTree.iterparse(); it is now 2x faster
perf, performance, the server configuration, etc. evolve more quickly
than expected, so I created a Git repository to keep a copy of the JSON
files:
I already lost the data of my first milestone (November-December 2016),
but the data from the second (December 2016 - February 2017) and third
(March 2017 - today) milestones is available.
I'm now discussing with the PyPy developers how performance could be
used to measure PyPy's performance.
Victor
Unfortunately, the C version of pickle lacks the extensibility of the
pure Python version, so the pure Python version has to be used in some
cases. One such example is the `cloudpickle` project, which extends
pickle to support many more types, such as local functions.
`cloudpickle` is often used by distributed executors to ship Python
code for remote execution on a cluster.
See
https://github.com/cloudpipe/cloudpickle/blob/master/cloudpickle/cloudpickl…
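For example, something along these lines works with cloudpickle but
fails with the stock (C or pure Python) pickler, because a nested
function cannot be looked up by its module-level name:
import pickle
import cloudpickle

def make_adder(n):
    def add(x):
        return x + n
    return add

data = cloudpickle.dumps(make_adder(3))  # works: serializes the code object
func = pickle.loads(data)                # the output is a normal pickle stream
print(func(4))                           # -> 7
# pickle.dumps(make_adder(3)) would fail with a pickling error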
Regards
Antoine.
On Wed, 5 Apr 2017 01:31:20 +1000
Nick Coghlan <ncoghlan(a)gmail.com> wrote:
> On 4 April 2017 at 21:43, Victor Stinner <victor.stinner(a)gmail.com> wrote:
> > 2017-04-04 12:06 GMT+02:00 Serhiy Storchaka <storchaka(a)gmail.com>:
> >> I consider it a benchmark of the Python interpreter itself.
> >
> > Don't we have enough benchmarks to test the Python interpreter?
> >
> > I would prefer to have more realistic use cases than "reimplement
> > pickle in pure Python".
> >
> > "unpickle_pure_python" name can be misleading as well to users
> > exploring speed.python.org data, no?
>
> The split benchmark likely made more sense in Python 2, when "import
> pickle" gave you the pure Python version by default, and you had to do
> "import cPickle as pickle" to get the accelerated version - you'd get
> very different performance characteristics based on which import the
> application used.
>
> It makes significantly less sense now that Python 3 always uses the
> accelerated version by default and only falls back to pure Python if
> the accelerator module is missing for some reason. If anything, the
> appropriate cross-version comparison would be between the pure Python
> version in 2.7, and the accelerated version in 3.x, since that
> reflects the performance change you get when you do "import pickle".
>
> However, that argument only applies to whether or not to include it in
> the default benchmark set used to compare the overall performance
> across versions and implementations - it's still valid as a
> microbenchmark looking for major regressions in the speed of the pure
> Python fallback.
>
> Cheers,
> Nick.
>
(Crap, how did I send an incomplete email? Sorry about that.)
Hi,
I hacked my "performance compile" command to force pip 7.1.2 on alpha
versions of Python 3.5, which worked around the regression in pyparsing
(pyparsing has been used by pip since pip 8):
https://sourceforge.net/p/pyparsing/bugs/100/
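The workaround amounts to pinning pip in the virtual environment used
for the benchmarks, something along the lines of:
$ python -m pip install "pip==7.1.2"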
I managed to run benchmarks on CPython over the period April 2014 -
April 2017, with one dot per quarter (so 4 dots per year).
I started to analyze performance in depth and added notes in the
performance documentation:
http://pyperformance.readthedocs.io/cpython_results_2017.html
(performance now has online documentation!)
I wrote a tool to bisect a performance change: it tries to find the
commit that introduced the significant performance change (slowdown or
speedup):
https://github.com/haypo/misc/blob/master/misc/find_git_revisions_by_date.py
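(For a single date, the same lookup can be done by hand with git; the
script presumably automates this over a range of dates:)
$ git rev-list -n 1 --before="2016-07-01" master   # last commit before that date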
I have now started to compute one dot per month to get better
resolution, since my tool failed to bisect the April 2016 - July 2016
period. With a resolution of 1 month, I identified the "PyMem_Malloc()
now uses the fast pymalloc allocator" change, which has a significant
impact on unpickle_list.
Victor
Hi,
I hacked my "performance compile" command to force pip 7.1.2 on alpha
versions of Python 3.5, which worked around the pyparsing regression:
https://sourceforge.net/p/pyparsing/bugs/100/
I managed to run benchmarks on CPython over the period April 2014 -
April 2017, with one dot per quarter.
I started to analyze performance in depth and added notes at: