[Python-Dev] PEP 393 review

"Martin v. Löwis" martin at v.loewis.de
Sun Aug 28 21:47:05 CEST 2011


> I would say no more than a 15% slowdown on each of the following
> benchmarks:
> 
> - stringbench.py -u
>   (http://svn.python.org/view/sandbox/trunk/stringbench/)
> - iobench.py -t
>   (in Tools/iobench/)
> - the json_dump, json_load and regex_v8 tests from
>   http://hg.python.org/benchmarks/

I now have benchmark results for these; the numbers are for revision
c10bcab2aac7, compared to 1ea72da11724 (a wide Unicode build), on 64-bit
Linux with gcc 4.6.1, running on a 2.8 GHz Core i7.

- stringbench gives a 10% slowdown in total time; individual tests take
  between 78% and 220% of their original time. The cost is typically not
  in performing the string operations themselves, but in the creation of
  the result strings. In PEP 393, a buffer must be scanned for the
  highest code point, which means that each byte must be inspected
  twice (a second time when the copying occurs); see the sketch below.
- the iobench results range from a 2% speedup for seek operations to a
  16% slowdown for small reads (4.31 MB/s vs. 5.22 MB/s) and a 37%
  slowdown for large reads (154 MB/s vs. 235 MB/s). The speed
  difference is probably in the UTF-8 decoder; I have already
  restored the "runs of ASCII" optimization (also sketched below) and
  am out of ideas for further speedups. Again, having to scan the
  UTF-8 string twice is probably one cause of the slowdown.
- the json_dump, json_load and regex_v8 tests see a slowdown of less
  than 1%.
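
To illustrate the two-pass pattern mentioned for stringbench, here is a
minimal sketch (not the actual CPython code; the names Py_UCS4_sketch
and build_compact_string are made up for illustration): one pass over a
UCS-4 buffer finds the highest code point so that the narrowest
representation (1, 2 or 4 bytes per character) can be chosen, and only
a second pass copies the characters into it.

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Py_UCS4_sketch;   /* hypothetical stand-in for Py_UCS4 */

static void *
build_compact_string(const Py_UCS4_sketch *buf, size_t len, int *kind_out)
{
    /* First pass: find the maximum code point in the buffer. */
    Py_UCS4_sketch maxchar = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] > maxchar)
            maxchar = buf[i];

    /* Choose the character width from the maximum code point. */
    int kind = maxchar < 0x100 ? 1 : maxchar < 0x10000 ? 2 : 4;
    *kind_out = kind;

    void *data = malloc(len ? len * kind : 1);
    if (data == NULL)
        return NULL;

    /* Second pass: copy, narrowing each character to the chosen width. */
    for (size_t i = 0; i < len; i++) {
        switch (kind) {
        case 1: ((uint8_t *)data)[i]  = (uint8_t)buf[i];  break;
        case 2: ((uint16_t *)data)[i] = (uint16_t)buf[i]; break;
        case 4: ((uint32_t *)data)[i] = buf[i];           break;
        }
    }
    return data;
}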

The slowdown is larger when compared with a narrow Unicode build.
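
The "runs of ASCII" optimization in the UTF-8 decoder can be sketched
roughly as follows (illustrative only, not the unicodeobject.c code;
decode_utf8_sketch is a made-up name, only 2-byte sequences are handled
in the slow path, and the output buffer is assumed to be large enough).
Bytes below 0x80 map 1:1 to code points, so a run of them can be copied
in a tight inner loop before falling back to the general decoding logic;
the extra pass needed to determine the highest code point is presumably
where the "scan the UTF-8 string twice" cost comes from.

#include <stddef.h>
#include <stdint.h>

static size_t
decode_utf8_sketch(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        /* Fast path: copy a run of ASCII bytes unchanged. */
        while (i < len && in[i] < 0x80)
            out[n++] = in[i++];
        if (i >= len)
            break;
        /* Slow path (grossly simplified): decode one 2-byte sequence;
           a real decoder also handles 3/4-byte sequences and errors. */
        if (i + 1 < len && (in[i] & 0xE0) == 0xC0 && (in[i+1] & 0xC0) == 0x80) {
            out[n++] = ((uint32_t)(in[i] & 0x1F) << 6) | (in[i+1] & 0x3F);
            i += 2;
        }
        else {
            out[n++] = 0xFFFD;   /* replacement character for malformed input */
            i += 1;
        }
    }
    return n;   /* number of code points written */
}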

> Additionally, it would be nice if you could run at least some of the
> test_bigmem tests, according to your system's available RAM.

Running only StrTest with 4.5 GB of memory allows me to run two tests
(test_encode_raw_unicode_escape and test_encode_utf7); these see
a slowdown of 37% in Linux user time.

Regards,
Martin

