[Python-3000] Lazy strings (was Re: Py3k release schedule worries)

Larry Hastings larry at hastings.org
Sat Jan 13 00:59:08 CET 2007


Guido van Rossum wrote:
> Finally (unrelated to the memory problem) I'd like to see some 
> benchmarks to prove that this is really worth it.
Here's a first cut at some benchmarks.  I gently hacked the pybench 
suite in Tools so it'd run, and compared the full "lazy strings" patch 
against an unpatched tree.  In the following output, "this" is the 
unpatched tree and "other" is the tree with the lazy patch applied.  
The envelope please:

-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* using Python 3.0x
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.clock

Running 10 round(s) of the suite at warp factor 10:

Test                             minimum run-time        average run-time
                                 this    other   diff    this    other   diff
-------------------------------------------------------------------------------
                 ConcatUnicode:   185ms    46ms +298.6%   206ms    48ms +332.7%
       CreateUnicodeWithConcat:   129ms    67ms  +93.1%   132ms    71ms  +86.5%
                UnicodeSlicing:   156ms    75ms +108.0%   161ms    77ms +108.9%
-------------------------------------------------------------------------------
Totals:                          8350ms  8148ms   +2.5%  8586ms  8416ms   +2.0%
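
(For a rough sense of what those three tests measure: they mostly 
hammer on repeated concatenation and slicing, along the lines of the 
Python below.  This is only an illustration -- the function names are 
made up and the real tests live in the pybench suite.)

    def concat_unicode(count=1000):
        # repeated binary concatenation -- what lazy concatenation targets
        s = ''
        for i in range(count):
            s = s + 'abcdefgh'
        return s

    def unicode_slicing(s='abcdefgh' * 1000, count=1000):
        # repeated slicing -- what lazy slices target
        t = s
        for i in range(count):
            t = s[10:-10]
            t = s[:len(s) // 2]
        return t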


I'll post a zip file containing the full results and the pybench data 
files to the patch page.

As I suspected, this is a bigger win than it was with 8-bit strings, 
since 8-bit strings have gotten a lot more TLC over the years.  Once we 
propagate the accumulated tweaks from stringobject.c to unicodeobject.c, 
the improvement will be a little less dramatic.  Then again, some of 
those improvements help lazy evaluation too, most notably the 
concatenation speed hack in string_concatenation() in Python/ceval.c 
(see line 4179, comment "In the common case").
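
If you haven't read the patch, the core idea behind lazy concatenation 
is roughly this.  (A simplified Python sketch only -- the class name 
"LazyConcat" is made up for illustration, and the real thing is a C 
object inside unicodeobject.c, not a Python class.)

    class LazyConcat:
        # Sketch of the idea: '+' builds a small tree in O(1), and
        # nothing is copied until the flat value is actually needed.
        def __init__(self, left, right):
            self.left = left
            self.right = right
            self.value = None      # filled in lazily by render()

        def __add__(self, other):
            # concatenating again just adds another node -- no copying
            return LazyConcat(self, other)

        def __len__(self):
            # some questions can be answered without rendering at all
            return len(self.left) + len(self.right)

        def render(self):
            # when a flat string is finally needed, walk the tree and
            # do one join -- one allocation instead of one per '+'
            if self.value is None:
                pieces, stack = [], [self]
                while stack:
                    node = stack.pop()
                    if isinstance(node, LazyConcat):
                        stack.append(node.right)
                        stack.append(node.left)
                    else:
                        pieces.append(node)
                self.value = ''.join(pieces)
            return self.value

        def __str__(self):
            return self.render()

In the patch, of course, all of this is hidden inside the unicode 
object itself, so a plain "a + b" at the Python level picks it up 
transparently.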

Keep in mind that "lazy slices" means more than just [:] notation; I 
converted things like str.split(), str.strip(), and str.partition() to 
generate lazy slices too, and nearly all of unicodeobject.c will process 
lazy slices directly without rendering them.  So the speed improvements 
(and the corresponding changes in memory usage) affect more than you 
might suspect at first glance.
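
The lazy slice side is just as simple in spirit.  (Again a Python 
sketch, with a made-up "LazySlice" name; the real implementation is C 
inside unicodeobject.c.)  A lazy slice keeps a reference to the source 
string plus a pair of offsets, and only copies characters out when 
something genuinely needs a flat buffer -- which is also why an 
unrendered slice keeps its (possibly much larger) source string alive.

    class LazySlice:
        # Sketch of the idea: a slice is just (source, start, stop)
        # until something really needs the characters copied out.
        def __init__(self, source, start, stop):
            self.source = source
            self.start = start
            self.stop = stop
            self.value = None      # filled in lazily by render()

        def __len__(self):
            # answered straight from the offsets -- no copying
            return self.stop - self.start

        def __getitem__(self, index):
            if isinstance(index, slice):
                start, stop, step = index.indices(len(self))
                if step == 1:
                    # slicing a slice stays lazy
                    return LazySlice(self.source,
                                     self.start + start,
                                     self.start + stop)
            return self.render()[index]

        def render(self):
            # copy the characters out only when actually required
            if self.value is None:
                self.value = self.source[self.start:self.stop]
            return self.value

        def __str__(self):
            return self.render()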

Cheers,


/larry/