[Patches] [ python-Patches-918462 ] simple
SourceForge.net
noreply at sourceforge.net
Tue Mar 23 02:26:51 EST 2004
Patches item #918462, was opened at 2004-03-17 20:50
Message generated for change (Comment added) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=918462&group_id=5470
Category: Core (C code)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Raymond Hettinger (rhettinger)
Summary: simple
Initial Comment:
All this "is" vs "==" discussion led me to look at ceval.c.
The attached patch seems to speed up "is" and "is not"
comparisons - saving a function call to do a simple pointer
comparison for non-integer arguments.
The test suite passes, but it's been quite a while since I
messed around with the interpreter code, so I thought I
ought to have another pair of eyeballs check it out...
----------------------------------------------------------------------
>Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-23 02:26
Message:
Logged In: YES
user_id=80475
I'm pretty sure that this is a false optimization because
the time saved in the function call is being offset by the
extra unpredictable branch for the other tests.
Even if those others are losing 1% while "is" or "is not"
gains 10%, the percentages are not comparable. The total
time for rich compares is so long that 1% of it represents much
more real time than 1% of an is/is not test. Also, the
results need to be considered in aggregate with real times
(not percentages) and appropriate frequency weighting (if
known). For example:
  IS    occurs 100 times saving  9 microsec each time
  ISNOT occurs  70 times saving  9 microsec each time
  EQ    occurs 700 times costing 4 microsec each time
  NE    occurs  50 times costing 4 microsec each time
  LT    occurs 100 times costing 4 microsec each time
  --> weighted result: about 1.8 microsec lost per comparison
      (1870 microsec net over 1020 comparisons)
Of course, this can't be done exactly or even inexactly, but
it shows that the percentages can't be considered out of the
context of dynamic usage frequency, aggregations of all the
operators, and real time.
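The aggregate in the example can be checked mechanically. The sketch below only redoes the arithmetic: the counts and per-operation times are the illustrative figures from the example, not measurements, and the names `profile` and `net_cost` are made up here.

```python
# Illustrative figures from the example above, not measurements.
# Each entry: operator -> (occurrences, microseconds per occurrence).
# Negative times are savings, positive times are added cost.
profile = {
    "IS":    (100, -9.0),
    "ISNOT": (70,  -9.0),
    "EQ":    (700,  4.0),
    "NE":    (50,   4.0),
    "LT":    (100,  4.0),
}

def net_cost(profile):
    """Return (total microseconds lost, microseconds lost per comparison)."""
    total = sum(count * usec for count, usec in profile.values())
    ops = sum(count for count, _ in profile.values())
    return total, total / ops

total, per_op = net_cost(profile)
print("net: %.0f microsec lost over all comparisons" % total)
print("avg: %.2f microsec lost per comparison" % per_op)
```

Changing the counts shows how sensitive the conclusion is to the assumed operator mix.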
If something like this patch needs to go in, consider making
the branches predictable:
    slow_compare:
        if (oparg == PyCmp_IS) {
            x = (v == w) ? Py_True : Py_False;
            Py_INCREF(x);
        } else if (oparg == PyCmp_IS_NOT) {
            x = (v != w) ? Py_True : Py_False;
            Py_INCREF(x);
        } else
            x = cmp_outcome(oparg, v, w);
Also, when it comes to micro-optimizations that are compiler
sensitive, the Intel timing tests should be built with the
compiler actually used to build the distribution (no sense
convincing ourselves of an optimization that doesn't occur
on the real distribution).
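With a current Python, one quick way to see which compiler an interpreter was built with is to ask the interpreter itself. This sketch uses only the standard `platform` and `sysconfig` modules. The helper name `build_info` is made up for illustration, and `CC`, `OPT` and `CFLAGS` may come back as `None` on builds that have no Makefile (Windows, for instance).

```python
import platform
import sysconfig

def build_info():
    """Collect the compiler details recorded when this interpreter was built."""
    return {
        "compiler": platform.python_compiler(),
        "CC": sysconfig.get_config_var("CC"),
        "OPT": sysconfig.get_config_var("OPT"),
        "CFLAGS": sysconfig.get_config_var("CFLAGS"),
    }

for key, value in build_info().items():
    print("%-8s %s" % (key, value))
```

Comparing this output between the test build and the released binary shows whether the two were built the same way.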
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-03-23 00:15
Message:
Logged In: YES
user_id=31435
When you introduce a new branch, and time it in isolation,
HW may have enough resources to optimize for both branch
targets simultaneously. Run a ton of other stuff too, though,
and then it can start to lose. Still, for detailed answers about
anything at this level, you need to use a HW simulator --
modern processors are intractably complex, and the user-
visible programming model supplied by Pentium in particular is
multiple layers removed from bottom-line reality now, so much
so that Intel doesn't even try to supply "instruction timings"
anymore (they depend in complex ways on the internal states
of resources that aren't visible in the programming model).
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2004-03-22 16:53
Message:
Logged In: YES
user_id=44345
I reran the test on a Linux system today and got similar results.
I'm pasting them here mostly as documentation. I'm still a bit
confused why the == and > tests should show improvement, but
they often do on both platforms. Any ideas? Looking at the
generated assembly code, GCC inserts basically the same four
instructions on both the Intel and PowerPC platforms:
    cmpl $8, -40(%ebp)
    je .L580
    cmpl $9, -40(%ebp)
    je .L583
on Intel or
    cmpwi cr7,r24,8
    beq- cr7,L622
    cmpwi cr7,r24,9
    beq- cr7,L625
on PowerPC.
I also tried pystone. I see performance hits on both Linux and
Mac OSX:
Fastest of ten runs
            patched    unpatched
Linux       37878.8    38167.9
Mac OSX     13888.9    14124.3
Oh well... It was a thought.
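For scale, the pystone figures above work out to a slowdown of under 2% on both systems. A small sketch of that arithmetic follows; the dictionary layout and the function name `pct_change` are made up here, and the numbers are the ones reported above.

```python
# Fastest-of-ten pystone results reported above (pystones/second, higher is better).
pystone = {
    "Linux":   {"patched": 37878.8, "unpatched": 38167.9},
    "Mac OSX": {"patched": 13888.9, "unpatched": 14124.3},
}

def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

for system, result in pystone.items():
    change = pct_change(result["unpatched"], result["patched"])
    print("%-8s %+.2f%%" % (system, change))
```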
Test output on Linux:
s = 'abc'
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.116 0.103 0.013 -11.2
s == 'abc' 0.145 0.141 0.004 -2.8
s > 'abc' 0.140 0.142 -0.002 1.4
s is 4 0.139 0.121 0.018 -12.9
s == 4 0.271 0.293 -0.022 8.1
s > 4 0.276 0.273 0.003 -1.1
s is -1001 0.126 0.120 0.006 -4.8
s == -1001 0.270 0.272 -0.002 0.7
s > -1001 0.282 0.275 0.007 -2.5
s is 34.7 0.133 0.119 0.014 -10.5
s == 34.7 0.352 0.343 0.009 -2.6
s > 34.7 0.340 0.344 -0.004 1.2
s is 'a b c' 0.135 0.118 0.017 -12.6
s == 'a b c' 0.159 0.157 0.002 -1.3
s > 'a b c' 0.200 0.201 -0.001 0.5
s is True 0.177 0.170 0.007 -4.0
s == True 0.316 0.318 -0.002 0.6
s > True 0.321 0.321 0.000 0.0
s = 4
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.143 0.120 0.023 -16.1
s == 'abc' 0.266 0.285 -0.019 7.1
s > 'abc' 0.270 0.276 -0.006 2.2
s is 4 0.175 0.103 0.072 -41.1
s == 4 0.105 0.105 0.000 0.0
s > 4 0.106 0.107 -0.001 0.9
s is -1001 0.119 0.119 0.000 0.0
s == -1001 0.119 0.119 0.000 0.0
s > -1001 0.121 0.178 -0.057 47.1
s is 34.7 0.127 0.129 -0.002 1.6
s == 34.7 0.201 0.195 0.006 -3.0
s > 34.7 0.193 0.197 -0.004 2.1
s is 'a b c' 0.212 0.125 0.087 -41.0
s == 'a b c' 0.268 0.271 -0.003 1.1
s > 'a b c' 0.269 0.276 -0.007 2.6
s is True 0.196 0.160 0.036 -18.4
s == True 0.239 0.258 -0.019 7.9
s > True 0.265 0.237 0.028 -10.6
s = None
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.120 0.109 0.011 -9.2
s == 'abc' 0.203 0.204 -0.001 0.5
s > 'abc' 0.206 0.206 0.000 0.0
s is 4 0.119 0.110 0.009 -7.6
s == 4 0.217 0.214 0.003 -1.4
s > 4 0.214 0.220 -0.006 2.8
s is -1001 0.120 0.107 0.013 -10.8
s == -1001 0.207 0.207 0.000 0.0
s > -1001 0.207 0.214 -0.007 3.4
s is 34.7 0.122 0.112 0.010 -8.2
s == 34.7 0.274 0.270 0.004 -1.5
s > 34.7 0.272 0.271 0.001 -0.4
s is 'a b c' 0.148 0.128 0.020 -13.5
s == 'a b c' 0.240 0.242 -0.002 0.8
s > 'a b c' 0.206 0.210 -0.004 1.9
s is True 0.162 0.153 0.009 -5.6
s == True 0.267 0.262 0.005 -1.9
s > True 0.284 0.258 0.026 -9.2
s = -1000
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.218 0.128 0.090 -41.3
s == 'abc' 0.274 0.275 -0.001 0.4
s > 'abc' 0.264 0.301 -0.037 14.0
s is 4 0.125 0.120 0.005 -4.0
s == 4 0.123 0.122 0.001 -0.8
s > 4 0.119 0.121 -0.002 1.7
s is -1001 0.123 0.123 0.000 0.0
s == -1001 0.132 0.123 0.009 -6.8
s > -1001 0.121 0.121 0.000 0.0
s is 34.7 0.130 0.215 -0.085 65.4
s == 34.7 0.199 0.197 0.002 -1.0
s > 34.7 0.194 0.236 -0.042 21.6
s is 'a b c' 0.158 0.140 0.018 -11.4
s == 'a b c' 0.294 0.293 0.001 -0.3
s > 'a b c' 0.302 0.300 0.002 -0.7
s is True 0.190 0.161 0.029 -15.3
s == True 0.234 0.232 0.002 -0.9
s > True 0.238 0.234 0.004 -1.7
s = 34.2
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.133 0.120 0.013 -9.8
s == 'abc' 0.338 0.330 0.008 -2.4
s > 'abc' 0.350 0.338 0.012 -3.4
s is 4 0.126 0.121 0.005 -4.0
s == 4 0.194 0.197 -0.003 1.5
s > 4 0.193 0.196 -0.003 1.6
s is -1001 0.132 0.120 0.012 -9.1
s == -1001 0.293 0.193 0.100 -34.1
s > -1001 0.196 0.190 0.006 -3.1
s is 34.7 0.117 0.105 0.012 -10.3
s == 34.7 0.153 0.153 0.000 0.0
s > 34.7 0.156 0.155 0.001 -0.6
s is 'a b c' 0.152 0.138 0.014 -9.2
s == 'a b c' 0.360 0.398 -0.038 10.6
s > 'a b c' 0.334 0.354 -0.020 6.0
s is True 0.171 0.174 -0.003 1.8
s == True 0.248 0.254 -0.006 2.4
s > True 0.247 0.244 0.003 -1.2
s = 'a b c'
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.137 0.117 0.020 -14.6
s == 'abc' 0.157 0.158 -0.001 0.6
s > 'abc' 0.204 0.201 0.003 -1.5
s is 4 0.131 0.119 0.012 -9.2
s == 4 0.269 0.272 -0.003 1.1
s > 4 0.277 0.277 0.000 0.0
s is -1001 0.153 0.146 0.007 -4.6
s == -1001 0.299 0.294 0.005 -1.7
s > -1001 0.299 0.302 -0.003 1.0
s is 34.7 0.153 0.146 0.007 -4.6
s == 34.7 0.374 0.368 0.006 -1.6
s > 34.7 0.342 0.336 0.006 -1.8
s is 'a b c' 0.140 0.118 0.022 -15.7
s == 'a b c' 0.150 0.158 -0.008 5.3
s > 'a b c' 0.160 0.156 0.004 -2.5
s is True 0.193 0.194 -0.001 0.5
s == True 0.345 0.338 0.007 -2.0
s > True 0.318 0.319 -0.001 0.3
s = object()
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.158 0.143 0.015 -9.5
s == 'abc' 0.298 0.294 0.004 -1.3
s > 'abc' 0.288 0.292 -0.004 1.4
s is 4 0.129 0.121 0.008 -6.2
s == 4 0.249 0.250 -0.001 0.4
s > 4 0.248 0.249 -0.001 0.4
s is -1001 0.151 0.152 -0.001 0.7
s == -1001 0.271 0.266 0.005 -1.8
s > -1001 0.284 0.271 0.013 -4.6
s is 34.7 0.152 0.140 0.012 -7.9
s == 34.7 0.364 0.385 -0.021 5.8
s > 34.7 0.429 0.392 0.037 -8.6
s is 'a b c' 0.152 0.138 0.014 -9.2
s == 'a b c' 0.300 0.297 0.003 -1.0
s > 'a b c' 0.288 0.285 0.003 -1.0
s is True 0.192 0.184 0.008 -4.2
s == True 0.325 0.329 -0.004 1.2
s > True 0.324 0.322 0.002 -0.6
s = []
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.126 0.121 0.005 -4.0
s == 'abc' 0.266 0.285 -0.019 7.1
s > 'abc' 0.273 0.271 0.002 -0.7
s is 4 0.125 0.119 0.006 -4.8
s == 4 0.269 0.269 0.000 0.0
s > 4 0.268 0.274 -0.006 2.2
s is -1001 0.133 0.121 0.012 -9.0
s == -1001 0.269 0.291 -0.022 8.2
s > -1001 0.271 0.269 0.002 -0.7
s is 34.7 0.132 0.124 0.008 -6.1
s == 34.7 0.332 0.362 -0.030 9.0
s > 34.7 0.339 0.336 0.003 -0.9
s is 'a b c' 0.125 0.119 0.006 -4.8
s == 'a b c' 0.268 0.291 -0.023 8.6
s > 'a b c' 0.275 0.273 0.002 -0.7
s is True 0.171 0.164 0.007 -4.1
s == True 0.317 0.315 0.002 -0.6
s > True 0.338 0.316 0.022 -6.5
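For reading the tables: the delta column appears to be before minus after, and %chg the change relative to the before time, so a negative %chg means the patched build was faster. The sketch below recomputes both columns for three rows copied from the Linux `s = 'abc'` table. The function and variable names are illustrative, and the formulas are inferred from the numbers, not taken from the script.

```python
# (operation, before, after) rows copied from the Linux "s = 'abc'" table.
rows = [
    ("s is 'abc'", 0.116, 0.103),
    ("s == 'abc'", 0.145, 0.141),
    ("s > 'abc'",  0.140, 0.142),
]

def derive(before, after):
    """Return (delta, %chg) using the formulas inferred from the tables."""
    delta = before - after
    pct = (after - before) / before * 100.0
    return round(delta, 3), round(pct, 1)

for op, before, after in rows:
    delta, pct = derive(before, after)
    print("%-12s %.3f %.3f %6.3f %6.1f" % (op, before, after, delta, pct))
```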
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2004-03-21 09:33
Message:
Logged In: YES
user_id=44345
I spent a fair amount of time yesterday refining and running a
shell script (attached) to compare the before and after times for
various comparisons of simple objects. Here's the output:
s = 'abc'
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.375 0.329 0.046 -12.3
s == 'abc' 0.491 0.493 -0.002 0.4
s > 'abc' 0.491 0.493 -0.002 0.4
s is 4 0.375 0.333 0.042 -11.2
s == 4 1.200 1.190 0.010 -0.8
s > 4 1.200 1.190 0.010 -0.8
s is -1001 0.378 0.332 0.046 -12.2
s == -1001 1.200 1.190 0.010 -0.8
s > -1001 1.200 1.180 0.020 -1.7
s is 34.7 0.370 0.325 0.045 -12.2
s == 34.7 1.620 1.590 0.030 -1.9
s > 34.7 1.600 1.590 0.010 -0.6
s is 'a b c' 0.369 0.328 0.041 -11.1
s == 'a b c' 0.475 0.476 -0.001 0.2
s > 'a b c' 0.559 0.563 -0.004 0.7
s is True 0.531 0.491 0.040 -7.5
s == True 1.400 1.390 0.010 -0.7
s > True 1.400 1.380 0.020 -1.4
s = 4
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.369 0.325 0.044 -11.9
s == 'abc' 1.200 1.190 0.010 -0.8
s > 'abc' 1.200 1.190 0.010 -0.8
s is 4 0.353 0.353 0.000 0.0
s == 4 0.352 0.355 -0.003 0.9
s > 4 0.354 0.350 0.004 -1.1
s is -1001 0.347 0.350 -0.003 0.9
s == -1001 0.350 0.353 -0.003 0.9
s > -1001 0.346 0.345 0.001 -0.3
s is 34.7 0.367 0.327 0.040 -10.9
s == 34.7 0.773 0.769 0.004 -0.5
s > 34.7 0.771 0.772 -0.001 0.1
s is 'a b c' 0.370 0.327 0.043 -11.6
s == 'a b c' 1.200 1.190 0.010 -0.8
s > 'a b c' 1.200 1.190 0.010 -0.8
s is True 0.534 0.492 0.042 -7.9
s == True 0.905 0.911 -0.006 0.7
s > True 0.904 0.913 -0.009 1.0
s = None
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.368 0.327 0.041 -11.1
s == 'abc' 0.962 0.950 0.012 -1.2
s > 'abc' 0.959 0.955 0.004 -0.4
s is 4 0.371 0.332 0.039 -10.5
s == 4 0.932 0.922 0.010 -1.1
s > 4 0.936 0.927 0.009 -1.0
s is -1001 0.370 0.330 0.040 -10.8
s == -1001 0.932 0.923 0.009 -1.0
s > -1001 0.935 0.925 0.010 -1.1
s is 34.7 0.368 0.325 0.043 -11.7
s == 34.7 1.110 1.110 0.000 0.0
s > 34.7 1.110 1.110 0.000 0.0
s is 'a b c' 0.370 0.325 0.045 -12.2
s == 'a b c' 0.963 0.948 0.015 -1.6
s > 'a b c' 0.961 0.949 0.012 -1.2
s is True 0.529 0.490 0.039 -7.4
s == True 1.110 1.110 0.000 0.0
s > True 1.120 1.110 0.010 -0.9
s = -1000
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.371 0.326 0.045 -12.1
s == 'abc' 1.200 1.190 0.010 -0.8
s > 'abc' 1.200 1.190 0.010 -0.8
s is 4 0.349 0.350 -0.001 0.3
s == 4 0.347 0.353 -0.006 1.7
s > 4 0.349 0.347 0.002 -0.6
s is -1001 0.348 0.352 -0.004 1.1
s == -1001 0.349 0.352 -0.003 0.9
s > -1001 0.346 0.348 -0.002 0.6
s is 34.7 0.366 0.326 0.040 -10.9
s == 34.7 0.769 0.771 -0.002 0.3
s > 34.7 0.766 0.777 -0.011 1.4
s is 'a b c' 0.367 0.328 0.039 -10.6
s == 'a b c' 1.210 1.190 0.020 -1.7
s > 'a b c' 1.200 1.190 0.010 -0.8
s is True 0.536 0.490 0.046 -8.6
s == True 0.887 0.887 0.000 0.0
s > True 0.890 0.892 -0.002 0.2
s = 34.2
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.369 0.327 0.042 -11.4
s == 'abc' 1.630 1.620 0.010 -0.6
s > 'abc' 1.640 1.620 0.020 -1.2
s is 4 0.372 0.332 0.040 -10.8
s == 4 0.791 0.795 -0.004 0.5
s > 4 0.797 0.798 -0.001 0.1
s is -1001 0.375 0.331 0.044 -11.7
s == -1001 0.792 0.792 0.000 0.0
s > -1001 0.790 0.791 -0.001 0.1
s is 34.7 0.367 0.482 -0.115 31.3
s == 34.7 1.080 0.536 0.544 -50.4
s > 34.7 0.560 0.621 -0.061 10.9
s is 'a b c' 0.387 0.337 0.050 -12.9
s == 'a b c' 1.760 1.710 0.050 -2.8
s > 'a b c' 1.710 1.680 0.030 -1.8
s is True 0.614 0.509 0.105 -17.1
s == True 1.050 1.020 0.030 -2.9
s > True 1.060 1.020 0.040 -3.8
s = 'a b c'
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.379 0.345 0.034 -9.0
s == 'abc' 0.542 0.494 0.048 -8.9
s > 'abc' 0.586 0.593 -0.007 1.2
s is 4 0.430 0.344 0.086 -20.0
s == 4 1.260 1.230 0.030 -2.4
s > 4 1.370 1.230 0.140 -10.2
s is -1001 0.431 0.372 0.059 -13.7
s == -1001 1.250 1.640 -0.390 31.2
s > -1001 1.240 1.260 -0.020 1.6
s is 34.7 0.383 0.337 0.046 -12.0
s == 34.7 1.770 1.680 0.090 -5.1
s > 34.7 1.670 1.660 0.010 -0.6
s is 'a b c' 0.423 0.376 0.047 -11.1
s == 'a b c' 0.506 0.510 -0.004 0.8
s > 'a b c' 0.517 0.564 -0.047 9.1
s is True 0.550 0.514 0.036 -6.5
s == True 1.470 1.640 -0.170 11.6
s > True 1.450 1.430 0.020 -1.4
s = object()
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.389 0.379 0.010 -2.6
s == 'abc' 1.220 1.370 -0.150 12.3
s > 'abc' 1.220 2.600 -1.380 113.1
s is 4 0.427 0.349 0.078 -18.3
s == 4 1.080 1.620 -0.540 50.0
s > 4 1.060 1.070 -0.010 0.9
s is -1001 0.437 0.343 0.094 -21.5
s == -1001 1.070 1.130 -0.060 5.6
s > -1001 1.060 1.090 -0.030 2.8
s is 34.7 0.419 0.338 0.081 -19.3
s == 34.7 1.710 1.520 0.190 -11.1
s > 34.7 1.520 1.540 -0.020 1.3
s is 'a b c' 0.380 0.347 0.033 -8.7
s == 'a b c' 2.020 1.210 0.810 -40.1
s > 'a b c' 1.260 1.210 0.050 -4.0
s is True 0.622 0.515 0.107 -17.2
s == True 1.220 1.220 0.000 0.0
s > True 1.210 1.210 0.000 0.0
s = []
operation before after delta %chg
--------- ------ ----- ----- ----
s is 'abc' 0.369 0.326 0.043 -11.7
s == 'abc' 1.220 1.200 0.020 -1.6
s > 'abc' 1.220 1.200 0.020 -1.6
s is 4 0.372 0.332 0.040 -10.8
s == 4 1.160 1.150 0.010 -0.9
s > 4 1.150 1.150 0.000 0.0
s is -1001 0.371 0.334 0.037 -10.0
s == -1001 1.150 1.140 0.010 -0.9
s > -1001 1.150 1.150 0.000 0.0
s is 34.7 0.368 0.326 0.042 -11.4
s == 34.7 1.500 1.480 0.020 -1.3
s > 34.7 1.490 1.490 0.000 0.0
s is 'a b c' 0.366 0.325 0.041 -11.2
s == 'a b c' 1.220 1.200 0.020 -1.6
s > 'a b c' 1.220 1.200 0.020 -1.6
s is True 0.531 0.484 0.047 -8.9
s == True 1.360 1.350 0.010 -0.7
s > True 1.350 1.350 0.000 0.0
I fully expected that the "is" tests would be faster and without
question the "==" and ">" tests would be slower. I was quite
surprised that this wasn't always the case. The above tests were
run on an 800MHz Powerbook G4 running Mac OSX 10.2.8. I don't
have immediate access to Intel hardware, though I'll try to run
these tests on cygwin this week.
I'd also be happy to be shown that my shell script isn't measuring
what I think it's measuring.
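The attached script is not reproduced here. As a rough stand-in, a timing run of this kind can be sketched with the standard `timeit` module. Everything below (the operand list, the repeat counts, the function name `time_comparison`) is an assumption for illustration, not the script that produced the tables. It compares against a name `t` instead of a literal, which avoids the warning newer Pythons give for `is` with a literal but adds a variable lookup to each timing.

```python
import timeit

# Hypothetical operand set, loosely modeled on the tables above.
SETUPS = ["s = 'abc'", "s = 4", "s = None", "s = 34.2", "s = []"]
OPERATORS = ["is", "==", ">"]

def time_comparison(setup, op, number=200000, repeat=5):
    """Best-of-repeat time in seconds for `s <op> t`, or None if unsupported."""
    timer = timeit.Timer("s %s t" % op, setup=setup + "; t = 'abc'")
    try:
        return min(timer.repeat(repeat=repeat, number=number))
    except TypeError:
        # Python 3 refuses to order unrelated types such as int and str.
        return None

for setup in SETUPS:
    print(setup)
    for op in OPERATORS:
        best = time_comparison(setup, op)
        shown = "n/a" if best is None else "%.3f" % best
        print("  s %-2s t  %s" % (op, shown))
```

Run it once under each interpreter build and compare the two outputs. Taking the minimum over several repeats reduces, but does not remove, noise from other activity on the machine.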
Skip
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-20 13:27
Message:
Logged In: YES
user_id=80475
Even "is" and "is not" are not helped by more than a couple
of cycles. This fragment essentially inlines part of the code
for cmp_outcome(). Only the function call is saved.
It does slow down other code paths by introducing an
unpredictable branch.
If the inlining were considered important, then the whole of
cmp_outcome() should be inlined. Then, all comparisons save
a single call/return pair. The cost is further increasing
the size of the eval loop.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-03-20 12:45
Message:
Logged In: YES
user_id=31435
Well, there's little question that this will speed "is" and "is
not", but it also slows all other cases by the cost of the
switch-and-branch to determine that they're not the favored
cases. So why should we believe that speeding "is" and "is
not" is more important than slowing other cases?
----------------------------------------------------------------------