<html><body><div><div><br></div><div>>>> Richard Saunders</div><div>>>> I have been doing some performance experiments with memcmp, and I was</div><div>>>> surprised that memcmp wasn't faster than it was in Python. I did a whole,</div><div>>>> long analysis and came up with some very simple results.</div><div>>></div><div>>> Antoine Pitrou, 20.10.2011 23:08:</div><div>>> Thanks for the analysis. Non-bugfix work now happens on Python 3, where</div><div>>> the str type is Python 2's unicode type. Your recommendations would</div><div>>> have to be revisited in that light.</div><div>></div><div>> Stefan Behnel &lt;stefan_ml@behnel.de&gt;:</div><div>> Well, Py3 is quite a bit different now that PEP 393 is in. It appears to use</div><div>> memcmp() or strcmp() a lot less than before, but I think unicode_compare()</div><div>> should actually receive an optimisation to use a fast memcmp() if both</div><div>> string kinds are equal, at least when their character unit size is less</div><div>> than 4 (i.e. especially for ASCII strings). Funnily enough, tailmatch() has</div><div>> such an optimisation.</div><div><br></div><div>I started looking at the most recent 3.x baseline: in a lot of places</div><div>(zlib, arraymodule, datetime, xmlparse) the memcmp analysis still appears</div><div>relevant: they all use memcmp in about the same way. But I agree that there</div><div>are some major differences in the unicode portion.</div><div><br></div><div>As long as the two strings are the same unicode "kind", you can use a</div><div>memcmp to compare. 
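</div><div><br></div><div>As a quick sanity check on the "same kind, so memcmp works" claim (a pure-Python illustration of the byte layout, not the C code itself): for 1-byte-kind strings the character data is exactly the Latin-1 bytes, so a byte-wise compare agrees with code-point order; for the wider kinds, memcmp *equality* is still safe, but byte-wise *ordering* only matches code-point order on big-endian layouts.</div><div><br></div>

```python
# Illustration only (not CPython internals): for the 1-byte PEP 393 kind,
# the character data is the Latin-1 encoding, so byte-wise comparison
# (what memcmp does) agrees with Python's code-point comparison.
pairs = [("the quick brown fox", "the wuick brown fox"),
         ("abc", "abd"),
         ("abc", "abcd")]
for a, b in pairs:
    assert (a == b) == (a.encode("latin-1") == b.encode("latin-1"))
    assert (a < b) == (a.encode("latin-1") < b.encode("latin-1"))

# For 2- and 4-byte kinds, memcmp equality still holds, but byte-wise
# ordering matches code-point order only on big-endian layouts:
# '\u00ff' < '\u0100' as code points, yet its little-endian UTF-16
# bytes (ff 00) sort after those of '\u0100' (00 01).
assert "\u00ff" < "\u0100"
assert "\u00ff".encode("utf-16-le") > "\u0100".encode("utf-16-le")
```

<div><br></div><div>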
In that case, I would almost argue that some memcmp</div><div>optimization is even more important: unicode strings are potentially 2</div><div>to 4 times larger, so the amount of time spent in memcmp may be greater</div><div>(i.e., I am still rooting for -fno-builtin-memcmp on the compile lines).</div><div><br></div><div>I went ahead and wrote a quick string_test3.py for comparing strings</div><div>(similar to what I did in Python 2.7):</div><div><br></div><div># Simple python string comparison test for Python 3.3</div><div>a = []; b = []; c = []; d = []</div><div>for x in range(0, 1000):</div><div>    a.append("the quick brown fox" + str(x))</div><div>    b.append("the wuick brown fox" + str(x))</div><div>    c.append("the quick brown fox" + str(x))</div><div>    d.append("the wuick brown fox" + str(x))</div><div>count = 0</div><div>for x in range(0, 200000):</div><div>    if a == c: count += 1</div><div>    if a == b: count += 2</div><div>    if a == d: count += 3</div><div>    if b == c: count += 5</div><div>    if b == d: count += 7</div><div>    if c == d: count += 11</div><div>print(count)</div><div><br></div><div>Timings on my FC14 machine (Intel Xeon W3520 @ 2.67GHz):</div><div><br></div><div>29.18 seconds: vanilla build of Python 3.3</div><div>29.17 seconds: Python 3.3 compiled with -fno-builtin-memcmp</div><div><br></div><div>No change. A little investigation shows unicode_compare() is where all</div><div>the work is. Here is the current main loop inside unicode_compare():</div><div><br></div><div>    for (i = 0; i < len1 && i < len2; ++i) {</div><div>        Py_UCS4 c1, c2;</div><div>        c1 = PyUnicode_READ(kind1, data1, i);</div><div>        c2 = PyUnicode_READ(kind2, data2, i);</div><div><br></div><div>        if (c1 != c2)</div><div>            return (c1 < c2) ? -1 : 1;</div><div>    }</div><div><br></div><div>    return (len1 < len2) ? -1 : (len1 != len2);</div><div><br></div><div>If both strings are the same unicode kind, we can add a memcmp</div><div>to unicode_compare() as an optimization:</div><div><br></div><div>    Py_ssize_t len = (len1 < len2) ? len1 : len2;</div><div><br></div><div>    /* use memcmp if both are the same kind */</div><div>    if (kind1 == kind2) {</div><div>        int result = memcmp(data1, data2, ((int)kind1) * len);</div><div>        if (result != 0)</div><div>            return result < 0 ? -1 : +1;</div><div>    }</div><div><br></div><div>Rerunning the test with this small change to unicode_compare():</div><div><br></div><div>17.84 seconds: compiled with -fno-builtin-memcmp</div><div>36.25 seconds: compiled with the standard memcmp</div><div><br></div><div>The standard memcmp is WORSE than the original unicode_compare</div><div>code, but if we compile with -fno-builtin-memcmp, we get that</div><div>wonderful 2x performance increase again.</div><div><br></div><div>I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ...</div><div>(after we put memcmp in unicode_compare)</div><div><br></div><div>    Gooday,</div><div><br></div><div>    Richie</div><div><br></div></div></body></html>