<html><body><div><div><br></div><div>>>> Richard Saunders</div><div>>>> I have been doing some performance experiments with memcmp, and I was</div><div>>>> surprised that memcmp wasn't faster than it was in Python. I did a whole,</div><div>>>> long analysis and came up with some very simple results.</div><div>>></div><div>>> Antoine Pitrou, 20.10.2011 23:08:</div><div>>> Thanks for the analysis. Non-bugfix work now happens on Python 3, where</div><div>>> the str type is Python 2's unicode type. Your recommendations would</div><div>>> have to be revisited in that light.</div><div>></div><div>> Stefan Behnel &lt;stefan_ml@behnel.de&gt;:</div><div>> Well, Py3 is quite a bit different now that PEP 393 is in. It appears to use</div><div>> memcmp() or strcmp() a lot less than before, but I think unicode_compare()</div><div>> should actually receive an optimisation to use a fast memcmp() if both</div><div>> string kinds are equal, at least when their character unit size is less</div><div>> than 4 (i.e. especially for ASCII strings). Funnily enough, tailmatch() has</div><div>> such an optimisation.</div><div><br></div><div>I started looking at the most recent 3.x baseline: in a lot of places</div><div>(zlib, arraymodule, datetime, xmlparse) the memcmp analysis still appears</div><div>relevant: they all use memcmp in about the same way. But I agree that there</div><div>are some major differences in the unicode portion.</div><div><br></div><div>As long as the two strings are the same unicode "kind", you can use a</div><div>memcmp to compare. 
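</div><div><br></div><div>As a quick sanity check on the "same kind, so memcmp works" claim (a pure-Python illustration of the byte layout, not the C code itself): for 1-byte-kind strings the character data is exactly the Latin-1 bytes, so a byte-wise compare agrees with code-point order; for the wider kinds, memcmp *equality* is still safe, but byte-wise *ordering* only matches code-point order on big-endian layouts.</div><div><br></div>

```python
# Illustration only (not CPython internals): for the 1-byte PEP 393 kind,
# the character data is the Latin-1 encoding, so byte-wise comparison
# (what memcmp does) agrees with Python's code-point comparison.
pairs = [("the quick brown fox", "the wuick brown fox"),
         ("abc", "abd"),
         ("abc", "abcd")]
for a, b in pairs:
    assert (a == b) == (a.encode("latin-1") == b.encode("latin-1"))
    assert (a < b) == (a.encode("latin-1") < b.encode("latin-1"))

# For 2- and 4-byte kinds, memcmp equality still holds, but byte-wise
# ordering matches code-point order only on big-endian layouts:
# '\u00ff' < '\u0100' as code points, yet its little-endian UTF-16
# bytes (ff 00) sort after those of '\u0100' (00 01).
assert "\u00ff" < "\u0100"
assert "\u00ff".encode("utf-16-le") > "\u0100".encode("utf-16-le")
```

<div><br></div><div>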
In that case, I would almost argue that some memcmp</div><div>optimization is even more important: unicode strings are potentially 2</div><div>to 4 times larger, so the amount of time spent in memcmp may be greater</div><div>(i.e., I am still rooting for -fno-builtin-memcmp on the compile lines).</div><div><br></div><div>I went ahead and wrote a quick string_test3.py for comparing strings</div><div>(similar to what I did in Python 2.7):</div><div><br></div><div># Simple python string comparison test for Python 3.3</div><div>a = []; b = []; c = []; d = []</div><div>for x in range(0, 1000):</div><div>    a.append("the quick brown fox" + str(x))</div><div>    b.append("the wuick brown fox" + str(x))</div><div>    c.append("the quick brown fox" + str(x))</div><div>    d.append("the wuick brown fox" + str(x))</div><div>count = 0</div><div>for x in range(0, 200000):</div><div>    if a == c: count += 1</div><div>    if a == b: count += 2</div><div>    if a == d: count += 3</div><div>    if b == c: count += 5</div><div>    if b == d: count += 7</div><div>    if c == d: count += 11</div><div>print(count)</div><div><br></div><div>Timings on my FC14 machine (Intel Xeon W3520 @ 2.67GHz):</div><div><br></div><div>29.18 seconds: vanilla build of Python 3.3</div><div>29.17 seconds: Python 3.3 compiled with -fno-builtin-memcmp</div><div><br></div><div>No change. A little investigation shows unicode_compare() is where all</div><div>the work is. Here is the current main loop inside unicode_compare():</div><div><br></div><div>    for (i = 0; i < len1 && i < len2; ++i) {</div><div>        Py_UCS4 c1, c2;</div><div>        c1 = PyUnicode_READ(kind1, data1, i);</div><div>        c2 = PyUnicode_READ(kind2, data2, i);</div><div><br></div><div>        if (c1 != c2)</div><div>            return (c1 < c2) ? -1 : 1;</div><div>    }</div><div><br></div><div>    return (len1 < len2) ? -1 : (len1 != len2);</div><div><br></div><div>If both strings are the same unicode kind, we can add a memcmp</div><div>to unicode_compare() as an optimization:</div><div><br></div><div>    Py_ssize_t len = (len1 < len2) ? len1 : len2;</div><div><br></div><div>    /* use memcmp if both are the same kind */</div><div>    if (kind1 == kind2) {</div><div>        int result = memcmp(data1, data2, ((int)kind1) * len);</div><div>        if (result != 0)</div><div>            return result < 0 ? -1 : +1;</div><div>    }</div><div><br></div><div>Rerunning the test with this small change to unicode_compare():</div><div><br></div><div>17.84 seconds: compiled with -fno-builtin-memcmp</div><div>36.25 seconds: compiled with the standard memcmp</div><div><br></div><div>The standard memcmp is WORSE than the original unicode_compare</div><div>code, but if we compile with -fno-builtin-memcmp, we get that</div><div>wonderful 2x performance increase again.</div><div><br></div><div>I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ...</div><div>(after we put memcmp in unicode_compare)</div><div><br></div><div>    Gooday,</div><div><br></div><div>    Richie</div><div><br></div></div></body></html>