[issue21449] Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual

Josh Rosenberg report at bugs.python.org
Tue May 6 23:35:24 CEST 2014


New submission from Josh Rosenberg:

_PyUnicode_CompareWithId is used exclusively for equality comparisons (after all, identifiers aren't really sortable in a meaningful way; they're isolated values, not a continuum). But because _PyUnicode_CompareWithId maintains the general comparison behavior, not just ==/!=, it serves little purpose; while it checks the return of _PyUnicode_FromId, none of its callers check for failure anyway, so every use could just as well have been:

PyUnicode_Compare(left, _PyUnicode_FromId(right));

I've attached a patch that replaces _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual, which:

1. Only checks equality vs. inequality
2. Can optimize for the case where left is an interned string by performing direct pointer comparison
3. Even when left is not interned, it can use the optimized unicode_compare_eq worker function instead of the slower generalized unicode_compare function
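
The three points above can be sketched with a toy model. This is not the attached patch and not CPython code: toy_str, its interned flag, and toy_str_equal_id are hypothetical names, and the sketch assumes a single intern table, so two distinct interned objects can never hold equal contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string object; NOT CPython's PyUnicodeObject. */
typedef struct {
    const char *data;   /* character data, not necessarily NUL-terminated */
    size_t length;      /* number of bytes in data */
    bool interned;      /* assumed: true only for the unique canonical copy */
} toy_str;

/* Equality-only comparison against an identifier.
 * Returns 1 if the contents are equal, 0 otherwise; no ordering is computed. */
static int toy_str_equal_id(const toy_str *left, const toy_str *id)
{
    assert(left != NULL && id != NULL);
    assert(id->interned);           /* identifiers are assumed to be interned */

    if (left == id)
        return 1;                   /* same object: equal without touching memory */
    if (left->interned)
        return 0;                   /* two distinct interned strings cannot be equal */
    if (left->length != id->length)
        return 0;                   /* cheap rejection before comparing bytes */
    return memcmp(left->data, id->data, left->length) == 0;
}
```

An ordering comparison would have to find the first differing character and report its sign. The equality-only version can bail out on a pointer or length mismatch before reading any character data.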

I've replaced all the uses of the old function I could find, and all unit tests pass. I don't expect any meaningful speedups from this change alone (the most commonly traversed code that would benefit appears to be the code that creates new classes and the code that creates reprs for objects); the goal here is not immediate speedups, but enabling future ones.

I am looking into writing a PyDict_GetItem fastpath for looking up identifiers. It would remove the need to perform memory comparisons when the dictionary, as in keyword argument passing, is usually composed of interned keys. This could be combined with an identifier-based version of PyArg_ParseTupleAndKeywords. With ArgumentClinic, it might become practical to swap in a new argument parser without having to manually change thousands of lines of code, and one of the simplest ways to improve speed would be to remove the overhead of constantly constructing, hashing, and comparing the same keyword strings every time a C function is called.
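
As a rough illustration of why interned keys help such a lookup, here is a toy sketch. It uses a flat array rather than a real hash table, and kw_entry and kw_lookup are hypothetical names, not anything from CPython or the patch. The assumption is that the table keys are canonical (interned) pointers, so a caller passing the same pointer is matched by identity alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword table entry; key is assumed to be a canonical
 * (interned) pointer to a NUL-terminated string. */
typedef struct {
    const char *key;
    int value;
} kw_entry;

/* Look up name in a small table. Returns the matching entry or NULL. */
static const kw_entry *kw_lookup(const kw_entry *table, size_t n,
                                 const char *name)
{
    assert(name != NULL);
    assert(table != NULL || n == 0);

    /* Fast path: pointer identity only, no character data is read. */
    for (size_t i = 0; i < n; i++) {
        if (table[i].key == name)
            return &table[i];
    }
    /* Slow path: the caller passed a non-canonical copy, compare contents. */
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].key, name) == 0)
            return &table[i];
    }
    return NULL;
}
```

When nearly every caller passes the canonical pointer, the slow path is rarely reached, which is the situation described above for keyword argument passing.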

Adding haypo as nosy since he created the original function in #19512.

----------
files: comparewithidequals.patch
keywords: patch
messages: 218022
nosy: haypo, josh.rosenberg
priority: normal
severity: normal
status: open
title: Replace _PyUnicode_CompareWithId with _PyUnicode_CompareWithIdEqual
Added file: http://bugs.python.org/file35163/comparewithidequals.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21449>
_______________________________________

