Case-insensitive string equality
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Thu Aug 31 03:10:10 EDT 2017
Three times in the last week the devs where I work accidentally
introduced bugs into our code because of a mistake with case-insensitive
string comparisons. They managed to demonstrate three different failures:
    # 1
    a = something().upper()  # normalise string
    # ... much later on
    if a == b.lower(): ...

    # 2
    a = something().upper()
    # ... much later on
    if a == 'maildir': ...

    # 3
    a = something()  # unnormalised
    assert 'foo' in a
    # ... much later on
    pos = a.find('FOO')
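The three failures are easy to reproduce at the interactive prompt. Here is a minimal sketch, with made-up literal strings standing in for whatever something() and b really held (those values are assumptions for illustration only):

```python
# Hypothetical stand-ins for something() and b; not from the real code base.
a = "Maildir".upper()            # 'MAILDIR'
b = "MailDir"

failure1 = (a == b.lower())      # upper-cased value vs lower-cased value
failure2 = (a == "maildir")      # upper-cased value vs lowercase literal

raw = "xx foo xx"                # unnormalised string
assert "foo" in raw
pos = raw.find("FOO")            # searching with the wrong case

# Folding both sides at the point of comparison sidesteps the mismatch.
fixed = (a.casefold() == b.casefold())

print(failure1, failure2, pos, fixed)
```

In each failing case the two sides were normalised differently, or one side was not normalised at all, and nothing at the point of comparison makes that visible.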
Not every two line function needs to be in the standard library, but I've
come to the conclusion that case-insensitive testing and searches should
be. I've made these mistakes myself at times, as I'm sure most people
have, and I'm tired of writing my own case-insensitive function over and
over again.
So I'd like to propose some additions to 3.7 or 3.8. If the feedback here
is positive, I'll take it to Python-Ideas for the negative feedback :-)
(1) Add a new string method, which performs a case-insensitive equality
test. Here is a potential implementation, written in pure Python:
    def equal(self, other):
        if self is other:
            return True
        if not isinstance(other, str):
            raise TypeError
        if len(self) != len(other):
            return False
        casefold = str.casefold
        for a, b in zip(self, other):
            if casefold(a) != casefold(b):
                return False
        return True
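One caveat worth checking with an implementation along these lines: str.casefold can change the length of a string (for example, 'ß' folds to 'ss'), so a length check followed by a character-by-character comparison can disagree with comparing the folded strings whole. A small illustration, where equal_charwise is the function above under a hypothetical name and equal_folded is an alternative spelling; neither is an existing str method:

```python
def equal_charwise(self, other):
    # Same logic as the proposed pure-Python implementation.
    if self is other:
        return True
    if not isinstance(other, str):
        raise TypeError
    if len(self) != len(other):
        return False
    casefold = str.casefold
    for a, b in zip(self, other):
        if casefold(a) != casefold(b):
            return False
    return True


def equal_folded(s, t):
    # Fold each string as a whole, then compare.
    return s.casefold() == t.casefold()


simple = equal_charwise("Hello", "hELLO")
charwise = equal_charwise("straße", "STRASSE")   # lengths 6 and 7 differ
folded = equal_folded("straße", "STRASSE")       # both fold to 'strasse'
print(simple, charwise, folded)
```

Whether 'straße' and 'STRASSE' ought to compare equal is a design decision for the proposal; the sketch only shows that the two approaches give different answers.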
Alternatively: how about a === triple-equals operator to do the same
thing?
(2) Add keyword-only arguments to str.find and str.index:
    casefold=False
which does nothing if false (the default), and switches to a case-
insensitive search if true.
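To make the intended behaviour concrete, here is a rough pure-Python sketch of a case-insensitive find. The name find_casefold is hypothetical, and str.find has no casefold keyword today. The sketch folds the text one character at a time and remembers which original index each folded character came from, so that the result refers to the original string even when folding changes its length:

```python
def find_casefold(text, sub):
    """Hypothetical helper: case-insensitive version of text.find(sub)."""
    folded_chars = []
    origin = []  # origin[i] is the index in text that produced folded char i
    for index, char in enumerate(text):
        for folded_char in char.casefold():
            folded_chars.append(folded_char)
            origin.append(index)
    pos = "".join(folded_chars).find(sub.casefold())
    if pos == -1:
        return -1
    if not origin:
        # Empty text and empty sub: mirror ''.find('') == 0.
        return 0
    return origin[pos]


print(find_casefold("Hello FOO bar", "foo"))
print(find_casefold("Straße", "SSE"))
print(find_casefold("abc", "x"))
```

A built-in version would presumably do this in C without building the intermediate lists; the point here is only the semantics.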
Alternatives:
(i) Do nothing. The status quo wins a stalemate.
(ii) Instead of str.find or index, use a regular expression.
This is less discoverable (you need to know regular expressions) and
harder to get right than to just call a string method. Also, I expect
that invoking the re engine just for case insensitivity will be a lot
more expensive than a simple search need be.
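For comparison, the regular expression route looks something like the following sketch. The helper name re_find is made up for illustration; re.search, re.escape and re.IGNORECASE are the real standard library pieces. Forgetting re.escape is one of the easy ways to get this wrong:

```python
import re


def re_find(text, sub):
    """Return the index of the first case-insensitive match, or -1."""
    # re.escape stops characters such as '.' being treated as metacharacters.
    match = re.search(re.escape(sub), text, flags=re.IGNORECASE)
    return match.start() if match else -1


print(re_find("xx foo xx", "FOO"))
print(re_find("axb", "A.B"))
```

As far as I know, re.IGNORECASE does simple per-character case matching rather than full case folding, so it is not an exact substitute for a casefold-based search.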
(iii) Not every two line function needs to be in the standard library.
Just add this to the top of every module:
    def equal(s, t):
        return s.casefold() == t.casefold()
That's "the status quo wins" again. It's an annoyance. A small annoyance,
but multiplied by the sheer number of times it happens, it becomes a
large annoyance. I believe the annoyance factor of case-insensitive
comparisons outweighs the "two line function" objection.
And the two-line "equal" function doesn't solve the problem for find and
index, or for sets, dicts, list.index and the `in` operator either.
Unsolved problems:
This proposal doesn't help with sets and dicts, list.index or the `in`
operator.
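A short sketch of what that looks like in practice, using made-up example data. Containers compare their members exactly, so the folding has to be done by hand, both when storing and when looking up:

```python
# Hypothetical example data.
names = {"Maildir", "MBOX"}

plain_lookup = "maildir" in names                # exact comparison only

# Manual workaround: fold on the way in and on the way out.
folded_names = {name.casefold() for name in names}
folded_lookup = "maildir".casefold() in folded_names

items = ["Foo", "Bar"]
try:
    where = items.index("bar")                   # list.index is exact too
except ValueError:
    where = None

print(plain_lookup, folded_lookup, where)
```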
Thoughts?
--
Steven D'Aprano
“You are deluded if you think software engineers who can't write
operating systems or applications without security holes, can write
virtualization layers without security holes.” —Theo de Raadt