[Python-Dev] why we have both re.match and re.string?
Steven D'Aprano
steve at pearwood.info
Wed Feb 10 18:05:51 EST 2016
On Wed, Feb 10, 2016 at 10:59:18PM +0100, Luca Sangiacomo wrote:
> Hi,
> I hope the question is not too silly, but I would like to understand
> the advantages of having both re.match() and re.search(). Wouldn't it be
> clearer to have just one function with one additional parameter, like
> this:
>
> re.search(regexp, text, from_beginning=True|False) ?
I guess the most important reason now is backwards compatibility. The
oldest Python I have installed here is version 1.5, and it has the brand
new "re" module (intended as a replacement for the old "regex" module).
Both modules have search() and match() top-level functions. So my guess is
that to learn the original reasoning you would have to track down the
author of the original "regex" module.
But a more general answer is the principle, "Functions shouldn't take
constant bool arguments". It is an API design principle which (if I
remember correctly) Guido has stated a number of times. Functions should
not take a boolean argument which (1) exists only to select between two
different modes and (2) is nearly always given as a constant.
Do you ever find yourself writing code like this?
    if some_calculation():
        result = re.match(regex, string)
    else:
        result = re.search(regex, string)
If you do, that would be a hint that perhaps match() and search() should
be combined so you can write:
    result = re.search(regex, string, some_calculation())
But I expect that you almost never do. I would expect that if we
combined the two functions into one, we would nearly always call it
with a constant bool:
    # I always forget whether True means match from the start or not,
    # and which is the default...
    result = re.search(regex, string, False)
which suggests that search() is actually two different functions, and
should be split into two, just as we have now.
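To make the trade-off concrete, here is a minimal sketch of what a merged
function might look like. The name combined_search and its from_beginning
parameter are hypothetical, invented only for illustration; nothing like
this exists in the re module.

```python
import re

def combined_search(pattern, string, from_beginning=False):
    """Hypothetical merged API: the flag only selects between two modes."""
    if from_beginning:
        return re.match(pattern, string)
    return re.search(pattern, string)

# Call sites end up passing a literal, which tells the reader very little.
a = combined_search(r"\d+", "abc 123", False)   # scan the whole string
b = combined_search(r"\d+", "abc 123", True)    # only try position 0

# The two-function spelling names the mode directly.
c = re.search(r"\d+", "abc 123")
d = re.match(r"\d+", "abc 123")
```

The body of the hypothetical function is just a dispatch on the flag, which
is the sign that it is really two functions sharing one name.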
It's a general principle, not a law of nature, so you may find
exceptions in the standard library. But if I were designing the re
module from scratch, I would either keep the two distinct functions, or
just provide search() and let users use ^ to anchor the search to the
beginning.
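A small sketch of the second option, assuming the default flags: with
re.MULTILINE, ^ also matches after each newline, so in that mode \A would
be the closer equivalent to match(). The sample string and variable names
are made up for the example.

```python
import re

text = "spam and eggs"

# match() only succeeds at the very start of the string.
at_start = re.match(r"spam", text)
not_at_start = re.match(r"eggs", text)

# search() scans forward and can succeed anywhere.
found_anywhere = re.search(r"eggs", text)

# Anchoring the pattern with ^ makes search() behave like match() here.
anchored_hit = re.search(r"^spam", text)
anchored_miss = re.search(r"^eggs", text)
```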
> In this way we prevent, as written in the documentation, people writing
> ".*" in front of the regexp used with re.match()
I only see one example that does that:
https://docs.python.org/3/library/re.html#checking-for-a-pair
Perhaps it should be changed.
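As a rough illustration of what such a change could look like, the sketch
below compares a ".*"-prefixed pattern used with match() against the same
pattern without the prefix used with search(). The pattern is written in
the style of the linked example, and the sample strings are assumptions.
The two forms agree on these inputs, but they may pick different pairs on
other strings, so this is not a claim that they are interchangeable.

```python
import re

hand = "717ak"

# match() needs a leading ".*" to let the pair start anywhere.
prefixed = re.match(r".*(.).*\1", hand)

# search() scans by itself, so the prefix is not needed.
searched = re.search(r"(.).*\1", hand)

# A string with no repeated character fails both ways.
no_pair_match = re.match(r".*(.).*\1", "718ak")
no_pair_search = re.search(r"(.).*\1", "718ak")
```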
--
Steve