[Python-Dev] string.find() again (was Re: timsort for jython)

Samuele Pedroni pedroni@inf.ethz.ch
Tue, 6 Aug 2002 14:54:50 +0200


Thanks for the detailed argument.

[GvR]
> In code that applies to all (or even just some) kinds of sequences,
> the 'in' operator will continue to stand for membership.  This won't
> cause a problem with strings: correct code using 'in' for membership
> will never use seq1 in seq2, it will use item in seq, where the type
> of item is "whatever the type of seq[0] is, if it exists."  When the
> seq is a string, item will be a one-char string -- not a "type" in
> Python's type system, but certainly a useful concept.
> 
> But there's also lots of code that deals only with strings.  This will
> normally be completely clear to the casual reader: either because
> string literals are used, compared, etc., or because values are
> obtained from functions known to return strings (such as
> file.readline()), or because methods unique to strings (e.g. s.lower())
> are used, and so on.  Strings are very important in lots of programs,
> and we want our notations for string operations to be readable and
> expressive.  (Regular expressions are extreme in expressiveness, but
> lack readability, which is why they're relegated to an imported module
> in Python.)  Substring containment testing is a common operation on
> strings, so being able to write it as 's1 in s2' rather than
> 's2.find(s1) >= 0' is a big win, IMO.
> 
> 
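
For readers skimming the thread, the two spellings compared above can be put side by side. This is only a minimal sketch: the sample strings and variable names are made up for illustration, and it assumes an interpreter in which 'in' on strings already performs the substring test being proposed.

```python
s2 = "timsort for jython"   # illustrative haystack, not from the thread
s1 = "sort"                 # illustrative needle

# Substring test spelled with find(): find() returns -1 when s1 is absent.
found_with_find = s2.find(s1) >= 0

# The same test spelled with 'in' (the proposed substring semantics).
found_with_in = s1 in s2

# Generic sequence code keeps using 'item in seq'; for a string the item
# is a one-char string, so both readings of 'in' agree here.
one_char_member = "j" in s2

# One reason the find() spelling is easy to get wrong: a match at index 0
# is falsy, so testing the result for truth misses a leading substring.
prefix = "tim"
truthiness_pitfall = bool(s2.find(prefix))
```

The last line is there only to show why the explicit '>= 0' comparison is needed with find(); the 'in' spelling has no such trap.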

My only remark is that this opens the temptation for someone
to subclass, say, UserList and define "in" as a subsequence test
because it is convenient for the application, for some
value of convenient, and then write "seq1 in seq2".
One can generalize by saying that this is OK for sequences
that are not full-fledged containers and, in particular,
do not accept (per contract) subsequences as elements.
That the explanation has to be this subtle shows that this is
indeed a subtle point.
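
To make the temptation concrete, here is a rough sketch of what such a subclass might look like. The class name SubseqList is hypothetical, the sketch assumes "subseq" means a contiguous run of elements, and it imports UserList from collections as in current Python (at the time of this thread it lived in its own UserList module).

```python
from collections import UserList


class SubseqList(UserList):
    """Hypothetical list whose 'in' tests for a contiguous subsequence."""

    def __contains__(self, other):
        # Treat the left operand as a sequence, not as a single element.
        sub = list(other)
        n = len(sub)
        return any(self.data[i:i + n] == sub
                   for i in range(len(self.data) - n + 1))


seq = SubseqList([1, 2, 3, 4])
contiguous = [2, 3] in seq        # the run 2, 3 appears in seq
reordered = [3, 2] in seq         # same items, wrong order

# A list can hold lists as elements, so the two meanings of 'in' collide:
nested_plain = [[2, 3], 5]
nested_subseq = SubseqList([[2, 3], 5])
as_element = [2, 3] in nested_plain      # ordinary membership
as_subseq = [2, 3] in nested_subseq      # subsequence reading
```

With these definitions the last two tests disagree about the same data, and generic code that asks "2 in seq" gets a TypeError from list(2). Strings avoid the first problem because a string cannot contain another string as an element distinct from a substring.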

Thanks again.

PS: is pure substring testing really such a common idiom?
I did not find many
matches for   find\(.*\)\s*>  in the std lib,
but maybe the regular expression is not general enough, or
the std lib is not typical in this respect, or I made some
operator error.
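
A small sketch of how the pattern behaves may help judge whether it is general enough. The sample lines below are invented for illustration and are not taken from the std lib; the helper name count_matches is likewise hypothetical.

```python
import re

# The pattern quoted in the PS, unchanged.
PATTERN = re.compile(r"find\(.*\)\s*>")


def count_matches(lines):
    """Count the lines on which the pattern matches at least once."""
    return sum(1 for line in lines if PATTERN.search(line))


# Invented examples of ways a substring test might be spelled.
samples = [
    "if s.find(sub) >= 0:",          # matched
    "if s.find(sub) > -1:",          # matched
    "if string.find(s, sub) >= 0:",  # matched (string-module style)
    "if s.find(sub) != -1:",         # missed: no '>' after the call
    "if s.find(sub) < 0:",           # missed: negated test
    "if 0 <= s.find(sub):",          # missed: comparison written first
]

matched = count_matches(samples)
missed = len(samples) - matched
```

On these samples the pattern catches only the spellings that put '>' right after the call, so a low match count could partly reflect the pattern rather than the idiom's frequency. It says nothing about how the real std lib is distributed across these spellings.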