"in" operator for strings

Fri Feb 2 08:03:41 EST 2001

"Alex Martelli" <aleaxit at yahoo.com> wrote in message
news:95birk0r0d at news1.newsguy.com...
> "Magnus Lie Hetland" <mlh at idi.ntnu.no> wrote in message
> news:95beco$e0u$1 at tyfon.itea.ntnu.no...
[...]
> Right, and an extension of this is basically what's being
> asked for (though the original poster may not have thought
> of this 'obvious' generalization, specialcasing string would
> surely not be warranted).  Unfortunately, for general cases
> it doesn't scale well -- i.e., now:

Well... Then "in" would test for both membership and
subsequenceness... Quite a strong case of ambiguity if you
ask me. In my opinion a really *bad* idea. (And would it
only allow contiguous subsequences, or any subsequence?
If we ever get built-in sets, contiguity would be meaningless,
as it would be if we get "in" tests for dictionaries.)

[...]
> and having it return 1 in the second case too would be making
> this 'in' very ambiguous and confusing, alas.

Right.

> Also, of course, this would throw any parallel between
> "x in y" and "for x in y" out of the windows unless the
> latter starts looping on all *subsequences* -- eeep!-)

Actually, that sounds very interesting ;-)

How about having a power-sequence-function (like the standard
power-set function in mathematics)? Then pow(seq) or seq.pow()
would return a (perhaps lazy) sequence containing all
subsequences... And one could have seq.pow(contiguous=1) if
one only wanted contiguous subsequences... Then one would
have:

  >>> "Waldo" in "Ralph Waldo Emerson".pow(contiguous=1)
  1

An ingenious implementation of the laziness would be needed
I guess... Or maybe not. A simple string-matching algorithm
would be needed for the contiguous case, and an O(n*m)
exhaustive search for the non-contiguous case. (Or perhaps
something better would be possible by putting the elements
in a hash table... O(n+m)? Might have quite some overhead,
perhaps... Oh, well)
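
To make the idea concrete, here is a minimal sketch of such a
lazy power-sequence. The class name PowerSequence and its
helper methods are made up for illustration; nothing like
seq.pow() exists in Python. No subsequence is ever built: the
"in" test is answered by searching the original sequence. The
contiguous case uses a naive window comparison, and for the
non-contiguous case a single left-to-right scan appears to be
enough, so the exhaustive search may not be needed.

```python
class PowerSequence:
    """Lazy stand-in for 'all subsequences of seq' (hypothetical name)."""

    def __init__(self, seq, contiguous=False):
        self.seq = seq
        self.contiguous = contiguous

    def __contains__(self, candidate):
        # Called for "candidate in PowerSequence(...)".
        if self.contiguous:
            return self._has_contiguous(candidate)
        return self._has_scattered(candidate)

    def _has_contiguous(self, candidate):
        n, m = len(self.seq), len(candidate)
        # Naive window comparison: O(n*m) in the worst case.
        # Slices must compare equal to candidate, so both should
        # be of the same type (str with str, list with list).
        for start in range(n - m + 1):
            if self.seq[start:start + m] == candidate:
                return True
        return False

    def _has_scattered(self, candidate):
        # Greedy scan: walk through seq once, matching the items
        # of candidate in order and skipping everything else.
        remaining = iter(self.seq)
        for wanted in candidate:
            for item in remaining:
                if item == wanted:
                    break
            else:
                # seq ran out before 'wanted' was found.
                return False
        return True
```

With this, "Waldo" in PowerSequence("Ralph Waldo Emerson",
contiguous=True) is true, and so is "RWE" in
PowerSequence("Ralph Waldo Emerson"), since R, W and E occur
in that order.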

>
> > But you can't do what you ask for, just like you can't write
> >
> >   [1, 2] in [1, 2, 3, 4]
>
> Sure you can, it's a well-formed test and returns 0 since
> [1,2] is not an item in the right-hand operand sequence.

Right. That's essentially what I meant. You can't write the
above to find out whether [1, 2] is a subsequence of
[1, 2, 3, 4].
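
A small sketch of the difference, in case it helps. The helper
name is_contiguous_sublist is invented for this example; it is
not a built-in. "in" compares the left operand with each item
of the list, so [1, 2] is only found when it is itself an item.

```python
outer = [1, 2, 3, 4]
nested = [[1, 2], 3, 4]

# Membership: [1, 2] is compared with 1, 2, 3 and 4 in turn.
as_item_of_outer = [1, 2] in outer

# Here [1, 2] equals the first item, so the test succeeds.
as_item_of_nested = [1, 2] in nested


def is_contiguous_sublist(needle, haystack):
    """Hypothetical helper: does needle occur as a slice of haystack?"""
    m = len(needle)
    return any(haystack[i:i + m] == needle
               for i in range(len(haystack) - m + 1))
```

The first test is false, the second is true, and
is_contiguous_sublist([1, 2], outer) gives the answer the
original question was after.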

> > Probably a better idea.
>
> Only if you're looking for words, not for any substring,
> which is at least as frequent.

You are right. But it's a better idea than doing something
that doesn't work, at least <0.8 wink>
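
To spell out the word-versus-substring distinction, here is a
rough sketch. The function names has_word and has_substring
are made up for the example, and it uses the string methods
split() and find() on a variable rather than the string
module.

```python
text = "Ralph Waldo Emerson"


def has_word(word, text):
    # Whole-word test: membership in the list of
    # whitespace-separated words.
    return word in text.split()


def has_substring(fragment, text):
    # Substring test: find() returns -1 when the fragment
    # does not occur anywhere in text.
    return text.find(fragment) != -1
```

has_word("Waldo", text) and has_substring("ald", text) are
both true, but has_word("ald", text) is false, which is
exactly the limitation of the split approach.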

> >    "Waldo" in split("Ralph Waldo Emerson")
> >
> > It might be old-fashioned, but... So what :-)
>
> So it doesn't work unless you "from string import *" (horrid
> idea), "from string import split" (doubtful), or rewrite it
> using an explicit string.split (probably best,

Why? This would in my opinion clearly depend on the size of
the script... If you don't expect another split function to
appear, I think it's quite OK to use

  from string import split

I mean -- I'm not against string methods. They're nice. I
just get a bit woozy seeing them called on literals. It
just gives me flashes of weird stuff like

  1.plus(2)

And... In my mind methods seem to be for modifying an
object's state first of all. Oh, well. Here we go -- another
pointless discussion. Sorry :)

> For general substring-matching, a class wrapper is not
> too bad:

[...]
> this only works for strings, as written, AND only to
> enable such idioms as
>
>     if 'ald' in subsOf("Waldo"):
>         print 'yep!'

I should have read the entire posting before replying, I
see. <wink>

This is the same as my power-sequence idea I guess.

> Alex

--

  Magnus Lie Hetland      (magnus at hetland dot org)

 "Reality is what refuses to disappear when you stop
  believing in it"                 -- Philip K. Dick