"in" operator for strings

Alex Martelli aleaxit at yahoo.com
Mon Feb 5 08:28:56 EST 2001


"Magnus Lie Hetland" <mlh at idi.ntnu.no> wrote in message
news:95jqlu$lcu$1 at tyfon.itea.ntnu.no...
    [snip]
> the issue. It would seem to me that you agree on that
> point? (So why object? <wink>)

Because we ARE doing *object*-oriented design & coding
here...?

When you're dealing with subsets, there are no big
problems, because the fact that what you're modeling
is order-insensitive means it's OK to inject whatever
order you like behind the cover.  Call it a canonical
order, if you like.  The key issue is that it gives
you the isomorphism between "subsets of an N-elements
alphabet" and "binary numbers in base N", which I
used for the non-necessarily-contiguous-subsequences
case.  Weed out duplicates (at insertion time or on
the fly when needed), sort (ditto), and enjoy the
utter simplicity we already saw for n-n-c subsequences.
("The secret of life is finding the right isomorphisms",
to paraphrase the late C. Schulz -- his original quote
was actually "is getting the right side-effects", but
they aren't that far if you squint at them just right).

Sequences WITH contiguousness constraints (and, probably
even more complicated case, with *relaxed* contiguousness
constraints, as you hinted at -- "elements following
each other at linear distance no more than four", etc)
is a far more complicated case -- as are multisets in
the non-order case.  Heuristics can be found (e.g., if
in most cases your multisets have a multiplicity no more
than 27 in each given element, you can use base-29
numbers as the primary representation of any multiset,
with 28 standing in for "28 or more [rare case]", and
get MOST solutions almost as fast as for sets represented
as binary numbers), but the general-case is another thing
(not only is it hard to look for closed-formula enumerators
of such things, but proving "no closed formula exists" is
a real bear too).


Alex






More information about the Python-list mailing list