[Python-Dev] string.join and bad sequences

Mon, 10 Jul 2000 13:24:27 -0400 (EDT)

I added a test case to Lib/test/string_tests.py that uses a sequence
that returns the wrong answer from __len__.  I've used this test in a
number of places to make sure the interpreter doesn't dump core when
it hits a bad user-defined sequence.

class Sequence:
    def __init__(self): self.seq = 'wxyz'
    def __len__(self): return len(self.seq)
    def __getitem__(self, i): return self.seq[i]

class BadSeq2(Sequence):
    def __init__(self): self.seq = ['a', 'b', 'c']
    def __len__(self): return 8    # wrong: only 3 items actually exist

The tests of string.join and " ".join don't dump core, but they do
raise an IndexError.  I wonder if that's the right thing to do,
because in the other places where this case is handled, no exception
is raised.

The question boils down to the semantics of the sequence protocol.

The string code's definition is:
    if __len__ returns X, then the length is X
    thus, __getitem__ should succeed for range(0, X)
          if it doesn't, raise an IndexError
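To make the difference concrete, here is a small Python model of the
string code's rule.  It is only a sketch of the semantics described
above, not the actual C implementation, and the name strict_items is
hypothetical:

```python
def strict_items(seq):
    """Hypothetical model of the string code's rule: trust __len__ exactly.

    Every index in range(len(seq)) must succeed; if __getitem__ raises
    IndexError early, the error propagates to the caller unchanged.
    """
    n = len(seq)                       # the reported length is taken as exact
    return [seq[i] for i in range(n)]  # an early IndexError is not caught
```

With the classes above, strict_items(Sequence()) returns
['w', 'x', 'y', 'z'], while strict_items(BadSeq2()) raises IndexError
when it reaches index 3.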

The other code's definition (e.g. PySequence_Tuple) is:
    if __len__ returns X, then the length is <= X
    if __getitem__ succeeds for range(0, X), then the length is indeed X
    if it does not, then the length is Y + 1,
                    where Y is the greatest index that actually works
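The PySequence_Tuple rule can be modeled the same way.  Again this is
a hypothetical Python sketch (lenient_items is not a real API), and it
assumes the scan stops at the first index that raises IndexError:

```python
def lenient_items(seq):
    """Hypothetical model of the PySequence_Tuple rule: __len__ is only an
    upper bound.  The real length is one more than the last index that
    worked, assuming the scan stops at the first IndexError."""
    items = []
    for i in range(len(seq)):    # never look past the reported length
        try:
            items.append(seq[i])
        except IndexError:
            break                # stop quietly; the shortfall is not reported
    return tuple(items)
```

lenient_items(BadSeq2()) returns ('a', 'b', 'c') even though __len__
claimed 8 items, so the bug in __len__ goes unreported.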

The definition in PySequence_Tuple seemed quite clever when I first
saw it, but I like it less now.  If a user-defined sequence raises
IndexError when len indicates it should not, the code is broken.  The
attempt to continue anyway is masking an error in user code.

I vote for fixing PySequence_Tuple and the like to raise an
IndexError.

Jeremy