[C++-sig] Support for Indexing and Iteration Safety

Wed May 28 20:05:26 CEST 2003

I have some ideas about how to better support the indexing operator[]
and related thoughts about the safety of iterators.  Rather than
following my usual practice of posting a big tome about it which will
overwhelm everyone so they don't respond ;->, let me just point out a
few problems with the current Boost.Python implementation.  You let me
know if you care about any of these issues:

     1. Unlike most of the other operators, which can be wrapped using
        "self op other", there's no easy way to wrap the indexing
        operator.  You end up having to write your own __getitem__
        and __setitem__ functions.

     2. You also have to implement the code which raises an exception
        if the index is out of range or the key is not found (in case
        you're implementing a mapping).  If you forget, your extension
        class is prone to crashes and accessing invalid data.

     3. Even if you get that right, you may still be prone to crashes.
        Suppose you're wrapping a std::vector<SomeBigObject>.  If you
        want to support syntax like:

           >>> foo = vector_SomeBigObject(15)
           >>> foo[10].bar = 3 # modify an element in-place

        or if you just want to avoid unneccessarily copying the
        SomeBigObject element when you access it:

           >>> x = foo[10].bar

        then you have to use return_internal_reference in your
        __getitem__. All well and good. Even if you do:

           >>> del foo
           >>> print x

        your x is still valid because of the call policy; it is kept
        alive.  On the other hand, what if you do:

           >>> del foo[10]  # erase the 10th item
           >>> print x

        In this case it crashes, or prints bogus data, because there's
        no linkage between modifications to the *state* of foo and
        the x object.  Another case:

           >>> x = foo[10]
           >>> foo[10] = y
           >>> print x, y

        Now you'll see that x and y are always identical, because x
        refers to the place in the array where you've written the
        value of y.

     4. Similar problem with exposed iterators:

           >>> for x in some_vec:
           ...     if x.test_something():
           ...         x.foo = 3
           ...     else:
           ...         some_vec.pop_back()

         Even if your iterator uses return_internal_reference to keep
         some_vec alive during iteration and allow in-place
         modification of the data it references, it does nothing to
         ensure that things are OK when the *contents* of some_vec
         change out from under you.  The above could lead to a crash
         or bogus results.

These problems are not vector-specific.  I have some ideas about
addressing them but I'm not sure how far to go, and first I want to
know how many people care.

Thanks for reading,
-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com