[C++-sig] Re: Support for Indexing and Iteration Safety

Brett Calcott brett.calcott at paradise.net.nz
Thu May 29 13:15:46 CEST 2003


"David Abrahams" <dave at boost-consulting.com> wrote:
>
> I have some ideas about how to better support the indexing operator[]
> and related thoughts about the safety of iterators.  Rather than
> following my usual practice of posting a big tome about it which will
> overwhelm everyone so they don't respond ;->, let me just point out a
> few problems with the current Boost.Python implementation.  You let me
> know if you care about any of these issues:
>
>      1. Unlike most of the other operators, which can be wrapped using
>         "self op other", there's no easy way to wrap the indexing
>         operator.  You end up having to write your own __getitem__
>         and __setitem__ functions.
>
>      2. You also have to implement the code which raises an exception
>         if the index is out of range or the key is not found (in case
>         you're implementing a mapping).  If you forget, your extension
>         class is prone to crashes and accessing invalid data.
>
>      3. Even if you get that right, you may still be prone to crashes.
>         Suppose you're wrapping a std::vector<SomeBigObject>.  If you
>         want to support syntax like:
>
>            >>> foo = vector_SomeBigObject(15)
>            >>> foo[10].bar = 3 # modify an element in-place
>
>         or if you just want to avoid unneccessarily copying the
>         SomeBigObject element when you access it:
>
>            >>> x = foo[10].bar
>
>         then you have to use return_internal_reference in your
>         __getitem__. All well and good. Even if you do:
>
>            >>> del foo
>            >>> print x
>
>         your x is still valid because of the call policy; it is kept
>         alive.  On the other hand, what if you do:
>
>            >>> del foo[10]  # erase the 10th item
>            >>> print x
>
>         In this case it crashes, or prints bogus data, because there's
>         no linkage between modifications to the *state* of foo and
>         the x object.  Another case:
>
>            >>> x = foo[10]
>            >>> foo[10] = y
>            >>> print x, y
>
>         Now you'll see that x and y are always identical, because x
>         refers to the place in the array where you've written the
>         value of y.
>
>      4. Similar problem with exposed iterators:
>
>            >>> for x in some_vec:
>            ...     if x.test_something():
>            ...         x.foo = 3
>            ...     else:
>            ...         some_vec.pop_back()
>
>          Even if your iterator uses return_internal_reference to keep
>          some_vec alive during iteration and allow in-place
>          modification of the data it references, it does nothing to
>          ensure that things are OK when the *contents* of some_vec
>          change out from under you.  The above could lead to a crash
>          or bogus results.
>
> These problems are not vector-specific.  I have some ideas about
> addressing them but I'm not sure how far to go, and first I want to
> know how many people care.
>

I'm haven't gone far enough in what I am doing to encounter these
problems, but for what its worth, at the moment I would have done the
following:

    if something is lightweight, return a copy
    if it is big, use shared_ptr

Which means that the above problems 3 & 4 would not occur. And 1 & 2
don't really seem that onerous a task. So I guess I don't 'care' that
much.

Of course, I reserve the right to change my opinion at any moment if I
encounter problems :)

ciao,
Brett

"David Abrahams" <dave at boost-consulting.com> wrote:
>
> I have some ideas about how to better support the indexing operator[]
> and related thoughts about the safety of iterators.  Rather than
> following my usual practice of posting a big tome about it which will
> overwhelm everyone so they don't respond ;->, let me just point out a
> few problems with the current Boost.Python implementation.  You let me
> know if you care about any of these issues:
>
>      1. Unlike most of the other operators, which can be wrapped using
>         "self op other", there's no easy way to wrap the indexing
>         operator.  You end up having to write your own __getitem__ and
>         __setitem__ functions.
>
>      2. You also have to implement the code which raises an exception
>         if the index is out of range or the key is not found (in case
>         you're implementing a mapping).  If you forget, your extension
>         class is prone to crashes and accessing invalid data.
>
>      3. Even if you get that right, you may still be prone to crashes.
>         Suppose you're wrapping a std::vector<SomeBigObject>.  If you
>         want to support syntax like:
>
>            >>> foo = vector_SomeBigObject(15)
>            >>> foo[10].bar = 3 # modify an element in-place
>
>         or if you just want to avoid unneccessarily copying the
>         SomeBigObject element when you access it:
>
>            >>> x = foo[10].bar
>
>         then you have to use return_internal_reference in your
>         __getitem__. All well and good. Even if you do:
>
>            >>> del foo
>            >>> print x
>
>         your x is still valid because of the call policy; it is kept
>         alive.  On the other hand, what if you do:
>
>            >>> del foo[10]  # erase the 10th item
>            >>> print x
>
>         In this case it crashes, or prints bogus data, because there's
>         no linkage between modifications to the *state* of foo and
>         the x object.  Another case:
>
>            >>> x = foo[10]
>            >>> foo[10] = y
>            >>> print x, y
>
>         Now you'll see that x and y are always identical, because x
>         refers to the place in the array where you've written the
>         value of y.
>
>      4. Similar problem with exposed iterators:
>
>            >>> for x in some_vec:
>            ...     if x.test_something():
>            ...         x.foo = 3
>            ...     else:
>            ...         some_vec.pop_back()
>
>          Even if your iterator uses return_internal_reference to keep
>          some_vec alive during iteration and allow in-place
>          modification of the data it references, it does nothing to
>          ensure that things are OK when the *contents* of some_vec
>          change out from under you.  The above could lead to a crash
>          or bogus results.
>
> These problems are not vector-specific.  I have some ideas about
> addressing them but I'm not sure how far to go, and first I want to
> know how many people care.
>

I'm haven't gone far enough in what I am doing to encounter these
problems, but for what its worth, at the moment I would have done the
following:

    if something is lightweight, return a copy
    if it is big, use shared_ptr

Which means that the above problems 3 & 4 would not occur. And 1 & 2
don't really seem that onerous a task. So I guess I don't 'care' that
much.

Of course, I reserve the right to change my opinion at any moment if I
encounter problems :)

ciao,
Brett







More information about the Cplusplus-sig mailing list