[C++-sig] Re: Support for Indexing and Iteration Safety
Brett Calcott
brett.calcott at paradise.net.nz
Thu May 29 13:15:46 CEST 2003
"David Abrahams" <dave at boost-consulting.com> wrote:
>
> I have some ideas about how to better support the indexing operator[]
> and related thoughts about the safety of iterators. Rather than
> following my usual practice of posting a big tome about it which will
> overwhelm everyone so they don't respond ;->, let me just point out a
> few problems with the current Boost.Python implementation. You let me
> know if you care about any of these issues:
>
> 1. Unlike most of the other operators, which can be wrapped using
> "self op other", there's no easy way to wrap the indexing
> operator. You end up having to write your own __getitem__
> and __setitem__ functions.
>
> 2. You also have to implement the code which raises an exception
> if the index is out of range or the key is not found (in case
> you're implementing a mapping). If you forget, your extension
> class is prone to crashes and accessing invalid data.
>
> 3. Even if you get that right, you may still be prone to crashes.
> Suppose you're wrapping a std::vector<SomeBigObject>. If you
> want to support syntax like:
>
> >>> foo = vector_SomeBigObject(15)
> >>> foo[10].bar = 3 # modify an element in-place
>
> or if you just want to avoid unneccessarily copying the
> SomeBigObject element when you access it:
>
> >>> x = foo[10].bar
>
> then you have to use return_internal_reference in your
> __getitem__. All well and good. Even if you do:
>
> >>> del foo
> >>> print x
>
> your x is still valid because of the call policy; it is kept
> alive. On the other hand, what if you do:
>
> >>> del foo[10] # erase the 10th item
> >>> print x
>
> In this case it crashes, or prints bogus data, because there's
> no linkage between modifications to the *state* of foo and
> the x object. Another case:
>
> >>> x = foo[10]
> >>> foo[10] = y
> >>> print x, y
>
> Now you'll see that x and y are always identical, because x
> refers to the place in the array where you've written the
> value of y.
>
> 4. Similar problem with exposed iterators:
>
> >>> for x in some_vec:
> ... if x.test_something():
> ... x.foo = 3
> ... else:
> ... some_vec.pop_back()
>
> Even if your iterator uses return_internal_reference to keep
> some_vec alive during iteration and allow in-place
> modification of the data it references, it does nothing to
> ensure that things are OK when the *contents* of some_vec
> change out from under you. The above could lead to a crash
> or bogus results.
>
> These problems are not vector-specific. I have some ideas about
> addressing them but I'm not sure how far to go, and first I want to
> know how many people care.
>
I'm haven't gone far enough in what I am doing to encounter these
problems, but for what its worth, at the moment I would have done the
following:
if something is lightweight, return a copy
if it is big, use shared_ptr
Which means that the above problems 3 & 4 would not occur. And 1 & 2
don't really seem that onerous a task. So I guess I don't 'care' that
much.
Of course, I reserve the right to change my opinion at any moment if I
encounter problems :)
ciao,
Brett
"David Abrahams" <dave at boost-consulting.com> wrote:
>
> I have some ideas about how to better support the indexing operator[]
> and related thoughts about the safety of iterators. Rather than
> following my usual practice of posting a big tome about it which will
> overwhelm everyone so they don't respond ;->, let me just point out a
> few problems with the current Boost.Python implementation. You let me
> know if you care about any of these issues:
>
> 1. Unlike most of the other operators, which can be wrapped using
> "self op other", there's no easy way to wrap the indexing
> operator. You end up having to write your own __getitem__ and
> __setitem__ functions.
>
> 2. You also have to implement the code which raises an exception
> if the index is out of range or the key is not found (in case
> you're implementing a mapping). If you forget, your extension
> class is prone to crashes and accessing invalid data.
>
> 3. Even if you get that right, you may still be prone to crashes.
> Suppose you're wrapping a std::vector<SomeBigObject>. If you
> want to support syntax like:
>
> >>> foo = vector_SomeBigObject(15)
> >>> foo[10].bar = 3 # modify an element in-place
>
> or if you just want to avoid unneccessarily copying the
> SomeBigObject element when you access it:
>
> >>> x = foo[10].bar
>
> then you have to use return_internal_reference in your
> __getitem__. All well and good. Even if you do:
>
> >>> del foo
> >>> print x
>
> your x is still valid because of the call policy; it is kept
> alive. On the other hand, what if you do:
>
> >>> del foo[10] # erase the 10th item
> >>> print x
>
> In this case it crashes, or prints bogus data, because there's
> no linkage between modifications to the *state* of foo and
> the x object. Another case:
>
> >>> x = foo[10]
> >>> foo[10] = y
> >>> print x, y
>
> Now you'll see that x and y are always identical, because x
> refers to the place in the array where you've written the
> value of y.
>
> 4. Similar problem with exposed iterators:
>
> >>> for x in some_vec:
> ... if x.test_something():
> ... x.foo = 3
> ... else:
> ... some_vec.pop_back()
>
> Even if your iterator uses return_internal_reference to keep
> some_vec alive during iteration and allow in-place
> modification of the data it references, it does nothing to
> ensure that things are OK when the *contents* of some_vec
> change out from under you. The above could lead to a crash
> or bogus results.
>
> These problems are not vector-specific. I have some ideas about
> addressing them but I'm not sure how far to go, and first I want to
> know how many people care.
>
I'm haven't gone far enough in what I am doing to encounter these
problems, but for what its worth, at the moment I would have done the
following:
if something is lightweight, return a copy
if it is big, use shared_ptr
Which means that the above problems 3 & 4 would not occur. And 1 & 2
don't really seem that onerous a task. So I guess I don't 'care' that
much.
Of course, I reserve the right to change my opinion at any moment if I
encounter problems :)
ciao,
Brett
More information about the Cplusplus-sig
mailing list