Assigning generator expressions to ctype arrays

Patrick Maupin pmaupin at gmail.com
Sat Oct 29 10:43:34 EDT 2011


On Oct 28, 3:24 pm, Terry Reedy <tjre... at udel.edu> wrote:
> On 10/28/2011 2:05 PM, Patrick Maupin wrote:
>
> > On Oct 27, 10:23 pm, Terry Reedy<tjre... at udel.edu>  wrote:
> >> I do not think everyone else should suffer substantial increase in space
> >> and run time to avoid surprising you.
>
> > What substantial increase?
>
> of time and space, as I said, for the temporary array that I think would
> be needed and which I also described in the previous paragraph that you
> clipped

That's because I don't think it needs a temporary array.  A temporary
array would provide some invariant guarantees that are nice but not
necessary in a lot of real-world cases.
>
> >  There's already a check that winds up
> > raising an exception.  Just make it empty an iterator instead.
>
> It? I have no idea what you intend that to refer to.

Sorry, code path.

There is already a "code path" that says "hey I can't handle this."
To modify this code path to handle the case of a generic iterable
would add a tiny bit of code, but would not add any appreciable space
(no temp array needed for my proposal) and would not add any runtime
cost for people who are not passing in iterables or doing other things
that currently raise exceptions.

> I doubt it would be very many because it is *impossible* to make it work
> in the way that I think people would want it to.

How do you know?  I have a use case that I really don't think is all
that rare.  I know exactly how much data I am generating, but I am
generating it piecemeal using iterators.

> >> It could, but at some cost. Remember, people use ctypes for efficiency,
> > yes, you just made my argument for me.  Thank you.  It is incredibly
> > inefficient to have to create a temp array.

No, I don't think I did "make your argument for you."  I am currently
making a temp list because I have to, and am proposing that, with a
small change to the ctypes library, that wouldn't always be necessary.
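For the record, here is the shape of what I'm doing today (a sketch;
the variable names are made up, and the exact exception type raised by
the direct assignment may vary across Python versions):

```python
import ctypes

# A fixed-length ctypes array of 8 unsigned ints.
buf = (ctypes.c_uint * 8)()

gen = (i * i for i in range(8))  # data produced piecemeal

# Today, assigning the generator directly fails, because ctypes slice
# assignment wants a sized sequence:
try:
    buf[:] = gen
except (TypeError, ValueError):
    pass  # ctypes gives up on non-sequence iterables

# The current workaround: materialize a temporary list first.
buf[:] = [i * i for i in range(8)]
print(list(buf))
```

It's that temporary list in the last step that a small change to
ctypes could make unnecessary.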

> But necessary to work with blank box iterators.

With your own preconceived set of assumptions. (Which I will admit,
you have given quite a bit of thought to, which I appreciate.)

> Now you are agreeing with my argument.

Nope, still not doing that.

> If ctype_array slice assignment were to be augmented to work with
> iterators, that would, in my opinion (and see below),

That's better for not being absolute.  Thank you for admitting other
possibilities.

> require use of
> temporary arrays. Since slice assignment does not use temporary arrays
> now (see below), that augmentation should be conditional on the source
> type being a non-sequence iterator.

I don't think any temporary array is required, but in any case, yes:
the code path through the ctypes array library's __setslice__ would
have to be modified where it currently gives up, so that it does
something different when passed an iterable.

> CPython comes with immutable fixed-length arrays (tuples) that do not
> allow slice assignment and mutable variable-length arrays (lists) that
> do. The definition is 'replace the indicated slice with a new slice
> built from all values from an iterable'. Point 1: This works for any
> properly functioning iterable that produces any finite number of items.

Agreed.

> Iterators are always exhausted.

And my proposal would continue to exhaust iterators, or would raise an
exception if the iterator wasn't exhausted.

> Replace can be thought of as delete follewed by add, but the
> implementation is not that naive.

Sure, on a mutable-length container.

> Point 2: If anything goes wrong and an
> exception is raised, the list is unchanged.

This may be true on lists, and is quite often true (and is nice when
it happens), but it isn't always true in general.  For example, with
the current tuple packing/unpacking protocol across an assignment, the
only real guarantee is that everything is gathered up into a single
object before the assignment is done.  It is not the case that nothing
will be unpacked unless everything can be unpacked.  For example:

>>>
>>> a,b,c,d,e,f,g,h,i = range(100,109)
>>> (a,b,c,d), (e,f), (g,h,i) = (1,2,3,4), (5,6,7), (8,9)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
>>> a,b,c,d,e,f,g,h,i
(1, 2, 3, 4, 104, 105, 106, 107, 108)
>>>

> This means that there must
> be temporary internal storage of either old or new references.

As I show with the tuple unpacking example, it is not an inviolate law
that Python won't unpack a little unless it can unpack everything.

> An
> example that uses an improperly functioning generator. (snip)

Yes, I agree that lists are wondrously robust.  But one of the reasons
for this is the "flexible" interpretation of slice start and end
points, that can be as surprising to a beginner as anything I'm
proposing.

> A c_uint array is a new kind of beast: a fixed-length mutable
> array. So it has to have a different definition of slice
> assignment than lists.  Thomas Heller, the ctypes author,
> apparently chose 'replacement by a sequence with exactly
> the same number of items, else raise an exception'. though
> I do not know what the doc actually says.

Yes, but ctypes was designed and developed before generator
expressions were available, and before or contemporaneously with the
first cut of itertools.  We arguably use Python differently than we
did in those days.

> An alternative definition would have been to replace as much of the
> slice as possible, from the beginning, while ignoring any items in
> excess of the slice length. This would work with any iterable.

I think an iterable that doesn't match the slice length should be an
error condition and raise an exception, both for too much data and for
too little.
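To make that concrete, here is a sketch of the semantics I have in
mind, written as a hypothetical helper function (fill_exact is my
name, not anything in ctypes).  Note that, as with tuple unpacking,
the array may be partially modified before the error is detected:

```python
import ctypes

_SENTINEL = object()

def fill_exact(arr, iterable):
    """Hypothetical helper: fill the fixed-length array `arr` from
    `iterable`, raising ValueError unless the iterable yields exactly
    len(arr) items.  The array may be partially written before a
    'too little data' error is raised."""
    it = iter(iterable)
    for i in range(len(arr)):
        item = next(it, _SENTINEL)
        if item is _SENTINEL:
            raise ValueError("too little data for slice")
        arr[i] = item
    # One extra next() detects surplus items.
    if next(it, _SENTINEL) is not _SENTINEL:
        raise ValueError("too much data for slice")

buf = (ctypes.c_uint * 4)()
fill_exact(buf, (n + 1 for n in range(4)))
print(list(buf))  # [1, 2, 3, 4]
```

No temporary array, the iterator is exhausted on success, and both
mismatch directions raise.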

> However, partial replacement of a slice would be a surprising innovation to most.

Yes, but when an exception is raised it doesn't always mean that
nothing got replaced.  See my tuple unpacking example earlier.

> The current implementation assumes that the reported length of a
> sequence matches the valid indexes and dispenses with temporary storage. This is shown by the following: (snip)
> I consider such unintended partial replacement to be a glitch.

Now that's actually interesting.  I agree with you that failing to
raise an exception there is undesirable behavior.  OTOH, exploiting
this misfeature might actually increase performance for my specific
case.

> An
> exception could be raised, but without adding temp storage, the array
> could not be restored. And making a change *and* raising an exception
> would be a different sort of glitch. (One possible with augmented
> assignment involving a mutable member of a tuple.)

It's also possible with non-augmented assignments with immutable
tuples, as I showed above.
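And the augmented-assignment case you mention in passing is easy to
reproduce in a couple of lines:

```python
# Augmented assignment on a mutable member of an immutable tuple:
# the list is mutated in place, *and* an exception is raised when
# Python tries to store the result back into the tuple.
t = ([1], 'x')
try:
    t[0] += [2]
except TypeError:
    pass  # "'tuple' object does not support item assignment"
print(t[0])  # [1, 2] -- changed despite the exception
```

So "a change *and* an exception" is already observable behavior in the
core language, not something my proposal would be introducing.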

> So I would leave this as undefined behavior for an input
> outside the proper domain of the function.

Not sure what you mean by "this."

I don't think that the interpreter should always paternalistically say
"no, you can't assign an item that doesn't have a __len__ attribute
because you obviously don't know what you're doing if you're trying to
do that."  I think the interpreter should do the same as it does on my
tuple unpacking example -- try to do the right thing, and raise an
exception if it fails during the process.

> Anyway, as I said before, you are free to propose a specific change
> ('work with iterators' is too vague) and provide a corresponding patch.

I will try to see if I can do that some time in the next few months,
if I ever get out of crunch mode.

Thanks,
Pat


