RE: FW: [Numpy-discussion] Bug: extremely misleading array behavior
[I thought I replied yesterday, but somehow that apparently vanished.] <Konrad Hinsen writes>:
For binary operations between a Python scalar and an array, no coercion is performed on the array type if the scalar is of the same kind as the array (but not the same size or precision). For example (assuming ints happen to be 32 bit in this case):

Python Int (Int32) * Int16 array --> Int16 array
Python Float (Float64) * Float32 array --> Float32 array

But if the Python scalar is of a higher kind, e.g., a Python float scalar with an Int array, then the array is coerced to the corresponding type of the Python scalar:

Python Float (Float64) * Int16 array --> Float64 array
Python Complex (Complex64) * Float32 array --> Complex64 array

Numarray basically has the same coercion rules as Numeric when two arrays are involved, with some extra twists such as:

UInt16 array * Int16 array --> Int32 array

since neither input type is a proper subset of the other. (But since Numeric doesn't -- or didn't, until Travis changed that -- have unsigned types, that wouldn't have been an issue with Numeric.)
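These rules can be checked empirically. As an anachronistic illustration (this thread long predates today's NumPy, whose spellings `np.int16`, `np.float32`, etc. differ from Numeric's `Int16`/`Float32`), the integer and float cases described above match what modern NumPy eventually settled on:

```python
import numpy as np

# A Python scalar of the same kind leaves the array type alone...
a16 = np.array([1, 2, 3], dtype=np.int16)
assert (a16 * 3).dtype == np.int16        # Int * Int16 array -> Int16

f32 = np.array([1.0, 2.0], dtype=np.float32)
assert (f32 * 2.0).dtype == np.float32    # Float * Float32 array -> Float32

# ...but a scalar of a higher kind coerces the array upward.
assert (a16 * 2.0).dtype == np.float64    # Float * Int16 array -> Float64

# Mixing unsigned and signed arrays of the same size promotes to the
# smallest signed type that can hold both value ranges.
u16 = np.array([1], dtype=np.uint16)
assert (u16 * a16[:1]).dtype == np.int32  # UInt16 array * Int16 array -> Int32
```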
Certainly. I didn't mean to minimize that. But the current coercion rules have produced a demand for solutions to the problem of upcasting, and I consider those solutions to be less than ideal (savespace and rank-0 arrays). If people really are troubled by these warts, I'm arguing that the real solution is in changing the coercion behavior. (Yes, it would be easiest to deal with if Python had all these types, but I think that will never happen, nor should it happen.) Perry
That solves one problem and creates another... two, in fact. One is the inconsistency problem: Python type coercion always promotes "smaller" to "bigger" types, and it would be good to make no exceptions from this rule. Besides, there are still situations in which types, ranks, and indexing operations depend on each other in a strange way. With

a = array([1., 2.], Float)
b = array([3., 4.], Float32)

the result of a*b is of type Float, whereas a[0]*b is of type Float32 -- if and only if a has rank 1.
(Yes, it would be easiest to deal with if Python had all these types, but I think that will never happen, nor should it happen.)
Python doesn't need to have them as standard types, an add-on package can provide them as well. NumPy seems like the obvious one.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------
All this is true. It really comes down to which poison you prefer. Neither choice is perfect. Changing the coercion rules results in the inconsistencies you mention. Not changing them results in the existing inconsistencies recently discussed (and still doesn't remove the difficulties of dealing with scalars in expressions without awkward constructs). We think the inconsistencies you point out are easier to live with than the existing behavior. It would be nice to have a solution that had none of these problems, but that doesn't appear to be possible. Perry
It would be nice to have a solution that had none of these problems, but that doesn't appear to be possible.
I still believe that the best solution is to define scalar data types corresponding to all array element types. As far as I can see, this doesn't have any of the disadvantages of the other solutions that have been proposed until now. Konrad.
<Konrad Hinsen writes>:
If x were a Float32 array, how would the following not be promoted to a Float64 array?

y = x + 1.

If you are proposing something like

y = x + Float32(1.)

it would work, but it sure leads to some awkward expressions. Perry
Yes, that's what I am proposing. It's no worse than what we have now, and if writing Float32 a hundred times is too much effort, an abbreviation like f = Float32 helps a lot. Anyway, following the Python credo "explicit is better than implicit", I'd rather write explicit type conversions than have automagical ones surprise me. Finally, we can always lobby for inclusion of the new scalar types into the core interpreter, with a corresponding syntax for literals, but it would sure help if we could show that the system works and suffers only from the lack of literals. Konrad.
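Konrad's proposal, transliterated into modern NumPy spellings as a sketch (Numeric itself spelled the type `Float32`; the abbreviation trick works the same way):

```python
import numpy as np

f = np.float32              # the abbreviation Konrad suggests
x = np.array([1.0, 2.0], dtype=f)

y = x + f(1.0)              # explicit scalar type: no surprise upcast
assert y.dtype == np.float32
```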
I did not receive any major objections, and so I have released a new Numeric (21.3) incorporating bug fixes. I also tagged the CVS tree with VERSION_21_3, and then I incorporated the unsigned integers and unsigned shorts into the CVS version of Numeric, for inclusion in a tentatively named version 22.0. I've only uploaded a platform-independent tar file for 21.3. Any binaries need to be updated. If you are interested in testing the new additions, please let me know of any bugs you find. Thanks, -Travis O.
How about making indexing (not slicing) arrays *always* return a 0-d array with copy instead of "view" semantics? This is nearly equivalent to creating a new scalar type, but without requiring major changes. I think it is probably even more useful for writing generic code, because the returned value will retain array behavior. Also, the following example
would now return a Float array, as Konrad desires, because a[0] is a Float array. Using copy semantics would fix the unexpected behavior reported by Larry that kicked off this discussion. Slices are a different animal than indexing and would (and definitely should) continue to have view semantics. I further believe that all Numeric functions (sum, product, etc.) should return arrays all the time, instead of implicitly converting them to Python scalars in special cases such as reductions of 1-d arrays. I think the only reason for the silent conversion is that Python lists only allow integer values for use in indexing, so that:
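The behavior Eric is after can be sketched with 0-d (rank-0) arrays as they exist in modern NumPy (an anachronism for this thread, but the semantics are the ones under discussion): a 0-d array keeps its element type for generic code, yet converts where a plain scalar is needed.

```python
import numpy as np

z = np.array(3.5, dtype=np.float32)  # a 0-d (rank-0) array
assert z.shape == ()
assert z.dtype == np.float32         # generic code sees the precision preserved
assert float(z) == 3.5               # yet it converts where a scalar is needed
assert (z * z).dtype == np.float32   # and it keeps behaving like an array in math
```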
Numeric arrays don't have this problem:
I don't think this alone is a strong enough reason for the conversion. Getting rid of special cases is more important because it makes behavior predictable to the novice (and expert), and it is easier to write generic functions and be sure they will not break a year from now when one of the special cases occurs. Are there other reasons why scalars are returned? On coercion rules: as for adding a scalar value to an array,

x = array([3., 4.], Float32)
y = x + 1.

should y be a Float or a Float32? I like numarray's coercion rules better (Float32). I have run into this upcasting too many times to count. "Explicit" and "implicit" aren't obvious to me here. The user explicitly cast x to be Float32, but because of the limited numeric types in Python, the result is upcast to a double. Here's another example,
I had to stare at this for a while when I first saw it before I realized the integer value 3 upcast the result to be of type 'i'. So I think this is confusing and rarely the desired behavior. The fact that this is inconsistent with Python's "always upcast" rule is minor for me. The array math operations are necessarily a different animal from scalar operations because of the extra types supported. Defining these operations in a way that is most convenient for working with array data seems OK. On the other hand, I don't think a jump from 21 to 22 is enough of a jump to make such a change. Numeric progresses pretty fast, and users don't expect such a major shift in behavior. I do think, though, that the computational speed issue is going to result in numarray and Numeric existing side by side for a long time. Perhaps we should create an "interim" Numeric version (maybe starting at 30) that tries to be compatible with the upcoming numarray in its coercion rules, etc.? Advanced features such as indexing arrays with arrays, memory-mapped arrays, floating point exception behavior, etc. won't be there, but it should help people transition their codes to work with numarray, and also offer a speedy alternative. A second choice would be to make SciPy's Numeric implementation the intermediate step. It already produces NaNs during div-by-zero exceptions according to numarray's rules. The coercion modifications could also be incorporated.
There was a seriously considered debate about unifying Python's numeric model into a single type to get rid of the integer-float distinction, at last year's Python conference and in the ensuing months. While it didn't (and won't) happen, I'd be real surprised if the general community would welcome us suggesting stirring yet another type into the brew. Can't we make 0-d arrays work as an alternative? eric
"eric jones" <eric@enthought.com> writes:
I think this was discussed as well a long time ago. For pure Python code, this would be a very good solution. But
I think the only reason for the silent conversion is that Python lists only allow integer values for use in indexing so that:
There are some more cases where the type matters. If you call C routines that do argument parsing via PyArg_ParseTuple and expect a float argument, a rank-0 float array will raise a TypeError. All the functions from the math module work like that, and of course many in various extension modules. In the ideal world, there would not be any distinction between scalars and rank-0 arrays. But I don't think we'll get there soon.
Statistically they probably give the desired result in more cases. But they are in contradiction to Python principles, and consistency counts a lot on my value scale. I propose an experiment: ask a few Python programmers who are not using NumPy what type they would expect for the result. I bet that not a single one would answer "Float32".
On the other hand, I don't think a jump from 21 to 22 is enough of a jump to make such a change. Numeric progresses pretty fast, and users
I don't think any increase in version number is enough for incompatible changes. For many users, NumPy is just a building block; they install it because some other package(s) require it. If a new version breaks those other packages, they won't be happy. The authors of those packages won't be happy either, as they will get the angry letters. As an author of such packages, I am speaking from experience. I have even considered making my own NumPy distribution under a different name, just to be safe from changes in NumPy that break my code (in the past it was mostly the installation code that was broken, when arrayobject.h changed its location). In my opinion, anything that is not compatible with Numeric should not be called Numeric. Konrad.
On Mon, 2002-06-10 at 11:08, Konrad Hinsen wrote:
Actually, the code in PyArg_ParseTuple asks the object it gets if it knows how to be a float. 0-d arrays have known how to be Python floats for some time. So I do not think this error occurs as you've described. Could you demonstrate it? In fact, most of the code in Python itself which needs scalars allows arbitrary objects, provided the object defines functions which return a Python scalar. The only exception to this that I've seen is the list indexing code (probably for optimization purposes). There could be more places, but I have not found them or heard of them. Originally Numeric arrays did not define the appropriate functions for 0-d arrays to act like scalars in the right places; for quite a while now, they have. I'm quite supportive of never returning Python scalars from Numeric array operations unless specifically requested (e.g. through a toscalar method).
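Travis's point is easy to check today (again using modern NumPy as a stand-in): C code that parses a float argument happily accepts a rank-0 array, and even the list-indexing exception he mentions was eventually closed.

```python
import math
import numpy as np

z = np.array(2.0)                        # a rank-0 float array
assert math.sqrt(z) == math.sqrt(2.0)    # C code asking for a float accepts it

# List indexing, the one exception Travis mentions, has since learned
# to accept integer rank-0 arrays as well:
assert [10, 20, 30][np.array(1)] == 20
```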
I'm not sure I agree with that at all. On what reasoning is that presumption based? If I encounter a Python object that I'm unfamiliar with, I don't presume to know how it will define multiplication.
Travis Oliphant <oliphant.travis@ieee.org> writes:
No, it seems gone indeed. I remember a lengthy battle due to this problem, but that was a long time ago.
Even for indexing, I don't see the point. If you test for the int type and do conversion attempts only for non-ints, that shouldn't slow down normal usage at all.
I suppose this would be easy to implement, right? Then why not do it in a test release and find out empirically how much code it breaks.
presumption based? If I encounter a Python object that I'm unfamiliar with, I don't presume to know how it will define multiplication.
But if that object pretends to be a number type, a sequence type, a mapping type, etc., I do make assumptions about its behaviour. Konrad.
We have certainly beaten this topic to death in the past. It keeps coming up because there is no good way around it. Two points about the x + 1.0 issue:

1. How often this occurs is really a function of what you are doing. For those using Numeric Python as a kind of MATLAB clone, who are typing interactively, the size issue is of less importance and the easy expression is of more importance. For those writing scripts to batch process or writing steered applications, the size issue is more important and the easy expression less important. I'm using words like "less" and "more" here because both issues matter to everyone at some time; it is just a question of relative frequency of concern.

2. Part of what I had in mind with the kinds module proposal (PEP 0242) was dealing with the literal issue. There had been some proposals to make literals decimal numbers or rationals, and that got me thinking about how to defend myself if they did it, and also about the fact that Python doesn't have Fortran's kind concept, which you can use to gain a more platform-independent calculation.
From the PEP, this example:

In module myprecision.py:

    import kinds
    tinyint = kinds.int_kind(1)
    single = kinds.float_kind(6, 90)
    double = kinds.float_kind(15, 300)
    csingle = kinds.complex_kind(6, 90)

In the rest of my code:

    from myprecision import tinyint, single, double, csingle
    n = tinyint(3)
    x = double(1.e20)
    z = 1.2  # builtin float gets you the default float kind, properties unknown
    w = x * float(x)
    # but in the following case we know w has kind "double":
    w = x * double(z)
    u = csingle(x + z * 1.0j)
    u2 = csingle(x+z, 1.0)

Note how that entire code can then be changed to a higher precision by changing the arguments in myprecision.py. Comment: note that you aren't promised that single != double; but you are promised that double(1.e20) will hold a number with 15 decimal digits of precision and a range up to 10**300, or that the float_kind call will fail.
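The kind-selection call can be sketched in a few lines. This is a hypothetical helper, not the actual kinds module: it interrogates NumPy's float types via `finfo`, whereas kinds worked against the platform's C types directly.

```python
import numpy as np

def float_kind(digits, max_exp10):
    """Return the smallest float type offering at least `digits` decimal
    digits of precision and a range up to 10**max_exp10, or raise."""
    for t in (np.float32, np.float64):
        info = np.finfo(t)
        # maxexp is a binary exponent; 0.30103 ~ log10(2) converts it
        if info.precision >= digits and info.maxexp * 0.30103 >= max_exp10:
            return t
    raise OverflowError("no float kind with the requested properties")

assert float_kind(6, 30) == np.float32    # single precision suffices
assert float_kind(15, 300) == np.float64  # the PEP's "double" request
```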
<Paul Dubois writes>:
We have certainly beaten this topic to death in the past. It keeps coming up because there is no good way around it.
Ain't that the truth.
We have many in the astronomical community who use IDL (instead of MATLAB), and for them size is an issue even for interactive use: they often manipulate very large arrays interactively. Furthermore, many are astronomers who don't generally see themselves as programmers and who, though they may write programs (perhaps not great programs), don't want to be bothered by such details even in a script (or they may want to read a "professional" program and not have to deal with such things). But you are right in that there is no solution that doesn't have some problems. Every array language deals with this in somewhat different ways, I suspect. In IDL, the literals are generally smaller types (ints were, or used to be -- I haven't used it myself in a while -- 2 bytes, floats single precision) and there were ways of writing literals with higher precision (e.g., 2L, 2.0d-2). Since it was a language specifically intended for numeric processing, supporting many scalar types made sense. Perry
I think this is a nice feature, but it's actually heading the opposite direction of where I'd like to see things go for the general use of Numeric. Part of Python's appeal for me is that I don't have to specify types everywhere. I don't want to write explicit casts throughout equations because it munges up their readability. Of course, the casting sometimes can't be helped, but Numeric's current behavior really forces this explicit casting for array types besides double, int, and double complex. I like numarray's fix for this problem. Also, as Perry noted, it's unlikely to be used as an everyday command line tool (like Matlab) if the verbose casting is required. I'm interested to learn what other drawbacks y'all found with always returning arrays (0-d for scalars) from Numeric functions. Konrad mentioned the tuple-parsing issue in some extension libraries that expect floats, but it sounds like Travis thinks this is no longer an issue. Are there others? eric
<Eric Jones writes>:
Well, sure. It isn't just indexing lists directly; it would be anywhere in Python that you would use a number. In some contexts the right thing may happen (where the function knows to try to obtain a simple number from an object), but then again, it may not (if calling a function where the number is used directly to index or slice). Here is another case where good arguments can be made for both sides. It really isn't an issue of functionality (one can write methods or functions to do what is needed); it's what the convenient syntax does. For example, if we really want a Python scalar but rank-0 arrays are always returned, then something like this may be required:
Whereas if simple indexing returns a Python scalar and consistency is desired in always having arrays returned, one may have to do something like this:
y = x.indexAsArray(2) # instead of y = x[2]
or perhaps
y = x[ArrayAlwaysAsResultIndexObject(2)] # :-) with better name, of course
One context or the other is going to be inconvenienced, but not prevented from doing what is needed. As long as Python scalars are the 'biggest' type of their kind, we strongly lean towards single elements being converted into Python scalars. It's our feeling that there are more surprises and gotchas on this side, particularly for more casual users, than in the uncertainty of an index returning an array or scalar. People writing code that expects to deal with uncertain dimensionality (the only place where this occurs) should be the ones to go the extra distance of more awkward syntax. Perry
Travis seemed to indicate that Python would convert 0-d arrays to Python types correctly in most (all?) cases. Python indexing is a little unique because it explicitly requires integers; it's not just 0-d arrays that fail as indexes -- Python floats won't work either. As for passing arrays to functions expecting numbers, is it that much different than passing an integer into a function that does floating point operations? Python handles this casting automatically. It seems like it should do the same for 0-d arrays if they know how to "look like" Python types.
Yes, this would be required for using them as array indexes. Or actually:
a[int(x[2])]
Right.
Well, I guess I'd like to figure out exactly what breaks before ruling it out, because consistently returning the same type from functions/indexing is beneficial. It becomes even more beneficial with the exception behavior used by SciPy and numarray. The two breakage cases I'm aware of are (1) indexing and (2) functions that explicitly check for arguments of IntType, DoubleType, or ComplexType. When searching the standard library for these guys, they only turn up in copy, pickle, xmlrpclib, and the types module -- all in innocuous ways. Searching for 'float' (which is equal to FloatType) doesn't turn up any code that breaks this either. A search of my site-packages had IntType tests used quite a bit -- primarily in SciPy. Some of these would go away with this change, and many were harmless. I saw a few that would need fixing (several in special.py), but the fix was trivial. eric
<Eric Jones wrote>:
That's right, the primary breakage would be downstream use as indices. That appeared to be the case with the find() method of strings for example.
Yes, this would be sufficient for use as indices or slices. I'm not sure if there is any specific code that checks for float but doesn't invoke automatic conversion. I suspect that floats are much less of a problem this way, though will one necessarily know whether to use int(), float(), or scalar()? If one is writing a generic function that could accept int or float arrays, then generating an int may presume too much about what the result will be used for. (Though I don't have a particular example to give, I'll think about whether any exist.) If the only type that could possibly cause problems is int, then int() should be all that would be necessary, but it's still awkward. Perry
If numarray becomes a first class citizen in the Python world as is hoped, maybe even this issue can be rectified. List/tuple indexing might be able to be changed to accept single element Integer arrays. I suspect this has major implications though -- probably a question for python-dev. eric
"eric jones" <eric@enthought.com> writes:
Ahh, a loaded example ;) I always thought that Numeric's view-slicing is a fairly problematic deviation from standard Python behavior, and I'm not entirely sure why it needs to be done that way. Couldn't one have both consistency *and* efficiency by implementing a copy-on-demand scheme (which is what matlab does, if I'm not entirely mistaken; a real copy gets created only if either the original or the 'copy' is modified)?

The current behavior seems problematic not just because it breaks consistency and hence user expectations; it also breaks code that is written with more pythonic sequences in mind (in a potentially hard-to-track-down manner) and is, IMHO, generally undesirable and error-prone, for pretty much the same reasons that dynamic scope and global variables are generally undesirable and error-prone -- one can unwittingly create intricate interactions between remote parts of a program that can be very difficult to track down.

Obviously there *are* cases where one really wants a (partial) view of an existing array. It would seem to me, however, that these cases are exceedingly rare (in all my Numeric code I'm aware of only one instance where I actually want the aliasing behavior, so that I can manipulate a large array by manipulating its views and vice versa). Thus, rather than being the default behavior, I'd rather see those cases accommodated by a special syntax that makes it explicit that an alias is desired and that care must be taken when modifying either the original or the view (e.g. one possible syntax would be ``aliased_vector = m.view[:,1]``). Again, I think the current behavior is somewhat analogous to having variables declared in global (or dynamic) scope by default, which is not only error-prone but also masks those cases where global (or dynamic) scope *is* actually desired and necessary.
It might be that the problems associated with a copy-on-demand scheme outweigh the error-proneness and the interface breakage that the deviation from standard python slicing behavior causes, but otherwise copying on slicing would be a backwards incompatibility in numarray I'd rather like to see (especially since one could easily add a view attribute to Numeric, for forwards compatibility). I would also suspect that this would make it *a lot* easier to get numarray (or parts of it) into the core, but this is just a guess.
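A minimal one-directional sketch of the copy-on-demand idea (a hypothetical class, not anything in Numeric or numarray): the slice stays a cheap view until it is first written to, at which point it materializes a copy. Note it deliberately omits the harder half of the problem -- a write to the *base* array while lazy slices are outstanding -- which is where most of the real implementation complexity lives.

```python
import numpy as np

class LazySlice:
    """A slice that remains a view until first written to (sketch)."""

    def __init__(self, base, key):
        self._data = base[key]  # cheap NumPy view, no copy yet
        self._owns = False      # buffer still shared with the base

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not self._owns:
            self._data = self._data.copy()  # materialize on first write
            self._owns = True
        self._data[key] = value

a = np.array([1.0, 2.0, 3.0, 4.0])
s = LazySlice(a, slice(0, 2))
assert s[0] == 1.0   # reading costs nothing extra
s[0] = 99.0          # first write triggers the copy
assert a[0] == 1.0   # the original is untouched
```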
Guido might nowadays think that adding reduce was a mistake, so in that sense it might be a "corner" of the python language (although some people, including me, still rather like using reduce), but I can't see how you can generally replace reduce with anything but a loop. Could you give an example?

alex
--
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.Schmolck@gmx.net     http://www.dcs.ex.ac.uk/people/aschmolc/
The current behavior seems not just problematic because it breaks consistency and hence user expectations, it also breaks code
Well, slices creating copies is definitely a bad idea (which is what I have heard proposed before) -- finite difference calculations (and others) would be very slow with this approach. Your copy-on-demand suggestion might work though. Its implementation would be more complex, but I don't think it would require cooperation from the Python core. It could be handled in the ufunc code. It would also require extension modules to make copies before they modified any values. Copy-on-demand doesn't really fit with Python's "assignments are references" approach to things, though, does it? Using foo = bar in Python and then changing an element of foo will also change bar. So I guess there would have to be a distinction made here, which adds a little more complexity. Personally, I like being able to pass views around because it allows for efficient implementations. The option to pass arrays into an extension function and edit them in place is very nice. Copy-on-demand might allow for equal efficiency -- I'm not sure. I haven't found the current behavior very problematic in practice and haven't seen it as a major stumbling block for new users. I'm happy with the status quo on this. But if copy-on-demand is truly efficient and didn't make extension writing a nightmare, I wouldn't complain about the change either. I have a feeling the implementers of numarray would, though. :-) And talk about having to modify legacy code...
I think the two things Guido wants for inclusion of numarray are a consensus from our community on what we want and (more importantly) a comprehensible code base. :-) If Numeric satisfied this 2nd condition, it might already be slated for inclusion... The 1st is never easy with such varied opinions -- I've about concluded that Konrad and I are anti-particles :-) -- but I hope it will happen.
I don't see choosing axis=-1 as a break with Python -- arrays are inherently different and used differently than lists of lists in Python. Further, reduce() is a "corner" of the Python language that has been superseded by list comprehensions. Choosing an alternative

Guido might nowadays think that adding reduce was a mistake, so in
You're right; you can't do it without a loop. List comprehensions only supersede filter and map, since they always return a list. I think reduce is here to stay. And, like you, I would actually be disappointed to see it go (I like lambda too...). The point is that I wouldn't choose the definition of sum() or product() based on the behavior of Python's reduce operator. Hmmm. So I guess that is key -- it's really these *function* interfaces that I disagree with. So, how about add.reduce() keeping axis=0 to match the behavior of Python, but sum() and friends defaulting to axis=-1 to match the rest of the library functions? It does break consistency across the library, so I think it is sub-optimal. However, the distinction is reasonably clear and much less likely to cause confusion. It also allows FFT and future modules (wavelets or whatever) to operate across the fastest axis by default while conforming to an intuitive standard. take() and friends would also become axis=-1 for consistency with all other functions. Would this be a reasonable compromise? eric
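For concreteness, here is what the two defaults under discussion do on a 2x3 array (illustrated with modern NumPy's `add.reduce`, which kept axis=0; Eric's proposed axis=-1 default for sum() and friends is shown via an explicit axis argument, since that compromise was never adopted):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

assert list(np.add.reduce(m, axis=0)) == [5, 7, 9]   # down the columns
assert list(np.add.reduce(m, axis=-1)) == [6, 15]    # along the fastest-varying axis
```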
"eric jones" <eric@enthought.com> writes:
It wouldn't, and I am not sure the implementation would be much more complex, but then I haven't tried. Having both copy on demand and views is difficult, both conceptually and implementationwise, but with copy-on-demand, views become less important.
That would be true as well with copy-on-demand arrays, as foo and bar would be the same object. Semantically, copy-on-demand would be equivalent to copying when slicing, which is exactly Python's behaviour for lists.
So, how about add.reduce() keep axis=0 to match the behavior of Python, but sum() and friends defaulted to axis=-1 to match the rest of the
That sounds like the most arbitrary inconsistency -- add.reduce and sum are synonyms for me. Konrad.
"eric jones" <eric@enthought.com> writes:
My suggestion wouldn't conflict with any standard python behavior -- indeed the main motivation would be to have numarray conform to standard python behavior -- ``foo = bar`` and ``foo = bar[20:30]`` would behave exactly as for other sequences in python. The first one creates an alias to bar and in the second one the indexing operation creates a copy of part of the sequence which is then aliased to foo. Sequences are atomic in python, in the sense that indexing them creates a new object, which I think is not in contradiction to python's nice and consistent 'assignments are references' behavior.
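The standard-Python behavior Alex is appealing to is easy to demonstrate with plain lists (a minimal sketch, reusing his ``foo``/``bar`` names):

```python
# Python list semantics: plain assignment aliases, slicing copies.
bar = [0, 1, 2, 3, 4]
foo = bar          # alias: foo and bar are the very same object
part = bar[1:3]    # slice: a brand-new list (a copy)

bar[1] = 99
assert foo is bar
assert foo[1] == 99        # the alias sees the change
assert part == [1, 2]      # the slice copy is unaffected
```

Copy-on-demand slicing for arrays would preserve exactly these observable semantics while deferring the actual copy for performance.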
I don't know how much of a performance drawback copy-on-demand would have compared to views -- I'd suspect it would not be significant. The fact that the runtime behavior becomes a bit more difficult to predict might be more of a drawback (but then I haven't heard matlab users complain, and one could always force an eager copy). Another reason why I think a copy-on-demand scheme for slicing operations might be attractive is that I'd suspect one could gain significant benefits from doing other operations in a lazy fashion too (plus optionally caching some results); transposing, for instance, seems to cause in-principle unnecessary copies, at least in some cases at the moment.
I haven't found the current behavior very problematic in practice and haven't seen it as a major stumbling block to new users. I'm happy
In my experience, not even all people who use Numeric quite a lot are *aware* that the slicing behavior differs from Python sequences. You might be right that in practice aliasing doesn't cause too many problems (as long as one sticks to arrays -- it certainly makes it harder to write code that operates on slices of generic sequence types) -- I'd really be interested to know whether there are cases where people have spent a long time tracking down a bug caused by the view behavior.
Since the vast majority of slicing operations are currently not done to create views that are dependently modified, the backward incompatibility might not affect that much code. You are right, though, that if Perry and the other numarray implementors don't think copy-on-demand is worth the bother, then it's unlikely to happen.
As I said, I can only guess about the politics involved, but I would think that before a significant piece of code such as numarray is incorporated into the core, a relevant PEP will be discussed in the newsgroup, and that many people will feel more comfortable about incorporating something into core Python that doesn't deviate significantly from standard behavior (i.e. doesn't view-slice), especially if it mainly caters to a rather specialized audience. But Guido obviously has the last word on those issues, and if he doesn't have a problem either way then, as long as the community is undivided, it shouldn't be an obstacle for inclusion. I agree that division of the community might pose the most significant problems -- MA, for example, *does* create copies on indexing if I'm not mistaken, and the (desirable) transition process from Numeric to numarray also poses not insignificant difficulties and risks, especially since there now are quite a few important projects (not least of them scipy) that are built on top of Numeric and will have to be incorporated in the transition if numarray is to take over. Everything seems in a bit of a limbo right now. I'm currently working on a (fully-featured) matrix class that I'd like to work with both Numeric and numarray (and also scipy where available) more or less transparently for the user, which turns out to be much more difficult than I would have thought. alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck@gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/
Here is what I see as the fundamental problem with implementing slicing in numarray using copy-on-demand instead of views. Copy-on-demand requires the maintenance of a global list of all the active views associated with a particular array buffer. Here is a simple example:

    >>> a = zeros((5000,5000))
    >>> b = a[49:51,50]
    >>> c = a[51:53,50]
    >>> a[50,50] = 1

The assignment to a[50,50] must trigger a copy of the array b; otherwise b also changes. On the other hand, array c does not need to be copied since its view does not include element 50,50. You could instead copy the array a -- but that means copying a 100 Mbyte array while leaving the original around (since b and c are still using it) -- not a good idea! The bookkeeping can get pretty messy (if you care about memory usage, which we definitely do). Consider this case:

    >>> a = zeros((5000,5000))
    >>> b = a[0:-10,0:-10]
    >>> c = a[49:51,50]
    >>> del a
    >>> b[50,50] = 1

Now what happens? Either we can copy the array for b (which means two copies of the huge (5000,5000) array exist, one used by c and the new version used by b), or we can be clever and copy c instead. Even keeping track of the views associated with a buffer doesn't solve the problem of an array that is passed to a C extension and is modified in place. It would seem that passing an array into a C extension would always require all the associated views to be turned into copies. Otherwise we can't guarantee that views won't be modified. This kind of state information with side effects leads to a system that is hard to develop, hard to debug, and really messes up the behavior of the program (IMHO). It is *highly* desirable to avoid it if possible. This is not to deny that copy-on-demand (with explicit views available on request) would have some desirable advantages for the behavior of the system.
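Rick's bookkeeping problem can be made concrete with a toy one-dimensional copy-on-write "array" over a Python list. Every class and method name below is invented for illustration (numarray's real buffers are C objects), but the burden is the same: the shared buffer must track every live view so a write can trigger copies.

```python
class Buffer:
    def __init__(self, data):
        self.data = data
        self.views = []              # every live view sharing this buffer

class COWArray:
    def __init__(self, buf, start, stop):
        self.buf, self.start, self.stop = buf, start, stop
        buf.views.append(self)

    def __getitem__(self, i):
        if isinstance(i, slice):
            s, e, _ = i.indices(self.stop - self.start)
            return COWArray(self.buf, self.start + s, self.start + e)
        return self.buf.data[self.start + i]

    def __setitem__(self, i, value):
        pos = self.start + i
        # Any *other* view overlapping pos must be given its own copy
        # before the write lands -- this is the hidden global state.
        for v in list(self.buf.views):
            if v is not self and v.start <= pos < v.stop:
                v._detach()
        self.buf.data[pos] = value

    def _detach(self):
        """Move this view onto a private copy of its slice of the buffer."""
        self.buf.views.remove(self)
        own = self.buf.data[self.start:self.stop]
        self.buf = Buffer(own)
        self.buf.views.append(self)
        self.start, self.stop = 0, len(own)

    def tolist(self):
        return self.buf.data[self.start:self.stop]

a = COWArray(Buffer([0] * 10), 0, 10)
b = a[2:5]           # a view sharing a's buffer
a[3] = 1             # index 3 lies inside b's range, so b is copied first
assert b.tolist() == [0, 0, 0]   # b kept its pre-write contents
assert a[3] == 1
```

Note how every write must scan the buffer's registered views, and how detaching a view silently duplicates storage -- exactly the kind of state-with-side-effects Rick objects to.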
But we've worried these issues to death, and in the end were convinced that slices == views provided the best compromise between the desired behavior and a clean implementation. Rick ------------------------------------------------------------------ Richard L. White rlw@stsci.edu http://sundog.stsci.edu/rick/ Space Telescope Science Institute Baltimore, MD
<Rick White writes> :
Rick beat me to the punch. The requirement for copy-on-demand definitely leads to a far more complex implementation with much more potential for misunderstood memory usage. You could do one small thing and suddenly force a spate of copies (perhaps cascading). There is no way we would have taken on a redesign of Numeric with this requirement with the resources we have available.
Rick's explanation doesn't really address the other position, which is that slices should force immediate copies. That isn't a difficult implementation issue by itself. But it does raise some related implementation questions. Supposing one does feel that views are a feature one wants even though they are not the default, it turns out that it isn't all that simple to provide views once ordinary slicing syntax no longer yields them. (Obtaining copies of view slices, by contrast, is simple.) Slicing views may not be important to everyone. They are important to us (and others), and we do see a number of situations where forcing copies in order to operate on array subsets would be a serious performance problem. We did discuss this issue with Guido, and he did not indicate that having different behavior on slicing with arrays would be a show stopper for acceptance into the Standard Library. We are also aware that there is no great consensus on this issue (even internally at STScI :-). Perry Greenfield
"Perry Greenfield" <perry@stsci.edu> writes:
Yes, but I would suspect that cases where a little innocuous a[0] = 3 triggers excessive processing should be rather unusual (matlab or octave users will know).
Numeric with this requirement with the resources we have available.
Fair enough -- if implementing copy-on-demand is too much work then we'll have to live without it (especially if view-slicing doesn't stand in the way of a future inclusion into the Python core). I guess the best reason to bite the bullet and carry around state information would be if there were significant other cases where one would also want to optimize operations under the hood. If there isn't much else in this direction then the effort involved might not be justified. One thing that bugs me in Numeric (and that might already have been solved in numarray) is that e.g. ``ravel`` (and I think also ``transpose``) creates unnecessary copies, whereas ``.flat`` doesn't, but won't work in all cases (viz. when the array is non-contiguous), so I can either have ugly or inefficient code.
I'm not sure I understand the above. What is the problem with ``a.view[1:3]`` (or ``a.view()[1:3]``)?
Sure, no one denies that even if with copy-on-demand (explicitly) aliased views would still be useful.
Yep, I just saw Paul Barrett's post :)
Perry Greenfield
alex
I guess that depends on what you mean by unnecessary copies. If the array is non-contiguous, what would you have it do?
I didn't mean to imply it wasn't possible, just that it is not quite as clean. The thing I don't like about this approach (or Paul's suggestion of a.sub) is the creation of an odd object whose only purpose is being sliced. (Even worse, in my opinion, is making it a different kind of array where slicing behaves differently. That will lead to the problem we have discussed for other kinds of array behavior, namely, how do you keep from being confused about a particular array's slicing behavior?) That could lead to confusion as well. Many may be under the impression that x = a.view makes x refer to an array when it doesn't. Users would need to know that a.view without a '[' is usually an error. Sure, it's not hard to implement. But I don't view it as that clean a solution. On the other hand, a[1:3].copy() (or alternatively, a[1:3].copy) is another array just like any other.
Perry
"Perry Greenfield" <perry@stsci.edu> writes:
In most cases the array of which I desire a flattened representation is contiguous (plus, I usually don't intend to modify it). Consequently, in most cases I don't want any copies of it to be created (especially not if it is really large -- which is not seldom the case). The fact that you can never really be sure whether you can actually use ``.flat`` without first checking whether the array is in fact contiguous (I don't think there are many guarantees about something being contiguous, or are there?), and that ravel will always work but has a huge overhead, suggests to me that something is not quite right.
If the array is non-contiguous what would you have it do?
Simple -- in that case 'lazy ravel' would do the same as 'ravel' currently does, create a copy (or alternatively rearrange the memory representation to make it contiguous and then create a lazy copy, but I don't know whether this would be a good or even feasible idea). A lazy version of ravel would have the same semantics as ravel but only create an actual copy if necessary -- which means that as long as no modification takes place and the array is contiguous, it will be sufficient to return the ``.flat`` (for starters). If the array is non-contiguous then the copying can't be helped, but those cases are rare, and currently you either have to test for them explicitly or slow everything down and waste memory by just always using ``ravel()``. For example, if bar is contiguous, ``foo = ravel(bar)`` would be computationally equivalent to ``bar.flat`` as long as neither of them is modified, but semantically equivalent to the current ``foo = ravel(bar)`` in all cases. Thus you could now write:
a = ravel(a)[20:]
wherever you've previously written the boiler-plate code that checks ``a.iscontiguous()`` and falls back from ``.flat`` to ``ravel()`` -- without any loss of performance.
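Alex's "lazy ravel" can be sketched in pure Python as a flat proxy that copies only on first write. The class name and the list-backed buffer are illustrative only; real Numeric arrays are C buffers, and a real implementation would also have to intercept writes to the source array -- which is precisely the bookkeeping Rick White describes elsewhere in this thread.

```python
class LazyFlat:
    """A flattened 'copy' that is really a shared reference until written."""
    def __init__(self, data):
        self._shared = data   # the source array's flat buffer
        self._own = None      # private copy, created only on demand

    def _buf(self):
        return self._own if self._own is not None else self._shared

    def __getitem__(self, i):
        return self._buf()[i]            # reads never copy

    def __setitem__(self, i, value):
        if self._own is None:
            self._own = list(self._shared)   # the deferred copy
        self._own[i] = value

data = [0, 1, 2, 3]
flat = LazyFlat(data)
assert flat[2] == 2      # no copy has happened yet
flat[2] = 99             # first write triggers the copy...
assert data[2] == 2      # ...so the source stays untouched, as with ravel()
assert flat[2] == 99
```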
I personally don't find it messy. And please keep in mind that the ``view`` construct would only very seldom be used if copy-on-demand were the default -- as I said, I've only needed the aliasing behavior once. No doubt it was really handy then, but the fact that e.g. matlab doesn't have anything along those lines (AFAIK) suggests that many people will never need it. So even if ``.view`` is messy, I'd rather have something messy that is almost never used, in exchange for (what I perceive as) significantly nicer and cleaner semantics for something that is used all the time (array slicing; alias slicing is messy at least in the respect that it breaks standard usage and generic sequence code, as well as causing potentially devious bugs. Unexpected behaviors like phantom buffers kept alive in their entirety by partial views, or what ``A = A[::-1]`` does, are not exactly pretty either).
I don't see that problem, frankly. The view is *not* an array. It doesn't need (and shouldn't have) anything except a method to access slices (__getitem__). As mentioned before, I also regard it as highly desirable that ``b = a.view[3:10]`` sticks out immediately. This signals "warning -- potentially tricky code ahead". Nothing in ``b = a[3:10]`` tells you that someone intends to modify a and b dependently (because in more than 9 out of 10 cases he won't) -- now *this* is confusing.
Since ``.view`` shouldn't allow anything except slicing, they'll soon find out ("Error: you can't multiply me, I'm a view and not an array"). And I can't see why that would be harder to figure out (or look up in the docs) than the fact that a[1:3] creates an alias and *not* a copy, contrary to *everything* else you've ever heard or read about Python sequences (especially since in most cases it will work as intended). Also, what exactly is the confused person's notion of the purpose of ``x = a.view`` supposed to be? That ``x = a`` does what ``x = a.copy()`` really does, and that to create an alias to ``a`` they would have to use ``x = a.view``? In that case they'd better read the Python tutorial before they do any more Python programming, because they are in for all kinds of unpleasant surprises (``a = []; b = a; b[1] = 3; print a`` -- oops). alex
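What Alex proposes can be sketched in pure Python: copy-slicing by default, with a ``.view`` object whose *only* ability is slicing. The class names are made up, and a real version would handle strides and n dimensions; this is just the semantics.

```python
class Array1D:
    def __init__(self, data, start=0, stop=None):
        self.data = data                      # shared flat buffer
        self.start = start
        self.stop = len(data) if stop is None else stop

    def __getitem__(self, i):
        if isinstance(i, slice):
            s, e, _ = i.indices(self.stop - self.start)
            # default slicing: an independent copy of the data
            return Array1D(self.data[self.start + s:self.start + e])
        return self.data[self.start + i]

    def __setitem__(self, i, value):
        self.data[self.start + i] = value

    @property
    def view(self):
        return _ViewMaker(self)

class _ViewMaker:
    """Not an array: its only purpose is being sliced into an alias."""
    def __init__(self, arr):
        self._arr = arr

    def __getitem__(self, i):
        if not isinstance(i, slice):
            raise TypeError("I'm a view and not an array: slice me")
        a = self._arr
        s, e, _ = i.indices(a.stop - a.start)
        return Array1D(a.data, a.start + s, a.start + e)  # shares the buffer

a = Array1D([0, 1, 2, 3, 4])
c = a[1:3]        # plain slice: a copy
v = a.view[1:3]   # explicit alias: shares a's buffer
a[1] = 99
assert c[0] == 1       # the copy is unaffected
assert v[0] == 99      # the view sees the change
```

Note how ``a.view[1:3]`` sticks out in the source, as Alex wants, while anything but slicing the view object fails immediately.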
<Alexander Schmolck writes>: <Perry Greenfield writes>:
Numarray already returns a view of the array if it is contiguous. Copies are only produced if it is non-contiguous. I assume that is the behavior you are asking for?
Not for numarray, at least in this context.
Currently for numarray, .flat will fail if the array isn't contiguous. It isn't clear whether this should change. If .flat is meant to always be a view, then it should always fail if the array is not contiguous. Ravel is not guaranteed to be a view. This is a problematic issue if we decide to switch from view to copy semantics. If slices produce copies, then does .flat? If so, then how does one produce a flattened view? x.view.flat?
I believe this is already true in numarray.
You're kidding, right? Particularly after arguing for aliasing semantics in the previous paragraph for .flat ;-)
This is basically true, though the confusion may be that a.view is taken to be an array object that has different slicing behavior instead of a non-array object that can be sliced to produce a view. I don't view it as a major issue, but I do see how some may mistakenly infer that. Perry
<"Perry Greenfield" writes> [SNIP]
This is one horrible aspect of NumPy that I hope you get rid of. I've been burned by this several times -- I expected a view, but silently got a copy because my array was noncontiguous. If you go with copy semantics, this will go away, if you go with view semantics, this should raise an exception instead of silently copying. Ditto with reshape, etc. In my experience, this is a source of hard to find bugs (as opposed to axes issues which tend to produce shallow bugs). [SNIP]
Ravel should either always return a view or always return a copy -- I don't care which
Wouldn't that just produce a copy of the view? Unless you did some weird special casing on view? The following would work, although it's a little clunky:

    flat_x = x.view[:]   # Or however "get me a view" would be spelled.
    flat_x.shape = (-1,)

-tim
"Perry Greenfield" <perry@stsci.edu> writes:
Not at all -- in fact I was rather shocked when my attention was drawn to the fact that this is also the behavior of Numeric -- I had thought that ravel would *always* create a copy. I absolutely agree with the other posters who remarked that the varying behavior of ravel (creating a copy vs creating a view, depending on whether the argument is contiguous) is highly undesirable and error-prone (especially since it is not even possible to determine at compile time which behavior will occur, if I'm not mistaken). In fact, I think this behavior is worse than what I incorrectly assumed to be the case. What I was arguing for is a ravel that always has the same semantics (namely creating a copy) but that -- because it would create the copy only on demand -- would be just as efficient as using .flat when a) its argument were contiguous; and b) neither the result nor the argument were modified while both are alive. The reason I view ``.flat`` as a hack is that it is an operation that is there exclusively for efficiency reasons and has no well defined semantics -- it will only work stochastically, giving better performance in certain cases. Thus you have to cast lots over whether you can actually use it at runtime (calling .iscontiguous) and always have a fall-back scheme (most likely using ravel) at hand -- there seems to be no way to determine at compile time what's going to happen. I don't think a language or a library should have any such constructs, or it should at least strive to minimize their number. The fact that the current behavior of ravel actually achieves the effect I want in most cases doesn't justify its obscure behavior in my eyes, which translates into a variation of the boiler-plate code previously mentioned (``if a.iscontiguous():...else:``) when you actually want a *single* ravelled copy, and it is also a very likely candidate for extremely hard to find bugs. One nice thing about Python is that there is very little undefined behavior. I'd like to keep it that way. [snipped]
I didn't argue for any semantics of ``.flat`` -- I just pointed out that I found the division of labour that I (incorrectly) assumed to be the case an ugly hack (for the reasons outlined above): ``ravel``: always works, but always creates a copy (which might be an undesirable waste of resources) [this was mistaken; the real semantics are: always works, creates a view if contiguous, a copy otherwise]; ``.flat``: behavior undefined at compile time, but a runtime check can be used to ensure that it can be used as a more efficient alternative to ``ravel`` in some cases. If I now understand the behavior of both ``ravel`` and ``.flat`` correctly, then I can't currently see *any* raison d'être for a ``.flat`` attribute. If, as I would hope, the behavior of ravel is changed to always create copies (ideally on demand), then matters might look different. In that case, it might be justifiable to have ``.flat`` as a specialized construct analogous to what I proposed as ``.view``, but only if there is some way to make it work (the same) for both contiguous and non-contiguous arrays. I'm not sure that it would be needed at all (especially with a lazy ravel). alex
On 14 Jun 2002, Alexander Schmolck wrote: [...]
Why does ravel have a huge overhead? It seems it already doesn't copy unless required: search for 'Chacking' -- including the mis-spelling -- in this thread: http://groups.google.com/groups?hl=en&lr=&threadm=abjbfp%241t9%241%40news5.svr.pol.co.uk&rnum=1&prev=/groups%3Fq%3Diterating%2Bover%2Bthe%2Bcells%2Bgroup:comp.lang.python%26hl%3Den%26lr%3D%26scoring%3Dr%26selm%3Dabjbfp%25241t9%25241%2540news5.svr.pol.co.uk%26rnum%3D1 or start up your Python interpreter, if you're less lazy than me. John
"Perry Greenfield" <perry@stsci.edu> writes:
Not necessarily. We could decide that array.view is a view of the full array object, and that slicing views returns subviews.
A view could be a different type of object, even though much of the implementation would be shared with arrays. This would help to reduce confusion.
Why? It would be a full-size view, which might actually be useful in many situations. My main objection to changing the slicing behaviour is, as with some other proposed changes, compatibility. Even though view behaviour is not required by every NumPy program, there are people out there who use it, and finding the locations in the code that need to be changed is a very tricky business. It may keep programmers from switching to Numarray in spite of benefits elsewhere. Konrad.
<Konrad Hinsen writes>: <Perry Greenfield writes>:
I'd be strongly against this. It has the same problem that other customized array objects have (whether regarding slicing behavior, operators, coercion...). In particular, it is clear which kind it is when you create it, but you may pass it to a module that presumes different array behavior. Having different kinds of arrays floating around just seems like an invitation for confusion. I'm very much in favor of picking one behavior or the other and then providing some means of explicitly getting the other.
But one can do that simply by x = a. (Though there is the issue that one could do the following, which is not the same: x = a.view; x.shape = (2,50) -- so that x is a full array view with a different shape than a.) ******** I understand the backward compatibility issue here, but it is clear that this is an issue on which it appears to be impossible to get a consensus. There appear to be significant factions that care passionately about copy vs view, and no matter what decision is made many will be unhappy. Perry
We already have that situation with lists and arrays (and, in much of my code, netCDF arrays, which have copy semantics), but in my experience this has never caused confusion. Most general code working on sequences doesn't modify elements at all. When it does, it either clearly requires view semantics (a function you call in order to modify (parts of) an array) or clearly requires copy semantics (a function that uses an array argument as an initial value that it then modifies).
Then the only solution I see is the current one: default behaviour is view, and when you want a copy you copy explicitly. The inverse is not possible: once you have made a copy, you can't make it behave like a view anymore. Konrad.
On Sat, Jun 15, 2002 at 10:53:17AM +0200, Konrad Hinsen wrote:
I don't think it is necessary to create the other object _from_ the default one. You could have copy behavior be the default, and if you want a view of some array you simply request one explicitly with .view, .sub, or whatever. Since creating a view is "cheap" compared to creating a copy, there is nothing sacrificed doing things in this manner. Scott -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom@physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
Let's make this explicit. Given the following four expressions,

1) array
2) array[0]
3) array.view
4) array.view[0]

what would the types of each of these objects be according to your proposal? What would the indexing behaviour of those types be? I don't see how you can avoid having either two types or two different behaviours within one type. Konrad.
On June 17, 2002 04:57 am, Konrad Hinsen wrote:
If we assume that a slice returns a copy _always_, then I agree that #4 in your list above would not give a user what they would expect: array.view[0] would give the view of a copy of array[0], _not_ a view of array[0], which is probably what is wanted. I _think_ this could be fixed by making view (or something similar) an option of the slice rather than a method of the object. For example (assuming that a is an array):

    Expression:    Returns:            Slicing Behavior:
    a or a[:]      Copy of all of a    Returns a copy of the sub-array
    a[0]           Copy of a[0]        Returns a copy of the sub-array
    a[:,view]      View of all of a    Returns a copy of the sub-array
    a[0,view]      View of a[0]        Returns a copy of the sub-array

Notice that it is possible to return a copy of a sub-array from a view, since you have access (through a pointer) to the original array data. Scott
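Scott's ``a[0,view]`` spelling is implementable, since Python hands the whole index tuple to __getitem__ unmodified, so a sentinel object can switch slicing behavior. A toy 1-D version (``view``, ``Arr`` and ``ArrView`` are hypothetical names, not numarray API):

```python
view = object()                      # the sentinel marking "give me a view"

class ArrView:
    """An alias onto another Arr's buffer."""
    def __init__(self, data, start, stop):
        self.data, self.start, self.stop = data, start, stop
    def __getitem__(self, i):
        return self.data[self.start + i]

class Arr:
    def __init__(self, data):
        self.data = data
    def __getitem__(self, key):
        want_view = isinstance(key, tuple) and key[-1] is view
        if want_view:
            key = key[0]
        if isinstance(key, slice):
            start, stop, _ = key.indices(len(self.data))
            if want_view:
                return ArrView(self.data, start, stop)   # shares the buffer
            return Arr(self.data[key])                   # default: a copy
        return self.data[key]

a = Arr([0, 1, 2, 3])
c = a[1:3]           # plain slice: an independent copy
w = a[1:3, view]     # slice with the sentinel: an alias
a.data[1] = 99
assert c[0] == 1     # the copy is unaffected
assert w[0] == 99    # the view sees the change
```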
Konrad Hinsen wrote:
Let's make this explicit. Given the following four expressions,
I thought I had a clear idea of what I wanted here, which was the non-view stuff being the same as Python lists, but I discovered something: Python lists provide slices that are copies, but they are shallow copies, so nested lists, which are sort-of the equivalent of multidimensional arrays, act a lot like the view behavior of NumPy arrays. Make a "2-d" list:
make an array that is the same:
assign a new binding to the first element:
b = a[0]
m = l[0]
change something in it:
The first array is changed. Change something in the first element of the list:
The first list is changed too. Now try slices instead:
b = a[2:4]
change an element in the slice:
The first array is changed. Now with the list:
Change an element
    l
    [[30, 6], [1, 6], [2, 6], [45, 6], [4, 6]]

The list is changed, but:

    m[0] = [56,65]
    l
    [[30, 6], [1, 6], [2, 6], [45, 6], [4, 6]]

The list doesn't change, where the array does change.

My conclusion is that nested lists and Arrays simply are different beasts, so we can't expect complete compatibility. I'm also wondering why lists have that weird behavior of a single index returning a reference and a slice returning a copy. Perhaps it has something to do with the auto-resizing of lists. That being said, I still like the idea of slices producing copies, so:
2) array[0] An Array of rank one less than array, sharing data with array
4) array.view[0] Same as 2)
To add a few:

5) array[0:1] An Array with a copy of the data in array[0]
6) array.view[0:1] An Array sharing data with array

As I write this, I am starting to think that this is all a bit strange. Even though lists treat slices and indexes differently, perhaps Arrays should not. They really are different beasts. I also see why it was done the way it was in the first place! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
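The interactive session in Chris's message didn't fully survive the archive, but the behavior he describes is easy to reproduce with a plain nested list (the intermediate steps here are illustrative, chosen to land on the final list his transcript shows):

```python
# Single indexing a list returns a reference; slicing returns a shallow copy.
l = [[0, 6], [1, 6], [2, 6], [3, 6], [4, 6]]

m = l[0]            # a reference to the first inner list
m[0] = 30
assert l[0] == [30, 6]        # the change shows through, view-style

s = l[3:5]          # a shallow copy of part of the outer list
s[0][0] = 45        # mutating a *shared inner list*...
assert l[3] == [45, 6]        # ...is visible in l, view-style

s[0] = [56, 65]     # but *rebinding a slot* in the copy...
assert l[3] == [45, 6]        # ...leaves l alone, copy-style
```

The final value of l is [[30, 6], [1, 6], [2, 6], [45, 6], [4, 6]], matching the output in Chris's transcript.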
Chris Barker <Chris.Barker@noaa.gov> writes:
This is not weird at all. Slicing and single-item indexing are conceptually different, and what I think you have in mind wouldn't really work. Think of a real-life container, like a box with subcompartments. Obviously you should be able to take out (or put in) an item from the box, which is what single indexing does (and the item may happen to be another box). My understanding is that you'd like the box to return copies of whatever was put into it on indexing, rather than the real thing -- this would not only be counterintuitive and inefficient, it would also mean that you could only put items with a __copy__ method in lists, which would rather limit their usefulness. Slicing, on the other hand, creates a whole new box, but this box is filled with (references to) the same items (a behavior for which a real-life equivalent is more difficult to find :) :
Because l1 and l2 are different boxes, however, assigning new items to l1 doesn't change l2 and vice versa. It is true, however, that the situation is somewhat different for arrays, because "multidimensional" lists are just nested boxes, whereas multidimensional arrays have a different structure. array[1] indexes some part of itself according to its .shape (which can be modified, thus changing what array[1] indexes, without modifying the actual array contents in memory), whereas list[1] indexes some "real" object. This may mean that the best behavior for ``array[0]`` would be to return a copy, and for ``array[:]`` etc. what would be a "deep copy" if it were nested lists. I think this is the behavior Paul Dubois' MA currently has.
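Alex's "same items, new box" point (whose l1/l2 example didn't survive the archive) comes down to this -- a minimal stand-in with illustrative values:

```python
box1 = [[1, 2], [3, 4]]
box2 = box1[:]               # a new box...
assert box2 is not box1
assert box2[0] is box1[0]    # ...holding references to the same items

box2[0][0] = 99              # modify a shared item: visible from both boxes
assert box1[0][0] == 99

box2[1] = [7, 8]             # put a different item into box2's compartment
assert box1[1] == [3, 4]     # box1 is unaffected
```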
No it is not possible.
I can't see why single-item indexing views would be needed at all if ``array[0]`` doesn't copy as you suggest above.
(I suppose you'd also want array[0:1] and array[0] to have different shape?)
Yes, arrays and lists are indeed different beasts and a different indexing behavior (creating copies) for arrays might well be preferable (since array indexing doesn't refer to "real" objects).
the way it was in the first place!
-Chris
alex
Konrad Hinsen wrote:
Please don't!! Having two types of arrays around in a single program that have the same behaviour except when they are sliced is begging for confusion and hard-to-find bugs. I agree with Perry that I occasionally use the view behaviour of slicing, and it is very useful when I do, but most of the time I would be happier with copy semantics. All I want is a way to get at a view of part of an array; I don't want two different kinds of array around with different slicing behaviour.
My main objection to changing the slicing behaviour is, like with some other proposed changes, compatibility.
The switch from Numeric to Numarray is a substantial change. I think we should view it like the mythical Py3k: an opportunity to make incompatible changes that will really make it better. By the way, as an old MATLAB user, I have to say that being able to get views from a slice is one behaviour of NumPy that I really appreciate, even though I only need it occasionally. MATLAB, however, is a whole different ball of wax in a lot of ways. There has been a lot of discussion about the copy-on-demand idea in MATLAB, but that is primarily useful because MATLAB has call-by-value function semantics, so without copy on demand you would be making copies of large arrays passed to functions that weren't even going to change them. I don't think MATLAB implements copy on demand for slices anyway, but I could be wrong there. Oh, and no function (e.g. ``ravel()``) should return a view in some cases and a copy in others, that is just asking for bugs! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
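[The ``ravel()`` worry is well founded; for what it's worth, the view-or-copy behavior can be demonstrated with modern NumPy, which kept it:]

```python
import numpy as np

a = np.arange(6).reshape(2, 3).copy()
# Contiguous input: ravel can return a view sharing a's memory.
print(np.shares_memory(np.ravel(a), a))    # True
# Non-contiguous input (the transpose): ravel is forced to copy.
print(np.shares_memory(np.ravel(a.T), a))  # False
```

So writing through the result of ``ravel()`` sometimes modifies the original and sometimes doesn't, which is exactly the trap being described.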
![](https://secure.gravatar.com/avatar/4de7d92b333c8b0124e6757d757560b0.jpg?s=120&d=mm&r=g)
I was going to write an almost identical email, but Chris saved me the trouble. These are my feelings as well. Scott On June 14, 2002 07:01 pm, Chris Barker wrote:
-- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom@physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
![](https://secure.gravatar.com/avatar/eb281ac8437ba6df4ef5f0f9686e7c3e.jpg?s=120&d=mm&r=g)
Rick White <rlw@stsci.edu> writes:
Sure, if one wants to perform only the *minimum* amount of copying, things can get rather tricky, but wouldn't it be satisfactory for most cases if attempted modification of the original triggered the delayed copying of the "views" (lazy copies)? In those cases where it isn't satisfactory, the user could still explicitly create real (i.e. alias-only) views.
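A toy sketch of the scheme just described (all class and method names are hypothetical; a real implementation would live in C and track views far more cheaply): a slice aliases its parent until either side is written, at which point the pending copy is performed.

```python
import numpy as np

class LazySlice:
    """A slice that aliases its parent until a write occurs (sketch)."""
    def __init__(self, parent, key):
        self._view = parent._data[key]   # cheap alias, no copy yet
        self._owned = None               # set once the delayed copy happens
        parent._slices.append(self)

    def _materialize(self):
        """Perform the delayed copy, detaching this slice from the parent."""
        if self._owned is None:
            self._owned = self._view.copy()

    def __getitem__(self, key):
        data = self._view if self._owned is None else self._owned
        return data[key]

    def __setitem__(self, key, value):
        self._materialize()              # writing to the slice triggers its copy
        self._owned[key] = value

class COWArray:
    """Array whose slices are lazy copies rather than aliases (sketch)."""
    def __init__(self, data):
        self._data = np.asarray(data)
        self._slices = []

    def __getitem__(self, key):
        return LazySlice(self, key)

    def __setitem__(self, key, value):
        for s in self._slices:           # modifying the original triggers
            s._materialize()             # the delayed copies of all slices
        self._data[key] = value
```

With this, ``a = COWArray([1, 2, 3, 4]); s = a[0:2]; a[0] = 99`` leaves ``s[0] == 1`` -- the copy semantics under discussion -- while the actual copying is deferred until the first write on either side.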
``b`` and ``c`` are copied and then ``a`` is deleted. What does numarray currently keep of a if I do something like the above or:
b = a.flat[::-10000] del a
?
Yes -- but only if the C extension is destructive. In that case the user might well be making a mistake in current Numeric if he has views and doesn't want them to be modified by the operation (of course he might know that the inplace operation does not affect the view(s) -- but wouldn't such cases be rather rare?). If he *does* want the views to be modified, he would obviously have to specify them explicitly as such in a copy-on-demand scheme, and in the other case he has most likely been prevented from making an error (and can still explicitly use real views if he knows that the inplace operation on the original will not have undesired effects on the "views").
Sure, copy-on-demand is an optimization, and optimizations always mess up things. On the other hand, some optimizations also make "nicer" (e.g. less error-prone) semantics computationally viable, so it's often a question of ease and clarity of the implementation vs. ease and clarity of the code that uses it. I'm not denying that too much complexity in the implementation also adversely affects users in the form of bugs, and that in the particular case of delayed copying the user can also be affected directly by harder-to-understand resource usage behavior (e.g. a[0] = 1 triggering a monstrous copying operation). Just out of curiosity, has someone already asked the Octave people how much trouble it has caused them to implement copy on demand, and whether matlab/octave users in practice do experience difficulties because of the harder-to-predict runtime behavior (I think, like matlab, octave does copy-on-demand)?
If implementing copy-on-demand is too difficult and the resulting code would be too messy, then this is certainly a valid reason to compromise on the current slicing behavior (especially since people like me who'd like to see copy-on-demand are unlikely to volunteer to implement it :)
alex
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
I'm not sure what you mean. Are you saying that if anything in the buffer changes, force all views of the buffer to generate copies (rather than try to determine if the change affected only selected views)? If so, yes, it is easier, but it still is a non-trivial capability to implement.
The whole buffer remains in both cases.
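Modern NumPy behaves the same way here, and the retention is easy to see: however tiny the strided view, its ``.base`` pins the entire original buffer.

```python
import numpy as np

a = np.zeros(1_000_000)   # an 8 MB buffer
b = a[::10_000]           # a 100-element strided view
del a
# b's .base still keeps the whole original array alive:
print(b.size, b.base.nbytes)   # 100 8000000
```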
If the point is that views are susceptible to unexpected changes made in place by a C extension, yes, certainly (just as they are for changes made in place in Python). But I'm not sure what that has to do with the implied copy (even if delayed) being broken by extensions written in C. Promising a copy, and not honoring it is not the same as not promising it in the first place. But I may be misunderstanding your point. Perry
![](https://secure.gravatar.com/avatar/eb281ac8437ba6df4ef5f0f9686e7c3e.jpg?s=120&d=mm&r=g)
"Perry Greenfield" <perry@stsci.edu> writes:
Yes (I suspect that this will be be sufficient in practice).
views)? If so, yes, it is easier, but it still is a non-trivial capability to implement.
Sure. But since copy-on-demand is only an optimization and as such doesn't affect the semantics, it could also be implemented at a later point if the resources are currently not available. I have little doubt that someone will eventually add copy-on-demand, if the option is kept open and in the meantime one could still get all the performance (and alias behavior) of the current implementation by explicitly using ``.view`` (or ``.sub`` if you prefer) to create aliases. I'm becoming increasingly convinced (see below) that copy-slicing-semantics are much to be preferred as the default, so given the above I don't think that performance concerns should sway one towards alias-slicing, if enough people feel that copy semantics as such are preferable.
OK, so this is then a nice example where even eager copy-slicing behavior would be *significantly* more efficient than the current aliasing behavior -- so copy-on-demand would then on the whole seem to be not just nearly as efficient but *more* efficient than alias slicing. And as far as difficult-to-understand runtime behavior is concerned, the extra ~100MB of useless baggage carried around by b (second case) is, I'd venture to suspect, less than obvious to the casual observer. In fact I remember one of my fellow PhD students having significant problems with mysterious memory consumption (a couple of arrays taking up more than 1GB rather than a few hundred MB) -- maybe something like the above was involved. That ``A = A[::-1]`` doesn't work (as pointed out by Paul Barrett) will also come as a surprise to most people. If I understand all this correctly, I consider it a rather strong case against alias slicing as default behavior.
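The aliasing surprise at issue is a one-liner to reproduce (shown with modern NumPy, where slices are still views):

```python
import numpy as np

a = np.arange(5)
b = a[::-1]    # a reversed *view*, not a copy
b[0] = 99      # writes through the view ...
print(a)       # [ 0  1  2  3 99] -- ... into the original
```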
OK, I'll try again, hopefully this is clearer. In a sentence: I don't see any problems with C extensions in particular that would arise from copy-on-demand (I might well be overlooking something, though). Rick was saying that passing an array to a C extension that performs an inplace operation on it means that all copies of all its (lazy) views must be performed. My point was that this is correct, but I can't see any problem with that, neither from the point of view of the extension writer, nor from that of performance, nor from that of the user, nor indeed from that of the numarray implementors (obviously the copy-on-demand scheme *as such* will be an effort). All that is needed is a separate interface for (the minority of) C extensions that destructively modify their arguments (they only need to call some function `actualize_views(the_array_or_view)` or whatever at the start -- this function will obviously be necessary regardless of the C extensions). So nothing will break, the promises are kept, and there is no extra work. It won't be any slower than what would happen with current Numeric, either, because either the (Numeric) user intended his (aliased) views to be modified as well or it was a bug. If he intended the views to be modified, he would explicitly use alias-views under the new scheme and everything would behave exactly the same. alex
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
That solves one problem and creates another... Two, in fact. One is the inconsistency problem: Python type coercion always promotes "smaller" to "bigger" types, it would be good to make no exceptions from this rule. Besides, there are still situations in which types, ranks, and indexing operations depend on each other in a strange way. With a = array([1., 2.], Float) b = array([3., 4.], Float32) the result of a*b is of type Float, whereas a[0]*b is of type Float32 - if and only if a has rank 1.
(Yes, it would be easiest to deal with if Python had all these types, but I think that will never happen, nor should it happen.)
Python doesn't need to have them as standard types, an add-on package can provide them as well. NumPy seems like the obvious one. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
All this is true. It really comes down to which poison you prefer. Neither choice is perfect. Changing the coercion rules results in the inconsistencies you mention. Not changing them results in the existing inconsistencies recently discussed (and still doesn't remove the difficulties of dealing with scalars in expressions without awkward constructs). We think the inconsistencies you point out are easier to live with than the existing behavior. It would be nice to have a solution that had none of these problems, but that doesn't appear to be possible. Perry
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
It would be nice to have a solution that had none of these problems, but that doesn't appear to be possible.
I still believe that the best solution is to define scalar data types corresponding to all array element types. As far as I can see, this doesn't have any of the disadvantages of the other solutions that have been proposed until now. Konrad.
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
<Konrad Hinsen writes>:
If x were a Float32 array, how would the following not be promoted to a Float64 array? y = x + 1. If you are proposing something like y = x + Float32(1.), it would work, but it sure leads to some awkward expressions. Perry
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
Yes, that's what I am proposing. It's no worse than what we have now, and if writing Float32 a hundred times is too much effort, an abbreviation like f = Float32 helps a lot. Anyway, following the Python credo "explicit is better than implicit", I'd rather write explicit type conversions than have automagical ones surprise me. Finally, we can always lobby for inclusion of the new scalar types into the core interpreter, with a corresponding syntax for literals, but it would sure help if we could show that the system works and suffers only from the lack of literals. Konrad.
![](https://secure.gravatar.com/avatar/49df8cd4b1b6056c727778925f86147a.jpg?s=120&d=mm&r=g)
I did not receive any major objections, and so I have released a new Numeric (21.3) incorporating bug fixes. I also tagged the CVS tree with VERSION_21_3, and then I incorporated the unsigned integers and unsigned shorts into the CVS version of Numeric, for inclusion in a tentatively named version 22.0 I've only uploaded a platform independent tar file for 21.3. Any binaries need to be updated. If you are interested in testing the new additions, please let me know of any bugs you find. Thanks, -Travis O.
![](https://secure.gravatar.com/avatar/1bc8694bf55c688b2aa2075eedf9b4c6.jpg?s=120&d=mm&r=g)
How about making indexing (not slicing) arrays *always* return a 0-d array with copy instead of "view" semantics? This is nearly equivalent to creating a new scalar type, but without requiring major changes. I think it is probably even more useful for writing generic code, because the returned value will retain array behavior. Also, the following example
would now return a Float array as Konrad desires, because a[0] is a Float array. Using copy semantics would fix the unexpected behavior reported by Larry that kicked off this discussion. Slices are a different animal from indexing and would (and definitely should) continue to have view semantics. I further believe that all Numeric functions (sum, product, etc.) should return arrays all the time, instead of implicitly converting them to Python scalars in special cases such as reductions of 1-d arrays. I think the only reason for the silent conversion is that Python lists only allow integer values for use in indexing, so that:
Numeric arrays don't have this problem:
I don't think this alone is a strong enough reason for the conversion. Getting rid of special cases is more important, because it makes behavior predictable to the novice (and expert), and it is easier to write generic functions and be sure they will not break a year from now when one of the special cases occurs. Are there other reasons why scalars are returned? On coercion rules: As for adding the array to a scalar value, x = array([3., 4.], Float32) y = x + 1. Should y be a Float or a Float32? I like numarray's coercion rules better (Float32). I have run into this upcasting too many times to count. Explicit and implicit aren't obvious to me here. The user explicitly cast x to be Float32, but because of the limited numeric types in Python, the result is upcast to a double. Here's another example,
I had to stare at this for a while when I first saw it before I realized the integer value 3 upcast the result to be type 'i'. So, I think this is confusing and rarely the desired behavior. The fact that this is inconsistent with Python's "always upcast" rule is minor for me. The array math operations are necessarily a different animal from scalar operations because of the extra types supported. Defining these operations in a way that is most convenient for working with array data seems OK. On the other hand, I don't think a jump from 21 to 22 is enough of a jump to make such a change. Numeric progresses pretty fast, and users don't expect such a major shift in behavior. I do think, though, that the computational speed issue is going to result in numarray and Numeric existing side by side for a long time. Perhaps we should think about creating an "interim" Numeric version (maybe starting at 30) that tries to be compatible with the upcoming numarray in its coercion rules, etc.? Advanced features such as indexing arrays with arrays, memory-mapped arrays, floating point exception behavior, etc. won't be there, but it should help people transition their codes to work with numarray, and also offer a speedy alternative. A second choice would be to make SciPy's Numeric implementation the intermediate step. It already produces NaNs during div-by-zero exceptions according to numarray's rules. The coercion modifications could also be incorporated.
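As it turned out, the numarray rule preferred here is essentially what modern NumPy adopted, so the behavior can be checked directly (modern NumPy, shown only for comparison):

```python
import numpy as np

x = np.array([3., 4.], np.float32)
print((x + 1.0).dtype)    # float32 -- a same-kind Python scalar does not upcast
y = np.array([1, 2], np.int16)
print((y * 3).dtype)      # int16  -- likewise for integer scalars
print((y + 0.5).dtype)    # a higher-kind scalar still upcasts to a float type
```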
There was a seriously considered debate last year about unifying Python's numeric model into a single type to get rid of the integer-float distinction, at last year's Python conference and the ensuing months. While it didn't (and won't) happen, I'd be real surprised if the general community would welcome us suggesting stirring yet another type into the brew. Can't we make 0-d arrays work as an alternative? eric
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
"eric jones" <eric@enthought.com> writes:
I think this was discussed as well a long time ago. For pure Python code, this would be a very good solution. But
I think the only reason for the silent conversion is that Python lists only allow integer values for use in indexing so that:
There are some more cases where the type matters. If you call C routines that do argument parsing via PyArg_ParseTuple and expect a float argument, a rank-0 float array will raise a TypeError. All the functions from the math module work like that, and of course many in various extension modules. In the ideal world, there would not be any distinction between scalars and rank-0 arrays. But I don't think we'll get there soon.
Statistically they probably give the desired result in more cases. But they are in contradiction to Python principles, and consistency counts a lot on my value scale. I propose an experiment: ask a few Python programmers who are not using NumPy what type they would expect for the result. I bet that not a single one would answer "Float32".
On the other hand, I don't think a jump from 21 to 22 is enough of a jump to make such a change. Numeric progresses pretty fast, and users
I don't think any increase in version number is enough for incompatible changes. For many users, NumPy is just a building block; they install it because some other package(s) require it. If a new version breaks those other packages, they won't be happy. The authors of those packages won't be happy either, as they will get the angry letters. As an author of such packages, I am speaking from experience. I have even considered making my own NumPy distribution under a different name, just to be safe from changes in NumPy that break my code (in the past it was mostly the installation code that was broken when arrayobject.h changed its location). In my opinion, anything that is not compatible with Numeric should not be called Numeric. Konrad.
![](https://secure.gravatar.com/avatar/49df8cd4b1b6056c727778925f86147a.jpg?s=120&d=mm&r=g)
On Mon, 2002-06-10 at 11:08, Konrad Hinsen wrote:
Actually, the code in PyArg_ParseTuple asks the object it gets if it knows how to be a float. 0-d arrays for some time have known how to be Python floats. So, I do not think this error occurs as you've described. Could you demonstrate this error? In fact most of the code in Python itself which needs scalars allows arbitrary objects provided the object has defined functions which return a Python scalar. The only exception to this that I've seen is the list indexing code (probably for optimization purposes). There could be more places, but I have not found them or heard of them. Originally Numeric arrays did not define appropriate functions for 0-d arrays to act like scalars in the right places. For quite a while, they have now. I'm quite supportive of never returning Python scalars from Numeric array operations unless specifically requested (e.g. the toscalar method).
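This is easy to verify with modern NumPy, whose 0-d arrays behave the same way:

```python
import math
import numpy as np

x = np.array(4.0)          # a 0-d float array
print(math.sqrt(x))        # 2.0 -- the 0-d array knows how to be a float
i = np.array(1)            # a 0-d int array
print([10, 20, 30][i])     # 20  -- it even works as a list index nowadays
```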
I'm not sure I agree with that at all. On what reasoning is that presumption based? If I encounter a Python object that I'm unfamiliar with, I don't presume to know how it will define multiplication.
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
Travis Oliphant <oliphant.travis@ieee.org> writes:
No, it seems gone indeed. I remember a lengthy battle due to this problem, but that was a long time ago.
Even for indexing, I don't see the point. If you test for the int type and do conversion attempts only for non-ints, that shouldn't slow down normal usage at all.
I suppose this would be easy to implement, right? Then why not do it in a test release and find out empirically how much code it breaks.
presumption based? If I encounter a Python object that I'm unfamiliar with, I don't presume to know how it will define multiplication.
But if that object pretends to be a number type, a sequence type, a mapping type, etc., I do make assumptions about its behaviour. Konrad.
![](https://secure.gravatar.com/avatar/56475d2e8acb48b4308f609982f94440.jpg?s=120&d=mm&r=g)
We have certainly beaten this topic to death in the past. It keeps coming up because there is no good way around it. Two points about the x + 1.0 issue: 1. How often this occurs is really a function of what you are doing. For those using Numeric Python as a kind of MATLAB clone, who are typing interactively, the size issue is of less importance and the easy expression is of more importance. To those writing scripts to batch process or writing steered applications, the size issue is more important and the easy expression less important. I'm using words like less and more here because both issues matter to everyone at some time, it is just a question of relative frequency of concern. 2. Part of what I had in mind with the kinds module proposal PEP 0242 was dealing with the literal issue. There had been some proposals to make literals decimal numbers or rationals, and that got me thinking about how to defend myself if they did it, and also about the fact that Python doesn't have Fortran's kind concept which you can use to gain a more platform-independent calculation.
From the PEP, this example:

In module myprecision.py:

    import kinds
    tinyint = kinds.int_kind(1)
    single = kinds.float_kind(6, 90)
    double = kinds.float_kind(15, 300)
    csingle = kinds.complex_kind(6, 90)

In the rest of my code:

    from myprecision import tinyint, single, double, csingle
    n = tinyint(3)
    x = double(1.e20)
    z = 1.2    # builtin float gets you the default float kind, properties unknown
    w = x * float(x)
    # but in the following case we know w has kind "double".
    w = x * double(z)
    u = csingle(x + z * 1.0j)
    u2 = csingle(x+z, 1.0)

Note how that entire code can then be changed to a higher precision by changing the arguments in myprecision.py. Comment: note that you aren't promised that single != double; but you are promised that double(1.e20) will hold a number with 15 decimal digits of precision and a range up to 10**300, or that the float_kind call will fail.
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
<Paul Dubois writes>:
We have certainly beaten this topic to death in the past. It keeps coming up because there is no good way around it.
Ain't that the truth.
We have many in the astronomical community that use IDL (instead of MATLAB), and for them size is an issue for interactive use. They often manipulate very large arrays interactively. Furthermore, many are astronomers who don't generally see themselves as programmers and who may write programs (perhaps not great programs); they don't want to be bothered by such details even in a script (or they may want to read a "professional" program and not have to deal with such things). But you are right in that there is no solution that doesn't have some problems. Every array language deals with this in somewhat different ways, I suspect. In IDL, the literals are generally smaller types (ints were (or used to be, I haven't used it myself in a while) 2 bytes, floats single precision) and there were ways of writing literals with higher precision (e.g., 2L, 2.0d-2). Since it was a language specifically intended to deal with numeric processing, supporting many scalar types made sense. Perry
![](https://secure.gravatar.com/avatar/1bc8694bf55c688b2aa2075eedf9b4c6.jpg?s=120&d=mm&r=g)
I think this is a nice feature, but it's actually heading the opposite direction of where I'd like to see things go for the general use of Numeric. Part of Python's appeal for me is that I don't have to specify types everywhere. I don't want to write explicit casts throughout equations because it munges up their readability. Of course, the casting sometimes can't be helped, but Numeric's current behavior really forces this explicit casting for array types besides double, int, and double complex. I like numarray's fix for this problem. Also, as Perry noted, it's unlikely to be used as an everyday command-line tool (like Matlab) if the verbose casting is required. I'm interested to learn what other drawbacks y'all found with always returning arrays (0-d for scalars) from Numeric functions. Konrad mentioned the tuple parsing issue in some extension libraries that expect floats, but it sounds like Travis thinks this is no longer an issue. Are there others? eric
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
<Eric Jones writes>:
Well, sure. It isn't just indexing lists directly, it would be anywhere in Python that you would use a number. In some contexts, the right thing may happen (where the function knows to try to obtain a simple number from an object), but then again, it may not (if calling a function where the number is used directly to index or slice). Here is another case where good arguments can be made for both sides. It really isn't an issue of functionality (one can write methods or functions to do what is needed), it's what the convenient syntax does. For example, if we really want a Python scalar but rank-0 arrays are always returned then something like this may be required:
Whereas if simple indexing returns a Python scalar and consistency is desired in always having arrays returned one may have to do something like this
y = x.indexAsArray(2) # instead of y = x[2]
or perhaps
y = x[ArrayAlwaysAsResultIndexObject(2)] # :-) with better name, of course
One context or the other is going to be inconvenienced, but not prevented from doing what is needed. As long as Python scalars are the 'biggest' type of their kind, we strongly lean towards single elements being converted into Python scalars. It's our feeling that there are more surprises and gotchas, particularly for more casual users, on this side than on the uncertainty of an index returning an array or scalar. People writing code that expects to deal with uncertain dimensionality (the only place that this occurs) should be the ones to go the extra distance in more awkward syntax. Perry
![](https://secure.gravatar.com/avatar/1bc8694bf55c688b2aa2075eedf9b4c6.jpg?s=120&d=mm&r=g)
Travis seemed to indicate that Python would convert 0-d arrays to Python types correctly for most (all?) cases. Python indexing is a little unique because it explicitly requires integers. It's not just 0-d arrays that fail as indexes -- Python floats won't work either. As for passing arrays to functions expecting numbers, is it that much different than passing an integer into a function that does floating point operations? Python handles this casting automatically. It seems like it should do the same for 0-d arrays if they know how to "look like" Python types.
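The two points in plain Python:

```python
a = [10, 20, 30]
print(a[1])          # 20  -- an int indexes fine
print(1 + 0.5)       # 1.5 -- an int mixed into float arithmetic is cast silently
try:
    a[1.0]           # but even a Python float is rejected as an index
except TypeError:
    print("TypeError")
```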
Yes, this would be required for using them as array indexes. Or actually:
a[int(x[2])]
Right.
Well, I guess I'd like to figure out exactly what breaks before ruling it out, because consistently returning the same type from functions/indexing is beneficial. It becomes even more beneficial with the exception behavior used by SciPy and numarray. The two breakage cases I'm aware of are (1) indexing and (2) functions that explicitly check for arguments of IntType, DoubleType, or ComplexType. When searching the standard library for these guys, they only turn up in copy, pickle, xmlrpclib, and the types module -- all in innocuous ways. Searching for 'float' (which is equal to FloatType) doesn't turn up any code that breaks this way either. A search of my site-packages had IntType tests used quite a bit -- primarily in SciPy. Some of these would go away with this change, and many were harmless. I saw a few that would need fixing (several in special.py), but the fix was trivial. eric
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
<Eric Jones wrote>:
That's right, the primary breakage would be downstream use as indices. That appeared to be the case with the find() method of strings for example.
Yes, this would be sufficient for use as indices or slices. I'm not sure if there is any specific code that checks for float but doesn't invoke automatic conversion. I suspect that floats are much less of a problem this way, though will one necessarily know whether to use int(), float(), or scalar()? If one is writing a generic function that could accept int or float arrays, then the generation of an int may presume too much about what the result will be used for. (Though I don't have a particular example to give, I'll think about whether any exist.) If the only type that could possibly cause problems is int, then int() should be all that would be necessary, but still awkward. Perry
![](https://secure.gravatar.com/avatar/1bc8694bf55c688b2aa2075eedf9b4c6.jpg?s=120&d=mm&r=g)
If numarray becomes a first class citizen in the Python world as is hoped, maybe even this issue can be rectified. List/tuple indexing might be able to be changed to accept single element Integer arrays. I suspect this has major implications though -- probably a question for python-dev. eric
![](https://secure.gravatar.com/avatar/eb281ac8437ba6df4ef5f0f9686e7c3e.jpg?s=120&d=mm&r=g)
"eric jones" <eric@enthought.com> writes:
Ahh, a loaded example ;) I always thought that Numeric's view-slicing is a fairly problematic deviation from standard Python behavior and I'm not entirely sure why it needs to be done that way. Couldn't one have both consistency *and* efficiency by implementing a copy-on-demand scheme (which is what matlab does, if I'm not entirely mistaken; a real copy only gets created if either the original or the 'copy' is modified)? The current behavior seems problematic not just because it breaks consistency and hence user expectations; it also breaks code that is written with more pythonic sequences in mind (in a potentially hard-to-track-down manner) and is, IMHO, generally undesirable and error-prone, for pretty much the same reasons that dynamic scope and global variables are generally undesirable and error-prone -- one can unwittingly create intricate interactions between remote parts of a program that can be very difficult to track down. Obviously there *are* cases where one really wants a (partial) view of an existing array. It would seem to me, however, that these cases are exceedingly rare (in all my Numeric code I'm only aware of one instance where I actually want the aliasing behavior, so that I can manipulate a large array by manipulating its views and vice versa). Thus rather than being the default behavior, I'd rather see those cases accommodated by a special syntax that makes it explicit that an alias is desired and that care must be taken when modifying either the original or the view (e.g. one possible syntax would be ``aliased_vector = m.view[:,1]``). Again I think the current behavior is somewhat analogous to having variables declared in global (or dynamic) scope by default, which is not only error-prone but also masks those cases where global (or dynamic) scope *is* actually desired and necessary.
It might be that the problems associated with a copy-on-demand scheme outweigh the error-proneness and the interface breakage that the deviation from standard python slicing behavior causes, but otherwise copying on slicing would be a backwards incompatibility in numarray I'd rather like to see (especially since one could easily add a view attribute to Numeric, for forwards-compatibility). I would also suspect that this would make it *a lot* easier to get numarray (or parts of it) into the core, but this is just a guess.
Guido might nowadays think that adding reduce was a mistake, so in that sense it might be a "corner" of the python language (although some people, including me, still rather like using reduce), but I can't see how you can generally replace reduce with anything but a loop. Could you give an example? alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck@gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/
![](https://secure.gravatar.com/avatar/1bc8694bf55c688b2aa2075eedf9b4c6.jpg?s=120&d=mm&r=g)
The current behavior seems not just problematic because it breaks consistency and hence user expectations, it also breaks code
Well, slices creating copies is definitely a bad idea (which is what I have heard proposed before) -- finite difference calculations (and others) would be very slow with this approach. Your copy-on-demand suggestion might work though. Its implementation would be more complex, but I don't think it would require cooperation from the Python core. It could be handled in the ufunc code. It would also require extension modules to make copies before they modified any values. Copy-on-demand doesn't really fit with Python's "assignments are references" approach to things though, does it? Using foo = bar in Python and then changing an element of foo will also change bar. So, I guess there would have to be a distinction made here. This adds a little more complexity. Personally, I like being able to pass views around because it allows for efficient implementations. The option to pass arrays into an extension function and edit them in-place is very nice. Copy-on-demand might allow for equal efficiency -- I'm not sure. I haven't found the current behavior very problematic in practice and haven't seen it as a major stumbling block for new users. I'm happy with the status quo on this. But, if copy-on-demand is truly efficient and didn't make extension writing a nightmare, I wouldn't complain about the change either. I have a feeling the implementers of numarray would though. :-) And talk about having to modify legacy code...
I think the two things Guido wants for inclusion of numarray is a consensus from our community on what we want, and (more importantly) a comprehensible code base. :-) If Numeric satisfied this 2nd condition, it might already be slated for inclusion... The 1st is never easy with such varied opinions -- I've about concluded that Konrad and I are anti-particles :-) -- but I hope it will happen.
I don't see choosing axis=-1 as a break with Python -- multi-dimensional arrays are inherently different and used differently than lists of lists in Python. Further, reduce() is a "corner" of the Python language that has been superseded by list comprehensions. Choosing an alternative
Guido might nowadays think that adding reduce was a mistake, so in that sense it might be a "corner" of the python language, but I can't see how you can generally replace reduce with anything but a loop.
You're right. You can't do it without a loop. List comprehensions only supersede filter and map since they always return a list. I think reduce is here to stay. And, like you, I would actually be disappointed to see it go (I like lambda too...) The point is that I wouldn't choose the definition of sum() or product() based on the behavior of Python's reduce operator. Hmmm. So I guess that is key -- it's really these *function* interfaces that I disagree with. So, how about add.reduce() keep axis=0 to match the behavior of Python, but sum() and friends defaulted to axis=-1 to match the rest of the library functions? It does break with consistency across the library, so I think it is sub-optimal. However, the distinction is reasonably clear and much less likely to cause confusion. It also allows FFT and future modules (wavelets or whatever) to operate across the fastest axis by default while conforming to an intuitive standard. take() and friends would also become axis=-1 for consistency with all other functions. Would this be a reasonable compromise? eric
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
"eric jones" <eric@enthought.com> writes:
It wouldn't, and I am not sure the implementation would be much more complex, but then I haven't tried. Having both copy-on-demand and views is difficult, both conceptually and implementation-wise, but with copy-on-demand, views become less important.
That would be true as well with copy-on-demand arrays, as foo and bar would be the same object. Semantically, copy-on-demand would be equivalent to copying when slicing, which is exactly Python's behaviour for lists.
So, how about add.reduce() keep axis=0 to match the behavior of Python, but sum() and friends defaulted to axis=-1 to match the rest of the library functions?
That sounds like the most arbitrary inconsistency - add.reduce and sum are synonyms for me. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------
![](https://secure.gravatar.com/avatar/eb281ac8437ba6df4ef5f0f9686e7c3e.jpg?s=120&d=mm&r=g)
"eric jones" <eric@enthought.com> writes:
My suggestion wouldn't conflict with any standard python behavior -- indeed the main motivation would be to have numarray conform to standard python behavior -- ``foo = bar`` and ``foo = bar[20:30]`` would behave exactly as for other sequences in python. The first one creates an alias to bar and in the second one the indexing operation creates a copy of part of the sequence which is then aliased to foo. Sequences are atomic in python, in the sense that indexing them creates a new object, which I think is not in contradiction to python's nice and consistent 'assignments are references' behavior.
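The inconsistency Alex describes is easy to demonstrate. A minimal sketch, using modern NumPy as a stand-in for Numeric (both implement view-slicing), contrasted with Python's own list behavior:

```python
import numpy as np

# Python list slicing creates a copy: the slice is independent.
lst = [1, 2, 3, 4]
sub = lst[1:3]
sub[0] = 99
assert lst == [1, 2, 3, 4]   # original untouched

# NumPy (like Numeric) slicing creates a view: the slice aliases the buffer.
arr = np.array([1, 2, 3, 4])
sub = arr[1:3]
sub[0] = 99
assert arr[1] == 99          # original silently modified through the slice
```

Generic sequence code written against the list behavior can thus silently misbehave when handed an array, which is exactly the "hard to track down" failure mode under discussion.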
I don't know how much of a performance drawback copy-on-demand would have when compared to views -- I'd suspect it would not be significant; the fact that the runtime behavior becomes a bit more difficult to predict might be more of a drawback (but then I haven't heard matlab users complain, and one could always force an eager copy). Another reason why I think a copy-on-demand scheme for slicing operations might be attractive is that I'd suspect one could gain significant benefits from doing other operations in a lazy fashion (plus optionally caching some results), too (transposing seems to cause, in principle, unnecessary copies at least in some cases at the moment).
I haven't found the current behavior very problematic in practice and haven't seen that it as a major stumbling block to new users. I'm happy
From my experience, not even all people who use Numeric quite a lot are *aware* that the slicing behavior differs from python sequences. You might be right that in practice aliasing doesn't cause too many problems (as long as one sticks to arrays -- it certainly makes it harder to write code that operates on slices of generic sequence types) -- I'd really be interested to know whether there are cases where people have spent a long time tracking down a bug caused by the view behavior.
Since the vast majority of slicing operations are currently not done to create views that are dependently modified, the backward incompatibility might not affect that much code. You are right though, that if Perry and the other numarray implementors don't think that copy-on-demand would be worth the bother, then it's unlikely to happen.
As I said I can only guess about the politics involved, but I would think that before a significant piece of code such as numarray is incorporated into the core, a relevant pep will be discussed in the newsgroup, and that many people will feel more comfortable about incorporating something into core-python that doesn't deviate significantly from standard behavior (i.e. doesn't view-slice), especially if it mainly caters to a rather specialized audience. But Guido obviously has the last word on those issues, and if he doesn't have a problem either way, then as long as the community is undivided it shouldn't be an obstacle for inclusion. I agree that division of the community might pose the most significant problems -- MA for example *does* create copies on indexing if I'm not mistaken, and the (desirable) transition process from Numeric to numarray also poses not insignificant difficulties and risks, especially since there now are quite a few important projects (not least of them scipy) that are built on top of Numeric and will have to be incorporated in the transition if numarray is to take over. Everything seems in a bit of a limbo right now. I'm currently working on a (fully-featured) matrix class that I'd like to work with both Numeric and numarray (and also scipy where available) more or less transparently for the user, which turns out to be much more difficult than I would have thought. alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck@gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/
![](https://secure.gravatar.com/avatar/c3fbc70c6e7101b4905799649b5572e7.jpg?s=120&d=mm&r=g)
Here is what I see as the fundamental problem with implementing slicing in numarray using copy-on-demand instead views. Copy-on-demand requires the maintenance of a global list of all the active views associated with a particular array buffer. Here is a simple example: >>> a = zeros((5000,5000)) >>> b = a[49:51,50] >>> c = a[51:53,50] >>> a[50,50] = 1 The assignment to a[50,50] must trigger a copy of the array b; otherwise b also changes. On the other hand, array c does not need to be copied since its view does not include element 50,50. You could instead copy the array a -- but that means copying a 100 Mbyte array while leaving the original around (since b and c are still using it) -- not a good idea! The bookkeeping can get pretty messy (if you care about memory usage, which we definitely do). Consider this case: >>> a = zeros((5000,5000)) >>> b = a[0:-10,0:-10] >>> c = a[49:51,50] >>> del a >>> b[50,50] = 1 Now what happens? Either we can copy the array for b (which means two copies of the huge (5000,5000) array exist, one used by c and the new version used by b), or we can be clever and copy c instead. Even keeping track of the views associated with a buffer doesn't solve the problem of an array that is passed to a C extension and is modified in place. It would seem that passing an array into a C extension would always require all the associated views to be turned into copies. Otherwise we can't guarantee that views won't be modifed. This kind of state information with side effects leads to a system that is hard to develop, hard to debug, and really messes up the behavior of the program (IMHO). It is *highly* desirable to avoid it if possible. This is not to deny that copy-on-demand (with explicit views available on request) would have some desirable advantages for the behavior of the system. 
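Rick's first example can be reproduced under today's view semantics; a sketch with modern NumPy (a 100x100 array stands in for the 5000x5000 one), showing the overlap test that a copy-on-demand scheme would have to perform on every in-place write:

```python
import numpy as np

# With view semantics (NumPy, like Numeric/numarray), no bookkeeping is
# needed: b and c simply alias parts of a's buffer.
a = np.zeros((100, 100))
b = a[49:51, 50]   # view covering elements (49,50) and (50,50)
c = a[51:53, 50]   # view covering elements (51,50) and (52,50)

a[50, 50] = 1
assert b[1] == 1   # b sees the change: its view includes element (50,50)
assert c[0] == 0   # c does not: its view starts at row 51

# A copy-on-demand implementation would have to detect, at the assignment
# above, that b overlaps the modified element while c does not, and copy
# only b -- exactly the per-buffer view tracking Rick describes.
```

The overlap bookkeeping has to run on every mutation of any buffer with outstanding views, which is the cost (in complexity and memory accounting) that the message argues against.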
But we've worried these issues to death, and in the end were convinced that slices == views provided the best compromise between the desired behavior and a clean implementation. Rick ------------------------------------------------------------------ Richard L. White rlw@stsci.edu http://sundog.stsci.edu/rick/ Space Telescope Science Institute Baltimore, MD
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
<Rick White writes> :
Rick beat me to the punch. The requirement for copy-on-demand definitely leads to a far more complex implementation with much more potential for misunderstood memory usage. You could do one small thing and suddenly force a spate of copies (perhaps cascading). There is no way we would have taken on a redesign of Numeric with this requirement with the resources we have available.
Rick's explanation doesn't really address the other position, which is that slices should force immediate copies. This isn't a difficult implementation issue by itself. But it does raise some related implementation questions. Supposing one does feel that views are a feature one wants even though they are not the default, it turns out that it isn't all that simple to obtain views without sacrificing ordinary slicing syntax. It is simple to obtain copies of view slices though. Slicing views may not be important to everyone. It is important to us (and others) and we do see a number of situations where forcing copies to operate on array subsets would be a serious performance problem. We did discuss this issue with Guido and he did not indicate that having different behavior on slicing with arrays would be a show stopper for acceptance into the Standard Library. We are also aware that there is no great consensus on this issue (even internally at STScI :-). Perry Greenfield
![](https://secure.gravatar.com/avatar/eb281ac8437ba6df4ef5f0f9686e7c3e.jpg?s=120&d=mm&r=g)
"Perry Greenfield" <perry@stsci.edu> writes:
Yes, but I would suspect that cases where a little innocuous a[0] = 3 triggers excessive processing should be rather unusual (matlab or octave users will know).
Numeric with this requirement with the resources we have available.
Fair enough -- if implementing copy-on-demand is too much work then we'll have to live without it (especially if view-slicing doesn't stand in the way of a future inclusion into the python core). I guess the best reason to bite the bullet and carry around state information would be if there were significant other cases where one also would want to optimize operations under the hood. If there isn't much else in this direction then the effort involved might not be justified. One thing that bugs me in Numeric (and that might already have been solved in numarray) is that e.g. ``ravel`` (and I think also ``transpose``) creates unnecessary copies, whereas ``.flat`` doesn't, but won't work in all cases (viz. when the array is non-contiguous), so I can either have ugly or inefficient code.
I'm not sure I understand the above. What is the problem with ``a.view[1:3]`` (or ``a.view()[1:3]``)?
Sure, no one denies that even if with copy-on-demand (explicitly) aliased views would still be useful.
Yep, I just saw Paul Barrett's post :)
Perry Greenfield
alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck@gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
I guess that depends on what you mean by unnecessary copies. If the array is non-contiguous what would you have it do?
I didn't mean to imply it wasn't possible, but that it was not quite as clean. The thing I don't like about this approach (or Paul's suggestion of a.sub) is the creation of an odd object whose only purpose is to be sliced. (Even worse, in my opinion, is making it a different kind of array where slicing behaves differently. That will lead to the problem we have discussed for other kinds of array behavior, namely, how do you keep from being confused about a particular array's slicing behavior?) That could lead to confusion as well. Many may be under the impression that x = a.view makes x refer to an array when it doesn't. Users would need to know that a.view without a '[' is usually an error. Sure it's not hard to implement. But I don't view it as that clean a solution. On the other hand, a[1:3].copy() (or alternatively, a[1:3].copy) is another array just like any other.
Perry
![](https://secure.gravatar.com/avatar/eb281ac8437ba6df4ef5f0f9686e7c3e.jpg?s=120&d=mm&r=g)
"Perry Greenfield" <perry@stsci.edu> writes:
In most cases the array of which I desire a flattened representation is contiguous (plus, I usually don't intend to modify it). Consequently, in most cases I don't want any copies of it to be created (especially not if it is really large -- which is not seldom the case). The fact that you can never really be sure whether you can actually use ``.flat`` without checking beforehand that the array is in fact contiguous (I don't think there are many guarantees about something being contiguous, or are there?), and that ravel will always work but has a huge overhead, suggests to me that something is not quite right.
If the array is non-contiguous what would you have it do?
Simple -- in that case 'lazy ravel' would do the same as 'ravel' currently does, create a copy (or alternatively rearrange the memory representation to make it contiguous and then create a lazy copy, but I don't know whether this would be a good or even feasible idea). A lazy version of ravel would have the same semantics as ravel but only create an actual copy if necessary -- which means as long as no modification takes place and the array is contiguous, it will be sufficient to return the ``.flat`` (for starters). If it is non-contiguous then the copying can't be helped, but these cases are rare, and currently you either have to test for them explicitly or slow everything down and waste memory by just always using ``ravel()``. For example, if bar is contiguous, ``foo = ravel(bar)`` would be computationally equivalent to ``bar.flat`` as long as neither of them is modified, but semantically equivalent to the current ``foo = ravel(bar)`` in all cases. Thus you could now write:
a = ravel(a)[20:]
wherever you've previously written the contiguity-checking boiler-plate (``if a.iscontiguous(): ... else: ...``), without any loss of performance.
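For what it's worth, modern NumPy's ravel has exactly the view-if-contiguous, copy-otherwise behavior being debated here; a sketch (using np.shares_memory, a NumPy function, to show which case applies):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)            # contiguous array
assert np.shares_memory(np.ravel(a), a)    # ravel returns a view: no copy

b = a[:, ::2]                              # strided slice: non-contiguous
assert not b.flags['C_CONTIGUOUS']
assert not np.shares_memory(np.ravel(b), b)  # here ravel must copy
```

So whether the caller gets aliased or independent data still depends on a memory-layout property that is generally only knowable at runtime, which is precisely Alex's complaint.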
I personally don't find it messy. And please keep in mind that the ``view`` construct would only very seldom be used if copy-on-demand is the default -- as I said, I've only needed the aliasing behavior once -- no doubt it was really handy then, but the fact that e.g. matlab doesn't have anything along those lines (AFAIK) suggests that many people will never need it. So even if ``.view`` is messy, I'd rather have something messy that is almost never used, in exchange for (what I perceive as) significantly nicer and cleaner semantics for something that is used all the time (array slicing; alias slicing is messy in at least the respect that it breaks standard usage and generic sequence code, as well as causing potentially devious bugs. Unexpected behaviors like phantom buffers kept alive in their entirety by partial views etc. or what ``A = A[::-1]`` does are not exactly pretty either).
I don't see that problem, frankly. The view is *not* an array. It doesn't need (and shouldn't have) anything except a method to access slices (__getitem__). As mentioned before, I also regard it as highly desirable that ``b = a.view[3:10]`` sticks out immediately. This signals "warning -- potentially tricky code ahead". Nothing in ``b = a[3:10]`` tells you that someone intends to modify a and b dependently (because in more than 9 out of 10 cases he won't) -- now *this* is confusing.
Since the ``.view`` shouldn't allow anything except slicing, they'll soon find out ("Error: you can't multiply me, I'm a view and not an array"). And I can't see why that would be harder to figure out (or look up in the docs) than that a[1:3] creates an alias and *not* a copy, contrary to *everything* else you've ever heard or read about python sequences (especially since in most cases it will work as intended). Also, what exactly is the confused person's notion of the purpose of ``x = a.view`` supposed to be? That ``x = a`` is what ``x = a.copy()`` really does, and that to create an alias to ``a`` they would have to use ``x = a.view``? In that case they'd better read the python tutorial before they do any more python programming, because they are in for all kinds of unpleasant surprises (``a = []; b = a; b[1] = 3; print a`` -- oops). alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck@gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
<Alexander Schmolck writes>: <Perry Greenfield writes>:
Numarray already returns a view of the array if it is contiguous. Copies are only produced if it is non-contiguous. I assume that is the behavior you are asking for?
Not for numarray, at least in this context.
Currently for numarray .flat will fail if it isn't contiguous. It isn't clear if this should change. If .flat is meant to be a view always, then it should always fail if the array is not contiguous. Ravel is not guaranteed to be a view. This is a problematic issue if we decide to switch from view to copy semantics. If slices produce copies, then does .flat? If so, then how does one produce a flattened view? x.view.flat?
I believe this is already true in numarray.
You're kidding, right? Particularly after arguing for aliasing semantics in the previous paragraph for .flat ;-)
This is basically true, though the confusion may be that a.view is an array object that has different slicing behavior instead of a non-array object that can be sliced to produce a view. I don't view it as a major issue, but I do see how some may mistakenly infer that. Perry
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
<"Perry Greenfield" writes> [SNIP]
This is one horrible aspect of NumPy that I hope you get rid of. I've been burned by this several times -- I expected a view, but silently got a copy because my array was noncontiguous. If you go with copy semantics, this will go away, if you go with view semantics, this should raise an exception instead of silently copying. Ditto with reshape, etc. In my experience, this is a source of hard to find bugs (as opposed to axes issues which tend to produce shallow bugs). [SNIP]
Ravel should either always return a view or always return a copy -- I don't care which
Wouldn't that just produce a copy of the view? Unless you did some weird special casing on view? The following would work, although it's a little clunky. flat_x = x.view[:] # Or however "get me a view" would be spelled. flat_x.shape = (-1,) -tim
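Tim's two-step "get me a flattened view" has a close analogue in modern NumPy, where reshape(-1) on a contiguous array returns a view rather than a copy; a sketch, not numarray's actual API:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

# Flattened view of x: reshape(-1) on a contiguous array aliases
# the same buffer instead of copying it.
flat_x = x.reshape(-1)
assert np.shares_memory(flat_x, x)

flat_x[0] = 42
assert x[0, 0] == 42   # writes through the view reach x
```

(For a non-contiguous x, reshape would silently fall back to copying, which is the same contiguity-dependent behavior the thread complains about for ravel.)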
![](https://secure.gravatar.com/avatar/eb281ac8437ba6df4ef5f0f9686e7c3e.jpg?s=120&d=mm&r=g)
"Perry Greenfield" <perry@stsci.edu> writes:
Not at all -- in fact I was rather shocked when my attention was drawn to the fact that this is also the behavior of Numeric -- I had thought that ravel would *always* create a copy. I absolutely agree with the other posters who remarked that the different behavior of ravel (creating a copy vs creating a view, depending on whether the argument is contiguous) is highly undesirable and error-prone (especially since it is not even possible to determine at compile time which behavior will occur, if I'm not mistaken). In fact, I think this behavior is worse than what I incorrectly assumed to be the case. What I was arguing for is a ravel that always has the same semantics (namely creating a copy) but that -- because it would create the copy only on demand -- would be just as efficient as using .flat when a) its argument were contiguous; and b) neither the result nor the argument were modified while both are alive. The reason that I view `.flat` as a hack is that it is an operation that is there exclusively for efficiency reasons and has no well defined semantics -- it will only work stochastically, giving better performance in certain cases. Thus you have to cast lots over whether you can actually use it at runtime (calling .iscontiguous) and always have a fall-back scheme (most likely using ravel) at hand -- there seems to be no way to determine at compile time what's going to happen. I don't think a language or a library should have any such constructs, or at least it should strive to minimize their number. The fact that the current behavior of ravel actually achieves the effect I want in most cases doesn't justify its obscure behavior in my eyes, which translates into a variation of the boiler-plate code previously mentioned (``if a.iscontiguous:...else:``) when you actually want a *single* ravelled copy, and it also is a very likely candidate for extremely hard to find bugs. One nice thing about python is that there is very little undefined behavior. I'd like to keep it that way. [snipped]
I didn't argue for any semantics of ``.flat`` -- I just pointed out that I found the division of labour that I (incorrectly) assumed to be the case an ugly hack (for the reasons outlined above): ``ravel``: always works, but always creates a copy (which might be undesirable wastage of resources); [this was mistaken; the real semantics are: always works, creates a view if contiguous, a copy otherwise] ``.flat``: behavior undefined at compile time; a runtime check can be used to ensure that it can be used as a more efficient alternative to ``ravel`` in some cases. If I now understand the behavior of both ``ravel`` and ``.flat`` correctly, then I can't currently see *any* raison d'être for a ``.flat`` attribute. If, as I would hope, the behavior of ravel is changed to always create copies (ideally on demand), then matters might look different. In that case, it might be justifiable to have ``.flat`` as a specialized construct analogous to what I proposed as ``.view``, but only if there is some way to make it work (the same) for both contiguous and non-contiguous arrays. I'm not sure that it would be needed at all (especially with a lazy ravel). alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck@gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/
![](https://secure.gravatar.com/avatar/0b220fa4c0b59e883f360979ee745d63.jpg?s=120&d=mm&r=g)
On 14 Jun 2002, Alexander Schmolck wrote: [...]
Why does ravel have a huge overhead? It seems it already doesn't copy unless required: search for 'Chacking' -- including the mis-spelling -- in this thread: http://groups.google.com/groups?hl=en&lr=&threadm=abjbfp%241t9%241%40news5.svr.pol.co.uk&rnum=1&prev=/groups%3Fq%3Diterating%2Bover%2Bthe%2Bcells%2Bgroup:comp.lang.python%26hl%3Den%26lr%3D%26scoring%3Dr%26selm%3Dabjbfp%25241t9%25241%2540news5.svr.pol.co.uk%26rnum%3D1 or start up your Python interpreter, if you're less lazy than me. John
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
"Perry Greenfield" <perry@stsci.edu> writes:
Not necessarily. We could decide that array.view is a view of the full array object, and that slicing views returns subviews.
A view could be a different type of object, even though much of the implementation would be shared with arrays. This would help to reduce confusion.
Why? It would be a full-size view, which might actually be useful in many situations. My main objection to changing the slicing behaviour is, like with some other proposed changes, compatibility. Even though view behaviour is not required by every NumPy program, there are people out there who use it and finding the locations in the code that need to be changed is a very tricky business. It may keep programmers from switching to Numarray in spite of benefits elsewhere. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
<Konrad Hinsen writes>: <Perry Greenfield writes>:
I'd be strongly against this. This has the same problem that other customized array objects have (whether regarding slicing behavior, operators, coercion...). In particular, it is clear which kind it is when you create it, but you may pass it to a module that presumes different array behavior. Having different kind of arrays floating around just seems like an invitation for confusion. I'm very much in favor of picking one or the other behaviors and then making some means of explicitly getting the other behavior.
But one can do that simply by x = a (Though there is the issue that one could do the following which is not the same: x = a.view x.shape = (2,50) so that x is a full array view with a different shape than a) ******** I understand the backward compatibilty issue here, but it is clear that this is an issue that appears to be impossible to get a consensus on. There appear to be significant factions that care passionately about copy vs view and no matter what decision is made many will be unhappy. Perry
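Perry's parenthetical case -- a full-array view with a different shape -- can be sketched with modern NumPy, where ndarray.view() gives exactly such an alias under a new array header (a sketch of the idea, not numarray's view attribute):

```python
import numpy as np

a = np.arange(100).reshape(4, 25)

# x = a alone cannot give a differently-shaped alias, but a full-array
# view can: ndarray.view() aliases the same buffer under a new header.
x = a.view()
x.shape = (2, 50)              # reshaping the view leaves a's shape alone
assert a.shape == (4, 25)
assert np.shares_memory(x, a)

x[0, 0] = -1                   # ...but the underlying data is shared
assert a[0, 0] == -1
```

This is the behavior that makes x = a.view genuinely different from x = a: the two objects share data but carry independent shape metadata.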
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
We already have that situation with lists and arrays (and, in much of my code, netCDF arrays, which have copy semantics), but in my experience this has never caused confusion. Most general code working on sequences doesn't modify elements at all. When it does, it either clearly requires view semantics (a function you call in order to modify (parts of) an array) or clearly requires copy semantics (a function that uses an array argument as an initial value that it then modifies).
Then the only solution I see is the current one: default behaviour is view, and when you want a copy you copy explicitly. The inverse is not possible: once you have made a copy, you can't make it behave like a view anymore. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------
![](https://secure.gravatar.com/avatar/4de7d92b333c8b0124e6757d757560b0.jpg?s=120&d=mm&r=g)
On Sat, Jun 15, 2002 at 10:53:17AM +0200, Konrad Hinsen wrote:
I don't think it is necessary to create the other object _from_ the default one. You could have copy behavior be the default, and if you want a view of some array you simply request one explicitly with .view, .sub, or whatever. Since creating a view is "cheap" compared to creating a copy, there is nothing sacrificed doing things in this manner. Scott -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom@physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
Let's make this explicit. Given the following four expressions, 1) array 2) array[0] 3) array.view 4) array.view[0] what would the types of each of these objects be according to your proposal? What would the indexing behaviour of those types be? I don't see how you can avoid having either two types or two different behaviours within one type. Konrad. -- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais -------------------------------------------------------------------------------
![](https://secure.gravatar.com/avatar/4de7d92b333c8b0124e6757d757560b0.jpg?s=120&d=mm&r=g)
On June 17, 2002 04:57 am, Konrad Hinsen wrote:
If we assume that a slice returns a copy _always_, then I agree that #4 in your list above would not give a user what they would expect: array.view[0] would give the view of a copy of array[0], _not_ a view of array[0], which is probably what is wanted. I _think_ that this could be fixed by making view (or something similar) an option of the slice rather than a method of the object. For example (assuming that a is an array):

    Expression:   Returns:            Slicing Behavior:
    a or a[:]     Copy of all of a    Returns a copy of the sub-array
    a[0]          Copy of a[0]        Returns a copy of the sub-array
    a[:,view]     View of all of a    Returns a copy of the sub-array
    a[0,view]     View of a[0]        Returns a copy of the sub-array

Notice that it is possible to return a copy of a sub-array from a view, since you have access (through a pointer) to the original array data. Scott -- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom@physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
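[Editorial note: a minimal sketch of what this proposal might look like in modern Python. The `view` sentinel and the `CopyArray` wrapper are hypothetical names for illustration; neither existed in Numeric or numarray.]

```python
import numpy as np

# Hypothetical sketch of the proposal above: slicing copies by default,
# and a special `view` token inside the index requests alias semantics.
view = object()  # sentinel requesting view semantics

class CopyArray:
    """ndarray wrapper whose __getitem__ copies unless `view` is passed."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __getitem__(self, index):
        if isinstance(index, tuple) and index and index[-1] is view:
            return self.data[index[:-1]]   # alias: shares memory with self.data
        return np.copy(self.data[index])   # default: independent copy

a = CopyArray(np.arange(10))
c = a[2:4]         # copy -- modifying c leaves a.data untouched
v = a[2:4, view]   # view -- shares memory with a.data
c[0] = 99          # no effect on a.data
v[0] = 42          # a.data[2] becomes 42
```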
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Konrad Hinsen wrote:
Let's make this explicit. Given the following four expressions,
I thought I had a clear idea of what I wanted here, which was the non-view stuff being the same as Python lists, but I discovered something: Python lists provide slices that are copies, but they are shallow copies, so nested lists, which are sort of the equivalent of multidimensional arrays, act a lot like the view behavior of NumPy arrays.

Make a "2-d" list, and an array that is the same, then assign a new binding to the first element of each:

b = a[0]
m = l[0]

Change something in b: the first array is changed. Change something in m, the first element of the list: the first list is changed too.

Now try slices instead:

b = a[2:4]
m = l[2:4]

Change an element in the array slice: the first array is changed. Now do the same with the list slice, mutating one of its inner lists:

l
[[30, 6], [1, 6], [2, 6], [45, 6], [4, 6]]

The list is changed too. But rebinding an element of the list slice:

m[0] = [56,65]
l
[[30, 6], [1, 6], [2, 6], [45, 6], [4, 6]]

The list doesn't change, whereas the corresponding assignment into an array slice does change the array.

My conclusion is that nested lists and Arrays simply are different beasts, so we can't expect complete compatibility. I'm also wondering why lists have that weird behavior of a single index returning a reference and a slice returning a copy. Perhaps it has something to do with the auto-resizing of lists. That being said, I still like the idea of slices producing copies, so:
2) array[0] -- an Array of rank one less than array, sharing data with array

4) array.view[0] -- same as 2)

To add a few:

5) array[0:1] -- an Array with a copy of the data in array[0]

6) array.view[0:1] -- an Array sharing data with array

As I write this, I am starting to think that this is all a bit strange. Even though lists treat slices and indexes differently, perhaps Arrays should not. They really are different beasts. I also see why it was done the way it was in the first place! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
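[Editorial note: the list-vs-array session described above can be reconstructed roughly as follows with modern NumPy; the particular element values are assumed for illustration.]

```python
import numpy as np

l = [[0, 6], [1, 6], [2, 6], [3, 6], [4, 6]]   # "2-d" nested list
a = np.array(l)                                 # equivalent array

# Single indexing: both hand back the actual sub-object, so mutating
# it changes the original container.
m = l[0]
b = a[0]
m[0] = 30        # changes l[0][0]
b[0] = 30        # changes a[0, 0]

# Slicing: lists return a *shallow copy*, arrays return a view.
ms = l[2:4]
bs = a[2:4]
ms[1][0] = 45    # mutating a shared inner list still changes l
bs[1][0] = 45    # the array slice is a view, so a changes as well

# But *rebinding* an element of the list slice does not touch l,
# whereas the same assignment through the array view does touch a.
ms[0] = [56, 65]   # l[2] is still [2, 6]
bs[0] = [56, 65]   # a[2] becomes [56, 65]
```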
![](https://secure.gravatar.com/avatar/eb281ac8437ba6df4ef5f0f9686e7c3e.jpg?s=120&d=mm&r=g)
Chris Barker <Chris.Barker@noaa.gov> writes:
This is not weird at all. Slicing and single-item indexing are different conceptually, and what I think you have in mind wouldn't really work. Think of a real-life container, like a box with subcompartments. Obviously you should be able to take out (or put in) an item from the box, which is what single indexing does (and the item may happen to be another box). My understanding is that you'd like the box to return copies of whatever was put into it on indexing, rather than the real thing -- this would not only be counterintuitive and inefficient, it also means that you could only put items that have a __copy__ method into lists, which would rather limit their usefulness. Slicing, on the other hand, creates a whole new box, but this box is filled with (references to) the same items (a behavior for which a real-life equivalent is more difficult to find :).
Because l1 and l2 are different boxes, however, assigning new items to l1 doesn't change l2 and vice versa. It is true, however, that the situation is somewhat different for arrays, because "multidimensional" lists are just nested boxes, whereas multidimensional arrays have a different structure. array[1] indexes some part of itself according to its .shape (which can be modified, thus changing what array[1] indexes, without modifying the actual array contents in memory), whereas list[1] indexes some "real" object. This may mean that the best behavior for ``array[0]`` would be to return a copy, and for ``array[:]`` etc. what would be a "deep copy" if it were nested lists. I think this is the behavior Paul Dubois' MA currently has.
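[Editorial note: the point about `.shape` is easy to demonstrate with modern NumPy. Reshaping changes what `array[1]` selects without moving any data, something that has no analogue for lists, whose indices always name stored objects.]

```python
import numpy as np

a = np.arange(12)
a.shape = (3, 4)
row = a[1]                    # selects buffer elements 4..7
assert list(row) == [4, 5, 6, 7]

a.shape = (4, 3)              # same memory, new interpretation
assert list(a[1]) == [3, 4, 5]

# A list index, by contrast, always refers to the same stored object.
l = [[0, 1], [2, 3]]
assert l[1] is l[1]
```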
No, it is not possible.
I can't see why single-item indexing views would be needed at all if ``array[0]`` doesn't copy as you suggest above.
(I suppose you'd also want array[0:1] and array[0] to have different shape?)
Yes, arrays and lists are indeed different beasts and a different indexing behavior (creating copies) for arrays might well be preferable (since array indexing doesn't refer to "real" objects).
the way it was in the first place!
-Chris
alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck@gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Konrad Hinsen wrote:
Please don't!! Having two types of arrays around in a single program that have the same behaviour except when they are sliced is begging for confusion and hard-to-find bugs. I agree with Perry that I occasionally use the view behaviour of slicing, and it is very useful when I do, but most of the time I would be happier with copy semantics. All I want is a way to get at a view of part of an array; I don't want two different kinds of array around with different slicing behaviour.
My main objection to changing the slicing behaviour is, like with some other proposed changes, compatibility.
The switch from Numeric to Numarray is a substantial change. I think we should view it like the mythical Py3k: an opportunity to make incompatible changes that will really make it better. By the way, as an old MATLAB user, I have to say that being able to get views from a slice is one behaviour of NumPy that I really appreciate, even though I only need it occasionally. MATLAB, however, is a whole different ball of wax in a lot of ways. There has been a lot of discussion about the copy-on-demand idea in MATLAB, but that is primarily useful because MATLAB has call-by-value function semantics, so without copy-on-demand you would be making copies of large arrays passed to functions that weren't even going to change them. I don't think MATLAB implements copy-on-demand for slices anyway, but I could be wrong there. Oh, and no function (i.e. ravel()) should return a view in some cases and a copy in others; that is just asking for bugs! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
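[Editorial note: the `ravel()` pitfall mentioned above survives in present-day NumPy, where `ravel` returns a view when the data is contiguous and a copy otherwise -- exactly the dual behavior being criticized.]

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

r1 = a.ravel()        # C-contiguous input: a view, shares memory with a
assert np.shares_memory(r1, a)

r2 = a.T.ravel()      # transposed (non-contiguous) input: a fresh copy
assert not np.shares_memory(r2, a)

# flatten() always copies, which at least is predictable.
assert not np.shares_memory(a.flatten(), a)
```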
![](https://secure.gravatar.com/avatar/4de7d92b333c8b0124e6757d757560b0.jpg?s=120&d=mm&r=g)
I was going to write an almost identical email, but Chris saved me the trouble. These are my feelings as well. Scott On June 14, 2002 07:01 pm, Chris Barker wrote:
-- Scott M. Ransom Address: McGill Univ. Physics Dept. Phone: (514) 398-6492 3600 University St., Rm 338 email: ransom@physics.mcgill.ca Montreal, QC Canada H3A 2T8 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
![](https://secure.gravatar.com/avatar/eb281ac8437ba6df4ef5f0f9686e7c3e.jpg?s=120&d=mm&r=g)
Rick White <rlw@stsci.edu> writes:
Sure, if one wants to perform only the *minimum* amount of copying, things can get rather tricky, but wouldn't it be satisfactory for most cases if attempted modification of the original triggered the delayed copying of the "views" (lazy copies)? In those cases where it isn't satisfactory, the user could still explicitly create real (i.e. alias-only) views.
``b`` and ``c`` are copied and then ``a`` is deleted. What does numarray currently keep of a if I do something like the above or:
b = a.flat[::-10000] del a
?
Yes -- but only if the C extension is destructive. In that case the user might well be making a mistake in current Numeric if he has views and doesn't want them to be modified by the operation (of course he might know that the in-place operation does not affect the view(s), but wouldn't such cases be rather rare?). If he *does* want the views to be modified, he would obviously have to specify them explicitly as such in a copy-on-demand scheme, and in the other case he has most likely been prevented from making an error (and can still explicitly use real views if he knows that the in-place operation on the original will not have undesired effects on the "views").
Sure, copy-on-demand is an optimization, and optimizations always mess things up. On the other hand, some optimizations also make "nicer" (e.g. less error-prone) semantics computationally viable, so it's often a question between ease and clarity of the implementation vs. ease and clarity of code that uses it. I'm not denying that too much complexity in the implementation also adversely affects users in the form of bugs, and that in the particular case of delayed copying the user can also be affected directly by harder-to-understand resource usage behavior (e.g. a[0] = 1 triggering a monstrous copying operation). Just out of curiosity, has someone already asked the Octave people how much trouble it has caused them to implement copy-on-demand, and whether matlab/octave users in practice do experience difficulties because of the harder-to-predict runtime behavior (I think, like matlab, octave does copy-on-demand)?
If implementing copy-on-demand is too difficult and the resulting code would be too messy, then this is certainly a valid reason to compromise on the current slicing behavior (especially since people like me who'd like to see copy-on-demand are unlikely to volunteer to implement it :)
alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck@gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
I'm not sure what you mean. Are you saying that if anything in the buffer changes, force all views of the buffer to generate copies (rather than try to determine if the change affected only selected views)? If so, yes, it is easier, but it still is a non-trivial capability to implement.
The whole buffer remains in both cases.
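[Editorial note: this point is easy to illustrate with present-day NumPy, where it still holds -- even a tiny view keeps the entire underlying buffer alive.]

```python
import numpy as np

a = np.zeros(1_000_000)   # ~8 MB buffer of float64
b = a[::100_000]          # a 10-element view into it
del a                     # the name goes away...

# ...but b.base still holds the full one-million-element buffer.
assert b.size == 10
assert b.base.nbytes == 8_000_000
```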
If the point is that views are susceptible to unexpected changes made in place by a C extension, yes, certainly (just as they are for changes made in place in Python). But I'm not sure what that has to do with the implied copy (even if delayed) being broken by extensions written in C. Promising a copy, and not honoring it is not the same as not promising it in the first place. But I may be misunderstanding your point. Perry
![](https://secure.gravatar.com/avatar/eb281ac8437ba6df4ef5f0f9686e7c3e.jpg?s=120&d=mm&r=g)
"Perry Greenfield" <perry@stsci.edu> writes:
Yes (I suspect that this will be be sufficient in practice).
views)? If so, yes, it is easier, but it still is a non-trivial capability to implement.
Sure. But since copy-on-demand is only an optimization and as such doesn't affect the semantics, it could also be implemented at a later point if the resources are currently not available. I have little doubt that someone will eventually add copy-on-demand, if the option is kept open and in the meantime one could still get all the performance (and alias behavior) of the current implementation by explicitly using ``.view`` (or ``.sub`` if you prefer) to create aliases. I'm becoming increasingly convinced (see below) that copy-slicing-semantics are much to be preferred as the default, so given the above I don't think that performance concerns should sway one towards alias-slicing, if enough people feel that copy semantics as such are preferable.
OK, so this is then a nice example where even eager copy-slicing behavior would be *significantly* more efficient than the current aliasing behavior -- so copy-on-demand would then on the whole seem to be not just nearly as efficient but *more* efficient than alias slicing. And as far as difficult-to-understand runtime behavior is concerned, the extra ~100MB of useless baggage carried around by b (second case) is, I'd venture to suspect, less than obvious to the casual observer. In fact, I remember one of my fellow PhD students having significant problems with mysterious memory consumption (a couple of arrays taking up more than 1GB rather than a few hundred MB) -- maybe something like the above was involved. That ``A = A[::-1]`` doesn't work (as pointed out by Paul Barrett) will also come as a surprise to most people. If I understand all this correctly, I consider it a rather strong case against alias slicing as default behavior.
OK, I'll try again; hopefully this is clearer. In a sentence: I don't see any problems with C extensions in particular that would arise from copy-on-demand (I might well be overlooking something, though). Rick was saying that passing an array to a C extension that performs an in-place operation on it means that all copies of all its (lazy) views must be performed. My point was that this is correct, but I can't see any problem with that, neither from the point of view of the extension writer, nor from that of performance, nor from that of the user, nor indeed from that of the numarray implementors (obviously the copy-on-demand scheme *as such* will be an effort). All that is needed is a separate interface for (the minority of) C extensions that destructively modify their arguments (they only need to call some function `actualize_views(the_array_or_view)` or whatever at the start -- this function will obviously be necessary regardless of the C extensions). So nothing will break, the promises are kept, and there is no extra work. It won't be any slower than what would happen with current Numeric, either, because either the (Numeric) user intended his (aliased) views to be modified as well, or it was a bug. If he intended the views to be modified, he would explicitly use alias-views under the new scheme and everything would behave exactly the same. alex -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck@gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/
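[Editorial note: a toy sketch of the scheme described above. The `LazyView` class and the list-taking signature of `actualize_views` are illustrative assumptions; only the function name comes from the message, and no such API existed in Numeric or numarray.]

```python
import numpy as np

class LazyView:
    """A 'view' that aliases its parent until a delayed copy is forced."""
    def __init__(self, parent, index):
        self.parent, self.index = parent, index
        self.data = None                      # None -> still aliasing

    def actualize(self):
        if self.data is None:                 # perform the delayed copy
            self.data = np.copy(self.parent[self.index])

    def __array__(self, dtype=None):
        out = self.data if self.data is not None else self.parent[self.index]
        return np.asarray(out, dtype=dtype)

def actualize_views(views):
    """What a destructive C extension would call before modifying the base."""
    for v in views:
        v.actualize()

a = np.arange(5.0)
v = LazyView(a, slice(1, 3))      # cheap: no data copied yet

actualize_views([v])              # an in-place op on `a` is about to happen
a *= 10                           # destructive update of the base array

# v kept its pre-update values; a was updated in place.
assert list(np.asarray(v)) == [1.0, 2.0]
assert a[1] == 10.0
```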
participants (11)
-
Alexander Schmolck
-
Chris Barker
-
eric jones
-
John J. Lee
-
Konrad Hinsen
-
Paul F Dubois
-
Perry Greenfield
-
Rick White
-
Scott Ransom
-
Tim Hochberg
-
Travis Oliphant