[Numpy-discussion] Response to PEP suggestions

David M. Cooke cookedm at physics.mcmaster.ca
Thu Feb 17 13:32:19 EST 2005


Travis Oliphant <oliphant at ee.byu.edu> writes:

> I'm glad to get the feedback.
>
> 1) Types
>
> I like Francesc's suggestion that .typecode return a code and .type
> return a Python class.  What is the general attitude regarding the
> use of attributes versus methods for this kind of thing?  It always
> seems so arbitrary to me what becomes an attribute and what a method.

If it's an intrinsic attribute (heh) of the object, I usually try to
make it an attribute. So I'd make these attributes.

> There will definitely be support for the numarray-style type
> specification.  Something like that will be how they print (I like
> the 'i4', 'f4' specification a bit better though).  There will also
> be support for specification in terms of a C type.  The typecodes
> will still be there, underneath.

+1. I think labelling types with their sizes at some level is necessary
for cross-platform compatibility (more below).

> One thing has always bothered me though.  Why is a double complex type
> Complex64 and a float complex type Complex32?  This seems to break
> the idea that the number at the end specifies a bit width.  Why don't
> we just call them Complex64 and Complex128?  Can we change this?

Or rename to ComplexFloat32 and ComplexFloat64?

> I'm also glad that some recognize the problems with always requiring
> specification of types in terms of bit-widths or byte-widths, as
> these are not the same across platforms.  For some types (like Int8
> or Int16) this is not a problem.  But what about long double?  On an
> Intel machine long double is Float96, while on a PowerPC it is
> Float128.  Wouldn't it just be easier to specify LDouble or 'g' than
> to special-case your code?

One problem to consider (and where I first ran into this type of
thing) is pickling. A pickle containing an array of Int isn't portable
if the two machines have different ideas of what an Int is (Int32 or
Int64, for instance). That's another reason to keep the byte-width.

LDouble, for instance, should probably be an alias for Float96 on
Intel and Float128 on PPC, and pickle accordingly.
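
For concreteness, a hedged sketch of the pickling hazard, using
Numeric-style spellings (the exact names are incidental):

    import pickle
    from Numeric import array, Int, Int32

    # Int aliases the platform's C long, so its width varies by machine.
    a = array([1, 2, 3], Int)    # Int32 here; Int64 on many 64-bit boxes
    s = pickle.dumps(a)          # bakes this platform's idea of Int into s

    # Unpickling s on a machine with a different sizeof(long) can
    # misread the stored bytes; an explicit width is unambiguous.
    b = array([1, 2, 3], Int32)  # portable across platforms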

> Problems also exist when you are interfacing with hardware or other C
> or Fortran code.  You know you want single-precision floating point.
> You don't know or care what the bit-width is.  I think the bit-width
> specification matters more for the integer types than for the
> floating-point types.  In sum, I think it is important to be able to
> specify types both ways.  When printing an array, it's probably
> better if it gives bit-width information.  I like the way numarray
> prints arrays.

Do you mean adding bit-width info to str()? repr() definitely needs
it, and it should be included in all cases, I think.

You also run into the fact that sizeof(Python int) isn't necessarily
sizeof(C int) (a Python int is a C long), especially on 64-bit systems.
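
For instance (the sizes shown are typical, not guaranteed):

    import struct

    struct.calcsize('i')   # sizeof(C int):  usually 4
    struct.calcsize('l')   # sizeof(C long), i.e. a Python int: 4 or 8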

I come from a C background, so things like Float64, etc., look wrong.
I think more in terms of single and double precision, so I'd suggest
adding some more descriptive types:

CInt         (would be either Int32 or Int64, depending on the platform)
CFloat       (can't do Float, for backwards-compatibility reasons)
CDouble      (could just be Double)
CLong        (or Long)
CLongLong    (or LongLong)

That could make it easier to match types in Python code to types in C
extensions.
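
A hedged sketch of how those aliases could be set up at import time;
the mapping is hypothetical and assumes the sized names already exist
in the new package:

    # Hypothetical: code inside the new array package, where the sized
    # names (Int32, Int64, Float32, Float64) are already defined.
    import struct

    _sized_ints = {4: Int32, 8: Int64}
    CInt    = _sized_ints[struct.calcsize('i')]  # C int on this platform
    CLong   = _sized_ints[struct.calcsize('l')]  # C long on this platform
    CFloat  = Float32    # C float is single precision
    CDouble = Float64    # C double is double precision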

Oh, and the Python types int and float should be allowed (especially
if you want this to go in the core!).

And a Fortran integer could be something else again, but I think
that's more of a SciPy problem than a Numeric or numarray one. SciPy
could add FInteger and FBoolean, for instance.

> 2) Multidimensional array indexing.
>
> Sometimes it is useful to select some elements out of an array based
> on their linear (flattened) indices in the array.  MATLAB, for
> example, will allow you to take a three-dimensional array and index
> it with a single integer based on its Fortran order:  x(1,1,1),
> x(2,1,1), ...
>
> What I'm proposing would have X[K] essentially equivalent to
> X.flat[K].  The problem with always requiring the use of X.flat[K] is
> that X.flat does not work for discontiguous arrays.  It could be made
> to work if X.flat returned some kind of specially-marked array, which
> would then have to be checked every time indexing occurred for any
> array.  Or, there may be some way to have X.flat return an "indexable
> iterator" for X, which may be a more Pythonic thing to do anyway.
> That could solve this problem and the discontiguous X.flat problem as
> well.
>
> If we can make X.flat[K] work for discontiguous arrays, then I would
> be very happy to not special-case the single index array but always
> treat it as a 1-tuple of integer index arrays.

Right now, I find X.flat to be pretty useless, as you need a
contiguous array. I'm +1 on making X.flat work in all cases (contiguous
and discontiguous). Either

a) X.flat returns a contiguous 1-dimensional array (like ravel(X)),
   which may be a copy of X

or

b) X.flat returns a "flat-indexable" view of X

I'd argue for b), as I feel that attributes should operate as views,
not as potential copies. To me, attributes "feel like" they do no
work, so making a copy by mere dereferencing would be surprising.

If a), I'd rather flat() be a method (or have a ravel() method).
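
As a strawman for b), a flat-indexable view is little more than index
arithmetic. Here is a hedged pure-Python sketch (the class name and
details are hypothetical; a real version would do the same with
strides in C):

    from Numeric import arange, reshape, transpose

    class FlatView:
        """Flat (C-order) indexing into any array, contiguous or not."""
        def __init__(self, arr):
            self.arr = arr
        def __getitem__(self, k):
            # Decompose the flat index; the last dimension varies fastest.
            idx = []
            for dim in self.arr.shape[::-1]:
                k, r = divmod(k, dim)
                idx.insert(0, r)
            return self.arr[tuple(idx)]

    X = transpose(reshape(arange(12), (3, 4)))  # a discontiguous view
    FlatView(X)[5]                              # -> 9; no copy of X is made

Since nothing is copied, a __setitem__ written the same way would
write through to X.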


I think overloading X[K] starts to run into trouble: too many special
cases.
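
For example, with X two-dimensional and K a sequence of integers, both
readings of X[K] are sensible, and they disagree (sketched with
current Numeric functions):

    from Numeric import arange, reshape, ravel, take

    X = reshape(arange(12), (3, 4))
    K = [0, 2]
    take(ravel(X), K)   # flat reading:    elements 0 and 2 -> [0 2]
    take(X, K, 0)       # axis-0 reading:  rows 0 and 2 of X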

-- 
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca



