[Matrix-SIG] An Experiment in code-cleanup.

Andrew P. Mullhaupt amullhau@zen-pharaohs.com
Wed, 9 Feb 2000 01:51:09 -0500

> I'd be perfectly happy to have it implemented as a
> method. Indexing would remain as it is (by reference), and a new
> method would provide copying behaviour for element extraction and also
> permit more generalized sequence indices.

I think I can live with that, as long as it _syntactically_ looks like
indexing. This is one case where the syntax is more important than
functionality. There are things you want to index with indices, etc., and
the composition with parenthesis-like (Dyck language) syntax has proved to
be one of the few readable ways to do it.

> In addition to backwards compatibility, there is another argument for
> keeping indexing behaviour as it is: compatibility with other Python
> sequence types. If you have a list of lists, which in many ways
> behaves like a 2D array, and extract the third element (which is thus
> a list), then this data is shared with the full nested list.

_Avoiding_ data sharing will eventually be more important that supporting
data sharing since memory continues to get cheaper but memory bandwidth and
latency do not improve at the same rate. Locality of reference is hard to
control when there is a lot of default data sharing, and performance
suffers, yet it becomes important on more and more scales as memory systems
become more and more hierarchical.

Ultimately, the _semantics_ we like will be implemented efficiently by
emulating references and copies in code which copies and references as it
sees fit and keeps track of which copies are "really" references and which
references are really "copies". I've thought this through for the
"everything gets copied" languages and it isn't too mentally distressing -
you simply reference count fake copies. The "everything is a reference"
languages are less clean, but the database people have confronted that

> Which reminds me: rank-0 arrays are also incompatible with the
> nested-list view of arrays.

There are ways out of that trap. Most post-ISO APLs provide examples of how
to cope.

> > I know there are several opinions, so I'll offer mine.  We need
> > simple rules that are easy to teach a newcomer.  Right now the rule is
> > farily simple in that coercion always proceeds up.  But, mixed
> Like in Python (for scalars), C, Fortran, and all other languages that
> I can think of.

And that is not a bad thing. But which way is "up"? (See example below.)

> > Konrad, 4 years ago, you talked about unexpected losses of precision if
> > this were allowed to happen, but I couldn't understand how.  To me, it
> > unexpected to have double precision arrays which are really only
> > carrying single-precision results.

Most people always hate, and only sometimes detect, when that happens. It
specifically contravenes the Geneva conventions on programming mental

> The upcasting rule thus ensures that
> 1) No precision is lost accidentally.

More or less.

More precisely, it depends on what you call an accident. What happens when
you add the IEEE single precision floating point value 1.0 to the 32-bit
integer 2^30? A _lot_ of people don't expect to get the IEEE single
precision floating point value 2.0^30, but that is what happens in some
languages. Is that an "upcast"? Would the 32 bit integer 2^30 make more
sense? Now what about the case where the 32 bit integer is signed and adding
one to it will "wrap around" if the value remains an integer? Because these
two examples might make double precision or a wider integer (if available)
seem the correct answer, suppose it's only one element of a gigantic array?
Let's now talk about complex values....

There's plenty of rough edges like this when you mix numerical types. It's
guaranteed that everybody's ox will get gored somewhere.

> 2) No overflow occurs unless it is unavoidable (the range problem).
> > The casual user will never use single precision arrays and so will not
> > even notice they are there unless they request them.   If they do
> There are many ways in which single-precision arrays can creep into a
> program without a user's attention.


> Suppose you send me some data in a
> pickled array, which happens to be single-precision. Or I call a
> library routine that does some internal calculation on huge data
> arrays, which it keeps at single precision, and (intentionally or by
> error) returns a single-precision result.

And the worst one is when the accuracy of the result is single precision,
but the _type_ of the result is double precision. There is a function in
S-plus which does this (without documenting it, of course) and man was that
a pain in the neck to sort out.

Today, I just found another bug in one of the S-plus functions - turns out
that that if you hand a complex triangular matrix and a real right hand side
to the triangular solver (backsubstitution) it doesn't cast the right hand
side to complex and uses whatever values are subsequent in memoty to the
right hand side as if they were part of the vector. Obviously, when testing
the function, they didn't try this mixed type case. But interpreters are
really convenient for writing code so that you _don't_ have to think about
types all the time and do your own casting. Which is why stubbing your head
on an unexpected cast is so unlooked for.

> I think your "active flag" solution is a rather good solution to the
> casting problem, because it gives access to a different behaviour in a
> very explicit way. So unless future experience points out problems,
> I'd propose to keep it.

Is there a simple way to ensure that no active arrays are ever activated at
any time when I use Numerical Python?

Andrew Mullhaupt