[Numpy-discussion] copy on demand

Fri Jun 21 16:42:01 EDT 2002

[Sorry for replying so late; an almost finished email got lost in a computer
accident, and I have been rather busy.]

Konrad Hinsen <hinsen at cnrs-orleans.fr> writes:

> > Wouldn't an (almost) automatic solution be to simply replace (almost) all
> > instances of a[b:c] with a.view[b:c] in your legacy code? Even for unusual
> 
> That would convert all slicing operations, even those working on
> strings, lists, and user-defined sequence-type objects.

Well that's where the "(almost)" comes in ;)

If you can tell at a glance, for most instances in your code, whether the
``foo`` in ``foo[a:b]`` is an array, then running a query replace isn't that
much trouble. Of course this might not be true. But the real question is: how
much harder would it be to tell than what you already need to find out in all
the other places where code has to change because of the incompatibilities
numarray already introduces? (For example, I think I have already found a
slicing incompatibility; unfortunately the list of issues I have hit so far
has disappeared somewhere, so I'll have to reconstruct it sometime...)
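
To make the proposed spelling concrete: with copy-slicing as the default,
``a[b:c]`` would return an independent copy, and ``a.view[b:c]`` would be the
explicit way to get the aliasing slices Numeric gives today. Here is a minimal
pure-python sketch that assumes a 1-D, list-backed class. ``Array``, its
``view`` attribute and the helper classes are hypothetical and are not
numarray's actual API.

```python
class _View:
    """Window onto a parent's storage; reads and writes go through to it.

    Only integer indices are supported in this sketch.
    """

    def __init__(self, data, positions):
        self._data = data            # the parent's list, shared, not copied
        self._positions = positions  # range of indices into that list

    def __len__(self):
        return len(self._positions)

    def __getitem__(self, i):
        return self._data[self._positions[i]]

    def __setitem__(self, i, value):
        self._data[self._positions[i]] = value

    def tolist(self):
        return [self._data[p] for p in self._positions]


class _ViewMaker:
    """Supports the a.view[b:c] spelling by turning a slice into a _View."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("view[...] expects a slice")
        return _View(self._data, range(*key.indices(len(self._data))))


class Array:
    """Hypothetical 1-D array with copy-on-slice semantics."""

    def __init__(self, values):
        self._data = list(values)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return Array(self._data[key])  # list slicing already copies
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    @property
    def view(self):
        return _ViewMaker(self._data)

    def tolist(self):
        return list(self._data)


a = Array([0, 1, 2, 3, 4])

c = a[1:3]          # copy: writing to c leaves a alone
c[0] = 99
print(a.tolist())   # [0, 1, 2, 3, 4]

v = a.view[1:3]     # explicit view: writes go through to a
v[0] = 99
print(a.tolist())   # [0, 99, 2, 3, 4]
```

Because ``view`` is just an attribute returning an object with
``__getitem__``, a query replace from ``a[b:c]`` to ``a.view[b:c]`` would
leave the aliasing behaviour of legacy code unchanged.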

If the answer is "not much", then either you would have to regard these
incompatibilities as even less acceptable than the introduction of
copy-slicing semantics (because, as you've already agreed, they don't confer
the same benefit), or it would be difficult to see why copy-slicing shouldn't
be introduced as well.

View semantics have always bothered me, but if numarray weren't going to cause
me not inconsiderable inconvenience through various incompatibilities anyway,
I would have been satisfied with the status quo. As things are, however, I
must admit I feel a strong temptation to get this fixed as well, especially as
most of numarray's other laudable improvements (a much nicer C code base,
better handling of byteswapped data and very large arrays, etc.) don't seem to
be of great importance to me personally at the moment. So I fully admit to a
selfish desire for either more gain or less pain (incompatibility), or maybe
even a bit of both. Of course I don't think these subjective desires of mine
are a good standard to go by, but I am convinced that offering attractive
improvements or few compatibility problems (or both) to the widest possible
audience of current Numeric users is important in order to replace Numeric
quickly and cleanly, without splitting the user community.

> 
> > autoconvert by inserting ``if type(foo) == ArrayType:...``, although
> 
> typechecks for every slicing or indexing operation (a[0] generates a
> view as well for a multidimensional array). Guaranteed to render most
> code unreadable, and of course slow down execution.
> 
> A further challenge for your code convertor:
> 
>     f(a[0], b[2:3], c[-1, 1])
> 
> That makes eight type combination cases.

I'd say 4 (since c[-1,1] can't be a list), but that is beside the point. This
was mainly intended as a demonstration that you *can* do it automatically if
you really need to. A function call would help readability but would obviously
be even less efficient. If I really had large amounts of code that needed
that conversion, I'd be tempted to write such a function with an additional
twist: have it monitor the argument type whenever the program is run, and if
it never turns out to be an array, the wrapping on that particular line can be
discarded. (With less confidence, if it always seems to be an array, the line
could be converted into ``a.view[b:c]``, but that might need additional
checking.) In code that is never reached, the wrapper just stays forever. I've
always been looking for an excuse to write some self-modifying code :)
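
The function-call variant with run-time monitoring might look roughly like
this. It is only a sketch under several assumptions. ``compat_slice``,
``removable_sites`` and the explicit ``site`` label are hypothetical names.
Python's ``memoryview`` (which postdates this discussion) stands in for a
Numeric array, because its slices are also views. Finally, instead of actually
rewriting the source, the sketch only records which call sites ever saw an
array.

```python
from collections import defaultdict

# Stand-in for an array type whose slices are views rather than copies:
# slicing a memoryview shares memory with the underlying buffer.
ARRAY_TYPES = (memoryview,)

# For each call-site label, the set of types seen there at run time.
seen_types = defaultdict(set)


def compat_slice(obj, key, site):
    """Hypothetical helper: obj[key], but with copy semantics for arrays.

    `site` is a label for the call site (e.g. "foo.py:42"), so that the
    recorded types can later show where the wrapper is actually needed.
    """
    seen_types[site].add(type(obj))
    result = obj[key]
    if isinstance(obj, ARRAY_TYPES) and isinstance(result, ARRAY_TYPES):
        # Copy the viewed bytes into a fresh buffer, so that writes to the
        # result cannot leak back into obj.
        return memoryview(bytearray(result))
    return result


def removable_sites():
    """Sites that never saw an array: a plain obj[key] is safe there."""
    return sorted(site for site, types in seen_types.items()
                  if not any(issubclass(t, ARRAY_TYPES) for t in types))


buf = bytearray(b"abcdef")
part = compat_slice(memoryview(buf), slice(1, 3), "demo.py:10")
part[0] = ord("X")
print(buf)                 # bytearray(b'abcdef'): original untouched
print(compat_slice("hello", slice(0, 2), "demo.py:20"))  # he
print(removable_sites())   # ['demo.py:20']
```

A real tool would derive the site label from the caller's file and line
rather than passing it by hand. It would also still need a separate step that
edits the source, so the self-modifying part is left out here.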

> 
> > Well, AFAIK there are actually three mutable sequence types in
> > python core and all have copy-slicing behavior: list, UserList and
> > array:
> 
> UserList is not an independent type, it is merely a subclassable
> wrapper around lists. As for the array module, I haven't seen any code
> that uses it.

As far as I know it is the only way to work efficiently with large strings, so
I guess it is important, although I agree that it is not used that often.
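
The copy-slicing behaviour of the array module mentioned above is easy to
check. For contrast, the snippet also uses ``memoryview``, a much later
addition to python that provides view semantics over the same buffer. It
assumes a python 3 interpreter.

```python
from array import array

a = array("i", [0, 1, 2, 3, 4])

b = a[1:3]              # slicing array.array builds a new array: a copy
b[0] = 99
print(a.tolist())       # [0, 1, 2, 3, 4]: the original is unchanged

m = memoryview(a)[1:3]  # a memoryview slice shares a's buffer instead
m[0] = 99
print(a.tolist())       # [0, 99, 2, 3, 4]
```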

> 
> > I would suppose that in the grand scheme of things numarray.array is intended
> > as an eventual replacement for array.array, or not?
> 
> In the interest of those who rely on the current array module, I hope not.

As long as array is kept around for backwards-compatibility, why not?

[...]
> > But reliability to me also includes the ability for growth -- I not only want
> > my old code to work in a couple of years, I also want the tool I wrote it in
> > to remain competitive and this can conflict with backwards-compatibility. I
> 
> In what way does the current slicing behaviour render your code
> non-competitive?

A single design decision obviously doesn't have such a huge and immediate
negative impact that it renders all your code non-competitive (unless it was a
*really* bad decision); it just means more bugs and less clear, less general
code. But language warts are more like tumours: they grow over the years and
become increasingly difficult to excise (just look at the tremendous redesign
effort the perl people are going through at the moment). The closer warts
come to the core language, the worse they are, and since numarray aims for
inclusion, I think it must be held to a higher standard than modules that
don't.

> 
> > like the balance python strikes here so far -- the language has
> 
> Me too. But there haven't been any incompatible changes in the
> documented core language, and only very few in the standard library
> (the to-be-abandoned re module comes to mind - anything else?).

I don't think this is true. (Also, the documented core language is not
necessarily a good standard to go by as far as python is concerned, because
not quite everything one has to rely upon is actually documented; instead one
can find things like "XXX Can't be bothered to spell this out right now...".)
Among the incompatible changes that I would strongly assume *were* documented,
both before and after, are: exceptions (strings -> classes), automatic
conversion of ints to longs (instead of raising an exception) and the new
division rules, whose stepwise introduction has already started. There are
also quite a few things that used to work for all classes but no longer work
with new-style classes, some of which can be quite annoying (you lose quite a
bit of introspective and interactive power), but I'm not sure to what extent
they were documented.
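
The three changes listed above can be seen in any modern python 3
interpreter, where the transitions under way in 2002 have long been completed.
In python 2.2 and later the new division was opt-in via
``from __future__ import division``. The snippet below assumes python 3, where
it is the default.

```python
# Classic division truncated for ints (7 / 2 == 3); under the new rules
# "/" is true division and "//" is floor division.
print(7 / 2)     # 3.5
print(7 // 2)    # 3
print(-7 // 2)   # -4: floor division rounds toward negative infinity

# Ints no longer overflow: where old pythons raised OverflowError,
# results are promoted to arbitrary-precision integers.
print(2 ** 100)  # 1267650600228229401496703205376

# String exceptions are gone: only BaseException subclasses can be raised.
try:
    raise "oops"
except TypeError as exc:
    print(exc)   # exceptions must derive from BaseException
```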

> 
> For a bad example, see the Python XML package(s). Lots of changes,
> incompatibilities between parsers, etc. The one decision I really
> regret is to have chosen an XML-based solution for documentation. Now
> I spend two days at every new release of my stuff to adapt the XML
> code to the fashion of the day.

I didn't do much xml processing, but as far as I can remember I was happy with
4suite: http://4suite.org/index.xhtml.

> 
> It is almost ironic that I appear here as the great anti-change
> advocate, since in many other occasions I have argued for improvement
> over excessive compatibility. Basically I favour motivated incompatible

I don't think a particularly conservative character is necessary to fill that
role :) You've got a big code base, which naturally reduces your appetite for
incompatibilities, because you have to pay a hefty cost that potential
advantages for future code can hardly offset. But that side of the argument is
clearly important, and I think that even if you don't like being the
anti-change advocate, you often make valuable points against changes you
perceive as uncalled for.

alex

-- 
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.Schmolck at gmx.net     http://www.dcs.ex.ac.uk/people/aschmolc/