Re: [Numpy-discussion] copy on demand
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
Wouldn't an (almost) automatic solution be to simply replace (almost) all instances of a[b:c] with a.view[b:c] in your legacy code? Even for unusual
That would convert all slicing operations, even those working on strings, lists, and user-defined sequence-type objects.
cases (like if you heavily mix arrays and lists) you could still
I do, and I don't consider it that unusual. Anyway, even if some function gets called only with array arguments, I don't see how a code analyzer could detect that. So it would be...
autoconvert by inserting ``if type(foo) == ArrayType:...``, although
typechecks for every slicing or indexing operation (``a[0]`` generates a view as well for a multidimensional array). Guaranteed to render most code unreadable, and of course slow down execution. A further challenge for your code converter: ``f(a[0], b[2:3], c[-1, 1])``. That makes eight type combination cases.
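The count of eight can be checked with a short enumeration. This is only a sketch of the arithmetic: it assumes each of the three indexed arguments is either an array or some other sequence type, and the labels `"array"` and `"other"` are illustrative.

```python
from itertools import product

# Each of the three indexed arguments is either an array or something else.
kinds = ("array", "other")
cases = list(product(kinds, repeat=3))
case_count = len(cases)

# If one argument were known to be an array, only two would still vary.
restricted = [case for case in cases if case[2] == "array"]
restricted_count = len(restricted)
```

A converter that cannot infer types would have to emit a branch for every entry in `cases`.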
UserList is not an independent type, it is merely a subclassable wrapper around lists. As for the array module, I haven't seen any code that uses it.
I would suppose that in the grand scheme of things numarray.array is intended as an eventual replacement for array.array, or not?
In the interest of those who rely on the current array module, I hope not.
I agree - except that I think it is already too late.
Yes, assuming that views are somehow available. But my preference is not so strong that I consider it a sufficient reason to break lots of code. View semantics is not a catastrophe. All of us continue to use NumPy in spite of it, and I suspect none of us loses any sleep over it. I have spent perhaps a few hours in total (over six years of using NumPy) tracking down view-related bugs, which makes it a minor problem on my personal scale.
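The kind of view-related bug referred to here can be sketched with the standard library alone. In this sketch `memoryview` stands in for an array type whose slices are views (an assumption made for illustration; Numeric itself is not used), while `array.array` shows copy-on-slice behaviour for comparison. The function name `zero_first` is hypothetical.

```python
import array

def zero_first(chunk):
    """Overwrite the first element of `chunk` in place and return it."""
    chunk[0] = 0.0
    return chunk

data = array.array("d", [1.0, 2.0, 3.0, 4.0])

# array.array slicing copies, so the caller's data survives the call.
zero_first(data[1:3])
after_copy_slice = data.tolist()

# A memoryview slice shares memory, so the same call now changes `data`.
zero_first(memoryview(data)[1:3])
after_view_slice = data.tolist()
```

The call site looks identical in both cases; only the slicing semantics of the argument decide whether the caller's data is modified.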
I don't think matlab or similar alternatives make legally binding promises about backwards compatibility, or do they? I guess it is actually more
Of course not, software providers for the mass market take great care not to promise anything. But if Matlab did anything as drastic as what we are discussing, they would lose lots of paying customers.
In what way does the current slicing behaviour render your code non-competitive?
like the balance python strikes here so far -- the language has
Me too. But there haven't been any incompatible changes in the documented core language, and only very few in the standard library (the to-be-abandoned re module comes to mind - anything else?).

For a bad example, see the Python XML package(s). Lots of changes, incompatibilities between parsers, etc. The one decision I really regret is to have chosen an XML-based solution for documentation. Now I spend two days at every new release of my stuff to adapt the XML code to the fashion of the day.

It is almost ironic that I appear here as the great anti-change advocate, since on many other occasions I have argued for improvement over excessive compatibility. Basically I favour motivated incompatible changes, but under the condition that updating of existing code is manageable. Changing the semantics of a type is about the worst I can imagine in this respect.

Konrad.
--
Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
![](https://secure.gravatar.com/avatar/eb281ac8437ba6df4ef5f0f9686e7c3e.jpg?s=120&d=mm&r=g)
[sorry for replying so late, an almost finished email got lost in a computer accident and I was rather busy.] Konrad Hinsen <hinsen@cnrs-orleans.fr> writes:
Well that's where the "(almost)" comes in ;) If you can tell at a glance for most instances in your code whether the ``foo`` in ``foo[a:b]`` is an array, then running a query replace isn't that much trouble. Of course this might not be true. But the question really is: to what extent would it be more difficult to tell than what you need to find out already in all the other situations where code needs changing because of the incompatibilities numarray already introduces? (I think I have, for example, already found a slicing incompatibility -- unfortunately the list of the issues I hit upon so far has disappeared somewhere, so I'll have to try to reconstruct it sometime...)

If the answer is "not much", then you would have to regard these incompatibilities as even less acceptable than the introduction of copy-slicing semantics (because, as you've already agreed, these incompatibilities don't confer the same benefit); otherwise it would be difficult to see why copy-slicing shouldn't be introduced as well.

View semantics have always bothered me, but if it weren't for the fact that numarray is going to cause me not inconsiderable inconvenience through various incompatibilities anyway, I would have been satisfied with the status quo. As things are, however, I must admit I feel a strong temptation to get this fixed as well, especially as most of the other laudable improvements of numarray wouldn't seem to be of great importance to me personally at the moment (much nicer C code base, better handling of byteswapped data and very large arrays etc.). So I fully admit to a selfish desire for either more gain or less pain (incompatibility) or maybe even a bit of both.
Of course I don't think these subjective desires of mine are a good standard to go by, but I am convinced that offering attractive improvements or few compatibility problems (or both) to the widest possible audience of current Numeric users is important in order to replace Numeric, quickly and cleanly, without any splitting.
I'd say 4 (since c[-1,1] can't be a list) but that is beside the point. This was mainly intended as a demonstration that you *can* do it automatically, if you really need to. A function call would help the readability but obviously be even more inefficient. If I really had large amounts of code that needed that conversion, I'd be tempted to write such a function with an additional twist: have it monitor the input argument type whenever the program is run and if it isn't an array, the wrapping in this particular line can be discarded (with less confidence, if it always seems to be an array it could be converted into ``a.view[b:c]``, but that might need additional checking). In code that isn't reached, the wrapper just stays forever. I've always been looking for an excuse to write some self-modifying code :)
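A minimal sketch of such a monitoring wrapper follows, using only the standard library. Everything named here is hypothetical: `CopyArray` stands in for an array type with the proposed copy-slicing semantics, `.view[...]` mimics the accessor suggested earlier in the thread (implemented via `memoryview`), and `legacy_slice`, `seen_types` and `unwrappable_sites` are illustrative names. The sketch only records what it observes; it does not rewrite any source code.

```python
import array
from collections import defaultdict

class _ViewAccessor:
    """Helper so that `a.view[b:c]` returns a slice sharing memory with `a`."""
    def __init__(self, owner):
        self._owner = owner

    def __getitem__(self, index):
        return memoryview(self._owner)[index]

class CopyArray(array.array):
    """Hypothetical array type: plain slicing copies, `.view[...]` does not."""
    @property
    def view(self):
        return _ViewAccessor(self)

# Types observed at each labelled call site (hypothetical bookkeeping).
seen_types = defaultdict(set)

def legacy_slice(site, obj, index):
    """Slice `obj`, keeping the old view semantics if it is a CopyArray."""
    seen_types[site].add(type(obj))
    if isinstance(obj, CopyArray):
        return obj.view[index]
    return obj[index]

def unwrappable_sites():
    """Sites that never saw an array, where plain `obj[index]` would do."""
    return sorted(site for site, types in seen_types.items()
                  if CopyArray not in types)

a = CopyArray("d", [0.0, 1.0, 2.0, 3.0])
window = legacy_slice("demo.py:1", a, slice(1, 3))
window[0] = 99.0  # writes through to `a`
words = legacy_slice("demo.py:2", ["x", "y", "z"], slice(0, 2))
```

After enough runs, the sites returned by `unwrappable_sites()` are candidates for removing the wrapper, with the caveat already noted above that code which is never reached is never classified.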
It is AFAIK the only way to work efficiently with large strings, so I guess it is important, although I agree that it is not that often used.
As long as array is kept around for backwards-compatibility, why not? [...]
A single design decision obviously doesn't have such an immediate huge negative impact that it immediately renders all your code non-competitive; unless it was a *really* bad design decision, it just means more bugs and less clear and general code. But language warts are more like tumours: they grow over the years and become increasingly difficult to excise (just look what tremendous redesign effort the perl people go through at the moment). The closer warts come to the core language the worse, and since numarray aims for inclusion I think it must be held to a higher standard than other modules that don't.
I don't think this is true (and the documented core language is not necessarily a good standard to go by as far as python is concerned, because not quite everything one has to rely upon is actually documented (instead one can find things like: "XXX Can't be bothered to spell this out right now...")). Among the incompatible changes that I would strongly assume *were* documented before and after are: exceptions (strings -> classes), automatic conversion of ints to longs (instead of an exception) and the new division rules whose stepwise introduction has already started. There are also quite a few things that used to work for all classes, but that now no longer work with new-style classes, some of which can be quite annoying (you lose quite a bit of introspective and interactive power), but I'm not sure to what extent they were documented.
I didn't do much xml processing, but as far as I can remember I was happy with 4suite: http://4suite.org/index.xhtml.
I don't think a particularly conservative character is necessary to fill that role :) You've got a big code base, which automatically reduces the desire for incompatibilities because you have to pay a hefty cost that is difficult to offset by potential advantages for future code. But that side of the argument is clearly important and I think even if you don't like to be an anti-change advocate you still often make valuable points against changes you perceive as uncalled for.

alex
--
Alexander Schmolck
Postgraduate Research Student
Department of Computer Science
University of Exeter
A.Schmolck@gmx.net
http://www.dcs.ex.ac.uk/people/aschmolc/
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
If you can tell at a glance for most instances in your code whether the ``foo`` in ``foo[a:b]`` is an array, then running a query replace isn't that much
How could I? Moreover, even if I could, that's not enough. I need a program to spot those places for me, as I won't go through 30000 lines of code by hand.
What are those? In general, changes related to NumPy functions or attributes of array objects are relatively easy to deal with, as one can use a text editor to search for the name and thereby capture most locations (not all though). Changes related to generic operations that many other types share are the worst.
If the answer is "not much", then you would have to regard these
I am not aware of any other incompatibility in the "worst" category. If there is one, I will probably never use Numarray.
c[-1,1] can't be a list, but it needn't be an array. Any class can implement multiple-dimension indexing. My netCDF array objects do, for example.
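To illustrate the point, here is a minimal sketch of a non-array class that accepts ``c[-1, 1]``. The class `Table` is hypothetical and is not the netCDF implementation mentioned above; it only shows that Python passes the comma-separated index to `__getitem__` as a tuple, which any class is free to handle.

```python
class Table:
    """Hypothetical non-array class that accepts c[i, j] indexing."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __getitem__(self, key):
        # c[-1, 1] arrives here as the tuple (-1, 1).
        if isinstance(key, tuple):
            i, j = key
            return self.rows[i][j]
        return self.rows[key]

c = Table([[1, 2], [3, 4]])
value = c[-1, 1]

# A plain list rejects the same index expression with a TypeError.
try:
    [[1, 2], [3, 4]][-1, 1]
    list_accepts_tuple = True
except TypeError:
    list_accepts_tuple = False
```

So a converter that sees a tuple index can rule out lists, but it still cannot conclude that the object is an array.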
I have large amounts of code that would need conversion. However, it is code that I and about 100 other users rely on for our daily work, so it won't be the subject of empirical fixing of any kind. Either there will be an automatic procedure that is guaranteed to keep the code working, or there won't be any update.
I don't see any evidence for this in NumPy.
now...")). Among the incompatible changes that I would strongly assume *were* documented before and after are: exceptions (strings -> classes), automatic
String exceptions still work. I am not aware of any code that was broken by the fact that the standard exceptions are now classes.
conversion of ints to longs (instead of an exception) and the new division rules whose stepwise introduction has already started. There are also quite a
The division rules are the only case of serious incompatibilities I know of, and I am in fact against them, although I agree that the proposed new rules are much better. On the other hand, the proposed transition procedure provides much more help for updating code than we would get from Numarray.

Konrad.
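For reference, the division change discussed here (PEP 238) separates true division from floor division. During the transition a module could opt in with `from __future__ import division`, and the interpreter's `-Q` option could warn about uses of classic division. The sketch below shows the new rules as they behave on a current Python 3 interpreter, where they are the default; the variable names are illustrative.

```python
# True division always yields the mathematical quotient as a float.
true_quotient = 7 / 2

# Floor division keeps the old integer behaviour and must be spelled "//".
floor_quotient = 7 // 2

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = -7 // 2

# Integer arithmetic no longer overflows: results simply grow as needed.
big = 2147483647 + 1
```

Code that relied on ``/`` truncating integers has to be changed to ``//``, which is the incompatibility the transition tools were meant to help locate.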
![](https://secure.gravatar.com/avatar/4de7d92b333c8b0124e6757d757560b0.jpg?s=120&d=mm&r=g)
Hi Konrad, On Sun, Jun 23, 2002 at 10:20:35AM +0200, Konrad Hinsen wrote:
I think you are painting an overly bleak picture -- and one that is certainly more black and white than reality. I am one of those 100 users and I would (will) certainly go through the code that I use on a daily basis (and the other code that I use less frequently) -- just as I have every time there is an update to the Python core or your code. Hell, some of those 30000 lines of "your" code are actually _my_ code.

And out of those 100 other users, I'd be willing to bet a beer or three that at least a couple would help to track down incompatibilities as well. Many (perhaps even most) of the problems can be spotted by simply running the test codes provided with the individual modules.

By generously releasing your code, you have made it possible for your code to become part of my -- and many others' -- "standard library". And it is a part that I don't want to get rid of. I truly hope that this incompatibility (i.e. copy vs view) and the time that it will take to update older code will not cause many potentially beneficial (or at least requested) features/changes to be dropped.

Scott
--
Scott M. Ransom
Address: McGill Univ. Physics Dept., 3600 University St., Rm 338, Montreal, QC Canada H3A 2T8
Phone: (514) 398-6492
email: ransom@physics.mcgill.ca
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
![](https://secure.gravatar.com/avatar/a53ea657e812241a1162060860f698c4.jpg?s=120&d=mm&r=g)
I certainly appreciate any help, but this is not just a matter of the amount of time, but also of risk: the risk of introducing bugs. The package that you are using, Scientific Python, is the lesser of my worries, as the individual parts are very independent. My other package, MMTK, is not only bigger, but also consists of many tightly coupled modules. Moreover, I am not aware of any user except for myself who knows the code well enough to be able to work on such an update project. Finally, this is not just my personal problem: there is lots of NumPy code out there, publicly released or not, whose developers would face the same difficulties.

Konrad.
participants (3)
- Alexander Schmolck
- Konrad Hinsen
- Scott Ransom