Someone (way to go Rory!) recently posted a patch (woohoo!) for numarray which I think bears a little discussion since it involves the re-write of a fundamental numarray function: array(). The patch fixes a number of bugs and deconvolutes the logic of array().
The patch is here if you want to look at it yourself:
http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse
One item I thought needed some discussion was the removal of two features:
* array() does too much. E.g., handling file/memory instances for 'sequence'. There's fromfile for the former, and users needing the latter functionality should be clued up enough to instantiate NumArray directly.
I agree with this myself. Does anyone care if they will no longer be able to construct an array from a file or buffer object using array() rather than fromfile() or NumArray(), respectively? Is a deprecation process necessary to remove them?
I think strings.py and records.py also have "over-stuffed" array() functions... so consistency bids us to streamline those as well.
Regards, Todd
On 12 Jan 2005 19:35:36 -0500, Todd Miller <jmiller@stsci.edu> wrote:
One item I thought needed some discussion was the removal of two features:
* array() does too much. E.g., handling file/memory instances for 'sequence'. There's fromfile for the former, and users needing the latter functionality should be clued up enough to instantiate NumArray directly.
I agree with this myself. Does anyone care if they will no longer be able to construct an array from a file or buffer object using array() rather than fromfile() or NumArray(), respectively? Is a deprecation process necessary to remove them?
IMHO, array should just delegate to other functions based on the arguments; then it can remain backward compatible. I use the from-buffer functionality quite often, and it would be nice if there were at least a new function frombuffer or frommemory.
Regards, Florian Schulze
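Florian's idea of a thin array() that only inspects its argument and delegates can be sketched roughly as below. This is an illustration only: it uses the standard library's array module as a stand-in for numarray, and make_array, from_sequence, from_buffer and from_file are hypothetical names, not numarray functions.

```python
import array
import io


def from_sequence(seq, typecode="d"):
    # Build from a flat Python sequence of numbers.
    return array.array(typecode, list(seq))


def from_buffer(buf, typecode="d"):
    # Interpret raw bytes as packed items of the given typecode.
    result = array.array(typecode)
    result.frombytes(bytes(buf))
    return result


def from_file(fileobj, typecode="d"):
    # Read the rest of a binary file object and interpret it as packed items.
    return from_buffer(fileobj.read(), typecode)


def make_array(source, typecode="d"):
    # Hypothetical dispatcher: look at the argument, then delegate.
    if isinstance(source, (bytes, bytearray, memoryview)):
        return from_buffer(source, typecode)
    if hasattr(source, "read"):
        return from_file(source, typecode)
    return from_sequence(source, typecode)


raw = array.array("d", [1.0, 2.0]).tobytes()
from_list = make_array([1.0, 2.0])
from_raw = make_array(raw)
from_stream = make_array(io.BytesIO(raw))
```

With this shape, the explicit constructors stay available for callers who know what they have, while the dispatcher keeps old call sites working.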
On Thursday 13 January 2005 01:35, Todd Miller wrote:
* array() does too much. E.g., handling file/memory instances for 'sequence'. There's fromfile for the former, and users needing the latter functionality should be clued up enough to instantiate NumArray directly.
I agree with this myself. Does anyone care if they will no longer be able to construct an array from a file or buffer object using array() rather than fromfile() or NumArray(), respectively? Is a deprecation process necessary to remove them?
That is fine with me. I always call the array() factory function in order to get a buffer object, so no problem.
I think strings.py and records.py also have "over-stuffed" array() functions... so consistency bids us to streamline those as well.
I agree.
Cheers,
--
Francesc Altet
Carabos Coop. V.
http://www.carabos.com/
"Who is your data daddy? PyTables"
Todd Miller wrote:
Someone (way to go Rory!) recently posted a patch (woohoo!) for numarray which I think bears a little discussion since it involves the re-write of a fundamental numarray function: array(). The patch fixes a number of bugs and deconvolutes the logic of array().
The patch is here if you want to look at it yourself:
http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse
One item I thought needed some discussion was the removal of two features:
* array() does too much. E.g., handling file/memory instances for 'sequence'. There's fromfile for the former, and users needing the latter functionality should be clued up enough to instantiate NumArray directly.
I agree with this myself. Does anyone care if they will no longer be able to construct an array from a file or buffer object using array() rather than fromfile() or NumArray(), respectively? Is a deprecation process necessary to remove them?
I would suggest deprecation on the way to removal. For the newcomer, who is not yet "clued up" some advice on the instantiation of NumArray would help. Currently, neither the word "class" nor "NumArray" appears in the doc index.
Rory leaves in type and typecode. It would be good to eliminate this apparent overlap. Why not deprecate and then drop type? As a compromise, either could be accepted as a NumArray.__init__ argument, since it is easy to distinguish between them.
It would be good to clarify the acceptable content of a sequence. A list, perhaps with sublists, of numbers is clear enough but what about a sequence of NumArray instances or even a sequence of numbers, mixed with NumArray instances?
Is the function asarray redundant?
I suggest that the copy parameter be of the BoolType. This probably has no practical impact but it is consistent with current Python usage and makes it clear that this is a Yes/No parameter, rather than specifying a number of copies.
I think strings.py and records.py also have "over-stuffed" array() functions... so consistency bids us to streamline those as well.
Regards, Todd
Thanks to Rory for initiating this. Colin W.
On Thu, 2005-01-13 at 10:26 -0500, Colin J. Williams wrote:
Todd Miller wrote:
Someone (way to go Rory!) recently posted a patch (woohoo!) for numarray which I think bears a little discussion since it involves the re-write of a fundamental numarray function: array(). The patch fixes a number of bugs and deconvolutes the logic of array().
The patch is here if you want to look at it yourself:
http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse
One item I thought needed some discussion was the removal of two features:
* array() does too much. E.g., handling file/memory instances for 'sequence'. There's fromfile for the former, and users needing the latter functionality should be clued up enough to instantiate NumArray directly.
I agree with this myself. Does anyone care if they will no longer be able to construct an array from a file or buffer object using array() rather than fromfile() or NumArray(), respectively? Is a deprecation process necessary to remove them?
I would suggest deprecation on the way to removal. For the newcomer, who is not yet "clued up" some advice on the instantiation of NumArray would help.
That's fair. The docstring for NumArray needs beefing up along the same lines as Rory's work on array(). I initially liked Florian's idea of frombuffer() but since I can't think of how it's not identical to NumArray(), I'm not sure there's any point.
Rory leaves in type and typecode. It would be good to eliminate this apparent overlap. Why not deprecate and then drop type?
Some people like type. I don't want to touch this.
It would be good to clarify the acceptable content of a sequence. A list, perhaps with sublists, of numbers is clear enough but what about a sequence of NumArray instances or even a sequence of numbers, mixed with NumArray instances?
The patch has a new docstring which spells out the array() construction algorithm. Lists of arrays would be seen as "numerical sequences".
Is the function asarray redundant?
Yes, but it's clear and also needed for backward compatibility with Numeric. Besides, it's not just redundant, it's an idiom...
I suggest that the copy parameter be of the BoolType. This probably has no practical impact but it is consistent with current Python usage and makes it clear that this is a Yes/No parameter, rather than specifying a number of copies.
Fair enough. Backward compatibility dictates not *requiring* a bool, but using it as a default is fine.
Todd Miller wrote:
On Thu, 2005-01-13 at 10:26 -0500, Colin J. Williams wrote:
Todd Miller wrote:
Someone (way to go Rory!) recently posted a patch (woohoo!) for numarray which I think bears a little discussion since it involves the re-write of a fundamental numarray function: array(). The patch fixes a number of bugs and deconvolutes the logic of array().
The patch is here if you want to look at it yourself:
http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse
One item I thought needed some discussion was the removal of two features:
* array() does too much. E.g., handling file/memory instances for 'sequence'. There's fromfile for the former, and users needing the latter functionality should be clued up enough to instantiate NumArray directly.
I agree with this myself. Does anyone care if they will no longer be able to construct an array from a file or buffer object using array() rather than fromfile() or NumArray(), respectively? Is a deprecation process necessary to remove them?
I would suggest deprecation on the way to removal. For the newcomer, who is not yet "clued up" some advice on the instantiation of NumArray would help.
That's fair. The docstring for NumArray needs beefing up along the same lines as Rory's work on array().
and, I would suggest, the documentation.
I initially liked Florian's idea of frombuffer() but since I can't think of how it's not identical to NumArray(), I'm not sure there's any point.
Rory leaves in type and typecode. It would be good to eliminate this apparent overlap. Why not deprecate and then drop type?
Some people like type. I don't want to touch this.
The basic suggestion was to drop one or the other, since one is an _nt entry and either an instance of a function while the other is a string. I recognize that "type" has become accepted in the numarray community but the same word is used by Python for a utility function.
It would be good to clarify the acceptable content of a sequence. A list, perhaps with sublists, of numbers is clear enough but what about a sequence of NumArray instances or even a sequence of numbers, mixed with NumArray instances?
The patch has a new docstring which spells out the array() construction algorithm. Lists of arrays would be seen as "numerical sequences".
Is the function asarray redundant?
Yes, but it's clear and also needed for backward compatibility with Numeric. Besides, it's not just redundant, it's an idiom...
*asarray*( seq, type=None, typecode=None)
This function converts scalars, lists and tuples to a numarray, when possible. It passes numarrays through, making copies only to convert types. In any other case a TypeError is raised.
*astype*( type)
The astype method returns a copy of the array converted to the specified type. As with any copy, the new array is aligned, contiguous, and in native machine byte order. If the specified type is the same as current type, a copy is /still/ made.
*array*( sequence=None, typecode=None, copy=1, savespace=0, type=None, shape=None)
It seems that the function array could be used in place of either the function asarray or the method astype:
>>> import numarray.numerictypes as _nt
>>> import numarray.numarraycore as _n
>>> a= _n.array([1, 2])
>>> a
array([1, 2])
>>> a._type
Int32
>>> b= a.astype(_nt.Float64)
>>> b._type
Float64
>>> a._type
Int32
>>> c= _n.array(seq= a, type= _nt.Float64)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: array() got an unexpected keyword argument 'seq'
>>> c= _n.array(a, type= _nt.Float64)
>>> c._type
Float64
>>>
I suggest that the copy parameter be of the BoolType. This probably has no practical impact but it is consistent with current Python usage and makes it clear that this is a Yes/No parameter, rather than specifying a number of copies.
Fair enough. Backward compatibility dictates not *requiring* a bool, but using it as a default is fine.
Colin W.
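The difference between the quoted asarray and astype descriptions (pass-through versus always-copy) can be illustrated with a small sketch. It uses the standard library's array module rather than numarray, and as_array and as_type are hypothetical helpers written only to mirror the documented semantics.

```python
import array


def as_array(obj, typecode=None):
    # Pass an existing array through untouched unless a type change is requested.
    if isinstance(obj, array.array):
        if typecode is None or typecode == obj.typecode:
            return obj
        return array.array(typecode, obj.tolist())
    # Anything else is converted, which necessarily creates a new object.
    return array.array(typecode or "l", list(obj))


def as_type(arr, typecode):
    # Always return a fresh copy, even when the type is unchanged.
    return array.array(typecode, arr.tolist())


a = array.array("l", [1, 2])
same = as_array(a)            # no copy: the very same object comes back
converted = as_type(a, "d")   # new array holding floats
copied = as_type(a, "l")      # same type, but still a copy
```

Here `same is a` holds, while `copied` compares equal to `a` without being the same object, which is the distinction the documentation draws.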
Colin J. Williams wrote:
Todd Miller wrote:
Someone (way to go Rory!) recently posted a patch (woohoo!) for numarray which I think bears a little discussion since it involves the re-write of a fundamental numarray function: array(). The patch fixes a number of bugs and deconvolutes the logic of array().
The patch is here if you want to look at it yourself:
http://sourceforge.net/tracker/?atid=450449&group_id=1369&func=browse
One item I thought needed some discussion was the removal of two features:
* array() does too much. E.g., handling file/memory instances for 'sequence'. There's fromfile for the former, and users needing the latter functionality should be clued up enough to instantiate NumArray directly.
I agree with this myself. Does anyone care if they will no longer be able to construct an array from a file or buffer object using array() rather than fromfile() or NumArray(), respectively? Is a deprecation process necessary to remove them?
This isn't going to cause me pain, FWIW.
I would suggest deprecation on the way to removal. For the newcomer, who is not yet "clued up" some advice on the instantiation of NumArray would help. Currently, neither the word "class" nor "NumArray" appears in the doc index.
Rory leaves in type and typecode. It would be good to eliminate this apparent overlap. Why not deprecate and then drop type? As a compromise, either could be accepted as a NumArray.__init__ argument, since it is easy to distinguish between them.
I thought typecode was eventually going away, not type. Either way, it makes sense to drop one of them eventually. This should definitely go through a period of deprecation, though: it will certainly require that I fix a bunch of my code.
It would be good to clarify the acceptable content of a sequence. A list, perhaps with sublists, of numbers is clear enough but what about a sequence of NumArray instances or even a sequence of numbers, mixed with NumArray instances?
Isn't any sequence that is composed of numbers or subsequences acceptable, as long as it has a consistent shape (no ragged edges)?
Is the function asarray redundant?
No, the copy=False parameter is redundant ;) Well as a pair they are redundant, but if I was going to get rid of something, I'd get rid of copy, because it's lying: copy=False sometimes copies (when the sequence is not an array) and sometimes does not (when the sequence is an array). A better name would be alwaysCopy, but better still would be to just get rid of it altogether and rely on asarray. (asarray may be implemented using the copy parameter now, but that would be easy to fix.)
While we're at it, savespace should get nuked too (all with appropriate deprecations I suppose), so the final signature of array would be:
array(sequence=None, type=None, shape=None)
Hmm. That's still too complicated. It really should be
array(sequence, type=None)
I believe that other uses can be more clearly accomplished using zeros and reshape. Of course that has drastic backward compatibility issues and even with generous usage of deprecations might not help the transition much. Still, that's probably what I'd shoot for if it were an option.
I suggest that the copy parameter be of the BoolType. This probably has no practical impact but it is consistent with current Python usage and makes it clear that this is a Yes/No parameter, rather than specifying a number of copies.
I think strings.py and records.py also have "over-stuffed" array() functions... so consistency bids us to streamline those as well. Regards, Todd
Thanks to Rory for initiating this.
Agreed. -tim
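Tim's "consistent shape, no ragged edges" rule for acceptable sequences can be made concrete with a short recursive check. This is a sketch of the idea only; nested_shape is a hypothetical helper, not part of numarray or of Rory's patch, and it deliberately rejects strings to sidestep the separate question of how strings should be interpreted.

```python
from numbers import Number


def nested_shape(obj):
    # A bare number is a zero-dimensional leaf.
    if isinstance(obj, Number):
        return ()
    if isinstance(obj, (str, bytes)):
        raise TypeError("strings are not treated as numeric sequences here")
    items = list(obj)
    if not items:
        return (0,)
    # Every element must have exactly the same shape as the first one.
    shapes = [nested_shape(item) for item in items]
    if any(shape != shapes[0] for shape in shapes[1:]):
        raise ValueError("ragged sequence: sub-sequences differ in shape")
    return (len(items),) + shapes[0]


regular = nested_shape([[1, 2, 3], [4, 5, 6]])

try:
    nested_shape([1, [2, 3]])
    ragged_rejected = False
except ValueError:
    ragged_rejected = True
```

A list mixing plain numbers with sub-lists fails the check because a number has shape `()` while a two-item list has shape `(2,)`.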
On Jan 13, 2005, at 12:00 PM, Tim Hochberg wrote:
Colin J. Williams wrote:
Rory leaves in type and typecode. It would be good to eliminate this apparent overlap. Why not deprecate and then drop type? As a compromise, either could be accepted as a NumArray.__init__ argument, since it is easy to distinguish between them.
I thought typecode was eventually going away, not type. Either way, it makes sense to drop one of them eventually. This should definitely go through a period of deprecation, though: it will certainly require that I fix a bunch of my code.
Tim is right about this. The rationale was that typecode is inaccurate since types are no longer represented by letter codes (one can still use them for backward compatibility).
On Thu, 2005-01-13 at 09:00, Tim Hochberg wrote:
It would be good to clarify the acceptable content of a sequence. A list, perhaps with sublists, of numbers is clear enough but what about a sequence of NumArray instances or even a sequence of numbers, mixed with NumArray instances?
Isn't any sequence that is composed of numbers or subsequences acceptable, as long as it has a consistent shape (no ragged edges)?
Why not make it a little more general and accept iterable objects?
array( typecode[, initializer])
Return a new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, string, or iterable over elements of the appropriate type. Changed in version 2.4: Formerly, only lists or strings were accepted. If given a list or string, the initializer is passed to the new array's fromlist(), fromstring(), or fromunicode() method (see below) to add initial items to the array. Otherwise, the iterable initializer is passed to the extend() method.
Ralf
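The behaviour Ralf quotes is that of the standard library's array module, which can be tried directly. The snippet below is a small illustration in current Python; it shows only the stdlib array type accepting an arbitrary iterable, and says nothing about what numarray's array() does.

```python
import array

# A generator expression is an iterable without a length.
squares = array.array("d", (x * x for x in range(4)))

# extend() likewise consumes any iterable of suitable items.
squares.extend(iter([16.0, 25.0]))
```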
Ralf Juengling wrote:
On Thu, 2005-01-13 at 09:00, Tim Hochberg wrote:
It would be good to clarify the acceptable content of a sequence. A list, perhaps with sublists, of numbers is clear enough but what about a sequence of NumArray instances or even a sequence of numbers, mixed with NumArray instances?
Isn't any sequence that is composed of numbers or subsequences acceptable, as long as it has a consistent shape (no ragged edges)?
Why not make it a little more general and accept iterable objects?
array( typecode[, initializer])
Return a new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, string, or iterable over elements of the appropriate type. Changed in version 2.4: Formerly, only lists or strings were accepted. If given a list or string, the initializer is passed to the new array's fromlist(), fromstring(), or fromunicode() method (see below) to add initial items to the array. Otherwise, the iterable initializer is passed to the extend() method.
Ralf
Yes, I'm not sure whether list comprehension produces an iter object but this should also be included. Similarly instances of subclasses of NumArray should be explicitly included.
I like the term "no ragged edges".
Colin W.
"Colin J. Williams" <cjw@sympatico.ca> writes:
Yes, I'm not sure whether list comprehension produces an iter object but this should also be included.
Lists are iterable but they also have a length, which is not accessible through the iterator: from a general iterator there is no way of knowing in advance how many items it will return. This may be a problem if you want to allocate memory for the values.
--
Timo Korvola <URL:http://www.iki.fi/tkorvola>
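Timo's point about iterators lacking a length can be seen in a few lines. The sketch below shows one possible workaround, which is to materialize the iterator before allocating; fill_preallocated is a hypothetical helper built on the standard library's array module, not something proposed in the thread.

```python
import array


def fill_preallocated(iterable, typecode="d"):
    # Materialize first, so the item count is known before allocating.
    items = list(iterable)
    # Allocate the full storage in one step, then fill it in place.
    result = array.array(typecode, [0]) * len(items)
    for index, value in enumerate(items):
        result[index] = value
    return result


gen = (x * 0.5 for x in range(5))
try:
    len(gen)
    generator_has_len = True
except TypeError:
    # A general iterator cannot report how many items it will yield.
    generator_has_len = False

halves = fill_preallocated(gen)
```

The cost of this approach is a temporary list holding every item, which is the memory trade-off that makes accepting arbitrary iterators less attractive for large arrays.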
[Todd]
I agree with this myself. Does anyone care if they will no longer be able to construct an array from a file or buffer object using array() rather than fromfile() or NumArray(), respectively? Is a deprecation process necessary to remove them?
There seems to be a majority opinion in favour of deprecation, though at least Florian uses the sequence-as-a-buffer feature.
[Colin]
I would suggest deprecation on the way to removal. For the newcomer, who is not yet "clued up" some advice on the instantiation of NumArray would help. Currently,
The deprecation warning could include a pointer to NumArray or fromfile, as appropriate. I think some of the Python stdlib deprecations (doctest?) do exactly this. The NumArray docs do need to be fixed, though.
[Colin]
Rory leaves in type and typecode. It would be good to eliminate this apparent overlap. Why not deprecate and then drop type? As a compromise, either could be accepted as a NumArray.__init__ argument, since it is easy to distinguish between them.
[Perry]
Tim is right about this. The rationale was that typecode is inaccurate since types are no longer represented by letter codes (one can still use them for backward compatibility).
Also, the type keyword matches the NumArray type method. It does have the downside of clashing with the type builtin, of course.
It would be good to clarify the acceptable content of a sequence. A
I think this is quite important, though perhaps not too difficult. I think any sequence, or nested sequences should be accepted, provided that they are "conformally sized" (for lack of a better phrase) and that the innermost sequences contain number types. I'll try to word this more precisely for the docs.
Note that a NumArray is a sequence, in the sense that it has __getitem__ and __len__ methods, and is indexed from 0 upwards.
Strings are also sequences, and Alexander made a comment to the patch that array() should handle sequences of strings. Consider Numeric's behaviour:
>>> array(["abc",[1,2,3]])
array([[97, 98, 99], [ 1, 2, 3]])
I think this needs to be handled in fromlist, which, I think, handles fairly general sequences, but not strings.
Note that this leads to a different interpretation of array(["abcd"]) and array("abcd")
According to the above, array(["abcd"]) should return array([[97,98,99,100]]) and, since plain strings go straight to fromstring, array("abcd") should return array([1684234849]) (probably dependent on endianness, what Long is, etc.). Is this acceptable?
[Colin]
Is the function asarray redundant?
[Tim]
No, the copy=False parameter is redundant ;) Well as a pair they are
I'm not sure I follow Tim's argument, but asarray is not redundant for a different reason: it returns any NDArray arguments without calling array. generic.ravel calls numarraycore.asarray, and so ravel()ing RecArrays, or some other non-NumArray NDArray requires asarray to remain as it is. I'm not sure if this setup is desirable, but I decided not to change too many things at once.
[Colin]
I suggest that the copy parameter be of the BoolType. This probably has no practical impact but it is consistent with current Python usage and makes it clear that this is a Yes/No parameter, rather than specifying a number of copies.
This makes sense; as Todd noted, we shouldn't rely on it being a bool, but having False as the default value is clearer.
Cheers, Rory
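The two readings of "abcd" that Rory contrasts, character codes versus raw bytes packed into an integer, can be reproduced with the standard library. This is a sketch of the arithmetic only, assuming a 4-byte integer; chars_as_codes and bytes_as_int32 are hypothetical helpers, not numarray functions.

```python
import struct


def chars_as_codes(text):
    # Treat the string as a sequence of characters, one code per element.
    return [ord(ch) for ch in text]


def bytes_as_int32(raw, byteorder="<"):
    # Treat the string as raw memory holding packed 4-byte signed integers.
    count = len(raw) // 4
    return list(struct.unpack(f"{byteorder}{count}i", raw[: count * 4]))


as_sequence = [chars_as_codes("abcd")]      # [[97, 98, 99, 100]]
little_endian = bytes_as_int32(b"abcd", "<")  # [1684234849]
big_endian = bytes_as_int32(b"abcd", ">")     # [1633837924]
```

The little-endian result matches the 1684234849 quoted in the message, and the big-endian result shows why the value depends on byte order.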
On Thu, 2005-01-13 at 22:18 +0200, Rory Yorke wrote:
[Todd]
I agree with this myself. Does anyone care if they will no longer be able to construct an array from a file or buffer object using array() rather than fromfile() or NumArray(), respectively? Is a deprecation process necessary to remove them?
There seems to be a majority opinion in favour of deprecation, though at least Florian uses the sequence-as-a-buffer feature.
By way of status, I applied and committed both Rory's patches this morning. Afterward, I added the deprecation warnings for the frombuffer() and fromfile() cases. frombuffer() is identical to NumArray(), so I did not add a new function.
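A deprecation warning that points the caller at the replacement, as suggested earlier in the thread, might look roughly like this. It is a sketch using the standard warnings module and the stdlib array type; legacy_array and the message wording are assumptions, not the code that was actually committed to numarray.

```python
import array
import warnings


def legacy_array(source, typecode="d"):
    # Hypothetical factory that still accepts buffers, but warns about it.
    if isinstance(source, (bytes, bytearray, memoryview)):
        warnings.warn(
            "passing a buffer to legacy_array() is deprecated; "
            "construct the array from the buffer directly instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        result = array.array(typecode)
        result.frombytes(bytes(source))
        return result
    return array.array(typecode, list(source))


raw = array.array("d", [1.0, 2.0]).tobytes()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from_raw = legacy_array(raw)       # old usage: works, but warns
    from_list = legacy_array([3.0])    # supported usage: silent
```

Keeping the old path functional while it warns gives users like Florian a release or two to migrate.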
[Colin]
I would suggest deprecation on the way to removal. For the newcomer, who is not yet "clued up" some advice on the instantiation of NumArray would help. Currently,
The deprecation warning could include a pointer to NumArray or fromfile, as appropriate. I think some of the Python stdlib deprecations (doctest?) do exactly this. The NumArray docs do need to be fixed, though.
I didn't touch the docs.
[Colin]
Rory leaves in type and typecode. It would be good to eliminate this apparent overlap. Why not deprecate and then drop type? As a compromise, either could be accepted as a NumArray.__init__ argument, since it is easy to distinguish between them.
[Perry]
Tim is right about this. The rationale was that typecode is inaccurate since types are no longer represented by letter codes (one can still use them for backward compatibility).
Also, the type keyword matches the NumArray type method. It does have the downside of clashing with the type builtin, of course.
IMHO, all this discussion about type/typecode is moot because typecode was added after the fact for Numeric compatibility. It really makes no sense to take it out now that we're going for interoperability with scipy. I don't like it much either, but the alternative, being incompatible, is worse. "typecode" could be factored out into the numerix layer, but that just makes life confusing; it's best that numarray works the same whether it's being used with scipy or not.
It would be good to clarify the acceptable content of a sequence. A
I think this is quite important, though perhaps not too difficult. I think any sequence, or nested sequences should be accepted, provided that they are "conformally sized" (for lack of a better phrase) and that the innermost sequences contain number types. I'll try to word this more precisely for the docs.
Note that a NumArray is a sequence, in the sense that it has __getitem__ and __len__ methods, and is indexed from 0 upwards.
Strings are also sequences, and Alexander made a comment to the patch that array() should handle sequences of strings. Consider Numeric's behaviour:
>>> array(["abc",[1,2,3]])
array([[97, 98, 99], [ 1, 2, 3]])
-1 from me. I think we're getting back into "array does too much" territory.
I think this needs to be handled in fromlist, which, I think, handles fairly general sequences, but not strings.
I think you're right, that's how it could be done.
Note that this leads to a different interpretation of array(["abcd"]) and array("abcd")
According to the above, array(["abcd"]) should return array([[97,98,99,100]]) and, since plain strings go straight to fromstring, array("abcd") should return array([1684234849]) (probably dependent on endianness, what Long is, etc.). Is this acceptable?
I held off consolidating all the new default types to Long. Not having defaults hasn't been a problem up to now so I'm not sure Numeric compatibility is such a concern or that Long is really the best default... although it does make it easier to write doctests.
Todd
participants (9)
- Colin J. Williams
- Florian Schulze
- Francesc Altet
- Perry Greenfield
- Ralf Juengling
- Rory Yorke
- Tim Hochberg
- Timo Korvola
- Todd Miller