numpy scalars and savez -- bug?
Folks,
I've discovered something interesting (bug?) with numpy scalars and savez. If I save a numpy scalar, then reload it, it comes back as a rank-0 array -- similar, but not the same thing:
In [144]: single_value, type(single_value)
Out[144]: (2.0, numpy.float32)
In [145]: np.savez('test.npz', single_value=single_value)
In [146]: single_value2 = np.load('test.npz')['single_value']
In [147]: single_value, type(single_value)
Out[147]: (2.0, numpy.float32)
In [148]: single_value2, type(single_value2)
Out[148]: (array(2.0, dtype=float32), numpy.ndarray)
straight np.save has the same issue (which makes sense, I'm sure savez uses the save code under the hood):
In [149]: single_value, type(single_value)
Out[149]: (2.0, numpy.float32)
In [150]: np.save('test.npy', single_value)
In [151]: single_value2 = np.load('test.npy')
In [152]: single_value2, type(single_value2)
Out[152]: (array(2.0, dtype=float32), numpy.ndarray)
This has been annoying, particularly as rank-zero scalars are kind of a pain.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
On Thu, Apr 18, 2013 at 5:58 AM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Folks,
I've discovered something interesting (bug?) with numpy scalars and savez. If I save a numpy scalar, then reload it, it comes back as a rank-0 array -- similar, but not the same thing:
In [144]: single_value, type(single_value)
Out[144]: (2.0, numpy.float32)
In [145]: np.savez('test.npz', single_value=single_value)
In [146]: single_value2 = np.load('test.npz')['single_value']
In [147]: single_value, type(single_value)
Out[147]: (2.0, numpy.float32)
In [148]: single_value2, type(single_value2)
Out[148]: (array(2.0, dtype=float32), numpy.ndarray)
straight np.save has the same issue (which makes sense, I'm sure savez uses the save code under the hood):
In [149]: single_value, type(single_value)
Out[149]: (2.0, numpy.float32)
In [150]: np.save('test.npy', single_value)
In [151]: single_value2 = np.load('test.npy')
In [152]: single_value2, type(single_value2)
Out[152]: (array(2.0, dtype=float32), numpy.ndarray)
This has been annoying, particularly as rank-zero scalars are kind of a pain.
np.save() and company (and the NPY format itself) are for arrays, not for scalars. np.save() uses an np.asanyarray() to coerce its input which is why your scalar gets converted to a rank-zero array. -- Robert Kern
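A minimal sketch of what Robert describes, using an in-memory buffer (io.BytesIO) in place of test.npy: np.asanyarray() already turns the scalar into a 0-d array, and that 0-d array is what np.load() hands back.

```python
import io

import numpy as np

# A numpy scalar, as in the original report.
single_value = np.float32(2.0)

# np.save() coerces its input with np.asanyarray(), so the scalar becomes
# a rank-0 (0-d) array before anything is written.
coerced = np.asanyarray(single_value)
print(type(coerced), coerced.shape)  # <class 'numpy.ndarray'> ()

# Round-trip through an in-memory .npy "file" instead of test.npy.
buf = io.BytesIO()
np.save(buf, single_value)
buf.seek(0)
single_value2 = np.load(buf)

print(type(single_value2), single_value2.shape, single_value2.dtype)
# <class 'numpy.ndarray'> () float32
```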
On Thu, Apr 18, 2013 at 4:04 AM, Robert Kern <robert.kern@gmail.com> wrote:
np.save() and company (and the NPY format itself) are for arrays, not for scalars. np.save() uses an np.asanyarray() to coerce its input which is why your scalar gets converted to a rank-zero array.
Fair enough -- so a missing feature, not bug -- I'll need to look at the docs and see if that can be clarified.
I note that it never dawned on me to pass anything other than an array in (like a list), but I guess if I did, it would likely work, but return an array when reloaded.
I'm ambivalent about whether I like this feature -- in this case, it resulted in confusion. If I'd gotten an exception in the first place, it would have been simple enough to fix; as it was, it took some poking around.
As for numpy scalars -- would it be a major lift to support them directly?
-Chris
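A sketch of both halves of Chris's point: a plain list does round-trip, but comes back as an ndarray, and a stricter save would have raised an exception up front. strict_save() is a hypothetical wrapper written for this illustration; it is not part of NumPy.

```python
import io

import numpy as np

# A plain Python list is accepted: np.save() converts it with
# np.asanyarray(), so what comes back is an ndarray, not a list.
buf = io.BytesIO()
np.save(buf, [1.0, 2.0, 3.0])
buf.seek(0)
reloaded = np.load(buf)
print(type(reloaded), reloaded)  # <class 'numpy.ndarray'> [1. 2. 3.]


def strict_save(file, arr):
    """Hypothetical wrapper (not a NumPy API): refuse non-ndarray input
    instead of silently coercing it, so the mismatch surfaces immediately."""
    if not isinstance(arr, np.ndarray):
        raise TypeError(
            f"expected numpy.ndarray, got {type(arr).__name__}; "
            "convert explicitly with np.asarray() if that is intended"
        )
    np.save(file, arr)


# A numpy scalar is not an ndarray, so the wrapper rejects it.
try:
    strict_save(io.BytesIO(), np.float32(2.0))
except TypeError as exc:
    print("rejected:", exc)
```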
-- Robert Kern _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Thu, Apr 18, 2013 at 8:31 AM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Fair enough -- so a missing feature, not bug -- I'll need to look at the docs and see if that can be clarified -
All I've found is the docstring docs (which also show up in the Sphinx docs). I suggest some slight modification:
def save(file, arr):
    """
    Save an array to a binary file in NumPy ``.npy`` format.

    Parameters
    ----------
    file : file or str
        File or filename to which the data is saved. If file is a file-object,
        then the filename is unchanged. If file is a string, a ``.npy``
        extension will be appended to the file name if it does not already
        have one.
    arr : array_like
        Array data to be saved. Any object that is not an array will be
        converted to an array with asanyarray(). When reloaded, the array
        version of the object will be returned.

    See Also
    --------
    savez : Save several arrays into a ``.npz`` archive
    savetxt, load

    Notes
    -----
    For a description of the ``.npy`` format, see `format`.

    Examples
    --------
    >>> from tempfile import TemporaryFile
    >>> outfile = TemporaryFile()

    >>> x = np.arange(10)
    >>> np.save(outfile, x)

    >>> outfile.seek(0) # Only needed here to simulate closing & reopening file
    >>> np.load(outfile)
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

    """
I also see:
For a description of the ``.npy`` format, see `format`.
but no idea where to find 'format' -- it looks like it should be a link in the Sphinx docs, but it's not.
-Chris
On Thu, Apr 18, 2013 at 9:20 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
On Thu, Apr 18, 2013 at 8:31 AM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Fair enough -- so a missing feature, not bug -- I'll need to look at the docs and see if that can be clarified -
All I've found is the docstring docs (which also show up in the Sphinx docs). I suggest some slight modification:
def save(file, arr):
    """
    Save an array to a binary file in NumPy ``.npy`` format.

    Parameters
    ----------
    file : file or str
        File or filename to which the data is saved. If file is a file-object,
        then the filename is unchanged. If file is a string, a ``.npy``
        extension will be appended to the file name if it does not already
        have one.
    arr : array_like
        Array data to be saved. Any object that is not an array will be
        converted to an array with asanyarray(). When reloaded, the array
        version of the object will be returned.

    See Also
    --------
    savez : Save several arrays into a ``.npz`` archive
    savetxt, load

    Notes
    -----
    For a description of the ``.npy`` format, see `format`.

    Examples
    --------
    >>> from tempfile import TemporaryFile
    >>> outfile = TemporaryFile()

    >>> x = np.arange(10)
    >>> np.save(outfile, x)

    >>> outfile.seek(0) # Only needed here to simulate closing & reopening file
    >>> np.load(outfile)
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

    """
I also see:
For a description of the ``.npy`` format, see `format`.
but no idea where to find 'format' -- it looks like it should be a link in the Sphinx docs, but it's not.
It does seem to be missing from the docs. https://github.com/numpy/numpy/blob/master/numpy/lib/format.py -- Robert Kern
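The format module Robert links to also exposes header-reading helpers, so one can check what np.save() actually records for a scalar. This sketch assumes the numpy.lib.format functions read_magic(), read_array_header_1_0() and read_array_header_2_0(); the in-memory buffer stands in for a real file.

```python
import io

import numpy as np
from numpy.lib import format as npy_format

# Save a numpy scalar and inspect the .npy header that np.save() wrote.
buf = io.BytesIO()
np.save(buf, np.float32(2.0))
buf.seek(0)

# read_magic() consumes the magic string and returns (major, minor).
version = npy_format.read_magic(buf)
if version == (1, 0):
    shape, fortran_order, dtype = npy_format.read_array_header_1_0(buf)
else:
    shape, fortran_order, dtype = npy_format.read_array_header_2_0(buf)

# The header only describes an array, so the scalar is stored with shape ().
print(version, shape, dtype)
```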
On 18 Apr 2013 01:29, "Chris Barker - NOAA Federal" <chris.barker@noaa.gov> wrote:
This has been annoying, particularly as rank-zero scalars are kind of a pain.
BTW, while we're on the topic, can you elaborate on this? I tend to think scalars (as opposed to 0d ndarrays) are kind of a pain, so I'm curious if you have specific issues you've run into with 0d ndarrays.
-n
On Apr 18, 2013, at 11:33 PM, Nathaniel Smith <njs@pobox.com> wrote:
On 18 Apr 2013 01:29, "Chris Barker - NOAA Federal" <chris.barker@noaa.gov> wrote:
This has been annoying, particularly as rank-zero scalars are kind of a pain.
BTW, while we're on the topic, can you elaborate on this? I tend to think scalars (as opposed to 0d ndarrays) are kind of a pain, so I'm curious if you have specific issues you've run into with 0d ndarrays.
Well, I suppose what's really a pain is that we have both, and they are not the same, and neither can be used in all cases one may want.
In the case at hand, I really wanted a datetime64 scalar. By saving and re-loading in an npz, it got converted to a rank-zero array, which had different behavior. In this case, the frustrating bit was how to extract a scalar again (which I really wanted to turn into a datetime object).
After the fact, I discovered .item(), which seems to do what I want.
On a phone now, so sorry about the lack of examples.
Note: I've lost track of why we need both scalars and rank-zero arrays. I can't help thinking that there could be an object that acts like a scalar in most contexts, but also has the array methods that make sense.
But I know it's far from simple.
-Chris
Robert,
As I think you wrote the code, you may have a quick answer:
Given that numpy scalars do exist, and have their uses -- I found this wiki page to remind me:
http://projects.scipy.org/numpy/wiki/ZeroRankArray
It would be nice if the .npy format could support them. Would that be a major change? I'm trying to decide if this bugs me enough to work on that.
-Chris
On Fri, Apr 19, 2013 at 8:03 AM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
On Apr 18, 2013, at 11:33 PM, Nathaniel Smith <njs@pobox.com> wrote:
On 18 Apr 2013 01:29, "Chris Barker - NOAA Federal" <chris.barker@noaa.gov> wrote:
This has been annoying, particularly as rank-zero scalars are kind of a pain.
BTW, while we're on the topic, can you elaborate on this? I tend to think scalars (as opposed to 0d ndarrays) are kind of a pain, so I'm curious if you have specific issues you've run into with 0d ndarrays.
Well, I suppose what's really a pain is that we have both, and they are not the same, and neither can be used in all cases one may want.
In the case at hand, I really wanted a datetime64 scalar. By saving and re-loading in an npz, it got converted to a rank-zero array, which had different behavior. In this case, the frustrating bit was how to extract a scalar again (which I really wanted to turn into a datetime object).
After the fact, I discovered .item(), which seems to do what I want.
On a phone now, so sorry about the lack of examples.
Note: I've lost track of why we need both scalars and rank-zero arrays. I can't help thinking that there could be an object that acts like a scalar in most contexts, but also has the array methods that make sense.
But I know it's far from simple.
-Chris
On Fri, 2013-04-19 at 08:03 -0700, Chris Barker - NOAA Federal wrote:
On Apr 18, 2013, at 11:33 PM, Nathaniel Smith <njs@pobox.com> wrote:
On 18 Apr 2013 01:29, "Chris Barker - NOAA Federal" <chris.barker@noaa.gov> wrote:
This has been annoying, particularly as rank-zero scalars are kind of a pain.
BTW, while we're on the topic, can you elaborate on this? I tend to think scalars (as opposed to 0d ndarrays) are kind of a pain, so I'm curious if you have specific issues you've run into with 0d ndarrays.
Well, I suppose what's really a pain is that we have both, and they are not the same, and neither can be used in all cases one may want.
In the case at hand, I really wanted a datetime64 scalar. By saving and re-loading in an npz, it got converted to a rank-zero array, which had different behavior. In this case, the frustrating bit was how to extract a scalar again (which I really wanted to turn into a datetime object).
After the fact, I discovered .item(), which seems to do what I want.
Fun fact, array[()] will convert a 0-d array to a scalar, but do nothing (or currently create a view) for other arrays. Which is actually a good question. Should array[()] force a view or not? - Sebastian
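Both routes back from a rank-0 array, Sebastian's array[()] and Chris's .item(), can be seen in a short sketch using a datetime64 like the one in Chris's cache. The timestamp is a made-up example value; .item() returns a datetime.datetime here because the unit is seconds (other units, such as days or nanoseconds, give a datetime.date or an int instead).

```python
import datetime
import io

import numpy as np

# Made-up timestamp standing in for the value Chris cached.
stamp = np.datetime64("2013-04-18T12:00:00", "s")

buf = io.BytesIO()
np.savez(buf, stamp=stamp)
buf.seek(0)
with np.load(buf) as data:
    loaded = data["stamp"]  # comes back as a rank-0 ndarray

print(type(loaded), loaded.shape)  # <class 'numpy.ndarray'> ()

# Indexing a 0-d array with an empty tuple yields the numpy scalar.
as_numpy_scalar = loaded[()]
print(type(as_numpy_scalar))  # <class 'numpy.datetime64'>

# .item() goes one step further and returns a plain Python object;
# for datetime64 with second resolution that is a datetime.datetime.
as_python = loaded.item()
print(as_python == datetime.datetime(2013, 4, 18, 12, 0))  # True
```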
On a phone now, so sorry about the lack of examples.
Note: I've lost track of why we need both scalars and rank-zero arrays. I can't help thinking that there could be an object that acts like a scalar in most contexts, but also has the array methods that make sense.
But I know it's far from simple.
-Chris
On Fri, Apr 19, 2013 at 8:45 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Robert,
As I think you wrote the code, you may have a quick answer:
Given that numpy scalars do exist, and have their uses -- I found this wiki page to remind me:
http://projects.scipy.org/numpy/wiki/ZeroRankArray
It would be nice if the .npy format could support them. Would that be a major change? I'm trying to decide if this bugs me enough to work on that.
I think that is significant scope creep for the .npy format, and I would like to avoid it. A case might be made for letting np.savez() simply pickle non-arrays that it is given. I have a vague recollection that that was discussed when savez() was designed and rejected as a moral hazard, but I could be wrong. The .npy and .npz formats are intentionally limited by design. As soon as you feel constrained by those limitations, you should start using more full-fledged and standard file formats. -- Robert Kern
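For comparison with the "pickle non-arrays" idea: a plain Python object such as a datetime.datetime is turned by np.asanyarray() into a 0-d object array, which np.save() writes by pickling. In NumPy releases since 1.16.3 (well after this thread), np.load() refuses such arrays unless allow_pickle=True is passed. A sketch, with a made-up timestamp:

```python
import datetime
import io

import numpy as np

# A plain Python object (not a numpy scalar) becomes a 0-d array of
# dtype=object; writing it to .npy means pickling it.
when = datetime.datetime(2013, 4, 18, 12, 0)
buf = io.BytesIO()
np.save(buf, when)  # np.save's allow_pickle defaults to True

# Loading is the risky direction, so np.load() refuses object arrays
# unless the caller opts in explicitly.
buf.seek(0)
try:
    np.load(buf)
except ValueError as exc:
    print("refused:", exc)

buf.seek(0)
restored = np.load(buf, allow_pickle=True)[()]
print(repr(restored))  # datetime.datetime(2013, 4, 18, 12, 0)
```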
On Fri, Apr 19, 2013 at 9:40 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Fun fact, array[()] will convert a 0-d array to a scalar, but do nothing (or currently create a view) for other arrays. Which is actually a good question. Should array[()] force a view or not?
Another fun fact: scalar[()] gives you a rank-0 array. :-)
I think the array[()] behavior follows logically as a limiting form of multidimensional indexing. Given a rank-3 array,
array[(i, j, k)] gives a scalar
array[(i, j)] gives a rank-1 view of the last axis
array[(i,)] gives a rank-2 view of the last 2 axes
array[()] gives a rank-3 view of the last 3 axes (i.e. all of them)
The rank-N-general rules look like so: for a rank-N array, an N-tuple gives a scalar, and each shorter (N-M)-tuple gives the appropriate rank-M view picked out by the tuple.
I can't explain the scalar[()] behavior, though. :-)
-- Robert Kern
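Robert's rule can be checked by looking at shapes on a small rank-3 array; the index values i, j, k below are arbitrary. The last lines show the limiting case Sebastian mentioned, where an empty tuple is a "full" index into a rank-0 array.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)  # a rank-3 array
i, j, k = 1, 2, 3

full = a[(i, j, k)]  # 3-tuple on a rank-3 array -> scalar
rank1 = a[(i, j)]    # 2-tuple -> rank-1 view of the last axis
rank2 = a[(i,)]      # 1-tuple -> rank-2 view of the last 2 axes
rank3 = a[()]        # empty tuple -> rank-3 view of all axes

print(type(full), rank1.shape, rank2.shape, rank3.shape)
# The partial indexes are views: they share memory with `a`.
print(np.shares_memory(rank1, a), np.shares_memory(rank3, a))

# On a rank-0 array the empty tuple is already a full index,
# so it produces a scalar rather than a view.
z = np.array(2.0, dtype=np.float32)
print(type(z[()]))  # <class 'numpy.float32'>
```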
On Fri, 2013-04-19 at 23:02 +0530, Robert Kern wrote:
On Fri, Apr 19, 2013 at 9:40 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Fun fact, array[()] will convert a 0-d array to a scalar, but do nothing (or currently create a view) for other arrays. Which is actually a good question. Should array[()] force a view or not?
Another fun fact: scalar[()] gives you a rank-0 array. :-)
Hahahaha, that's pretty (I would say bug).
I think the array[()] behavior follows logically as a limiting form of multidimensional indexing. Given a rank-3 array,
array[(i, j, k)] gives a scalar
array[(i, j)] gives a rank-1 view of the last axis
array[(i,)] gives a rank-2 view of the last 2 axes
array[()] gives a rank-3 view of the last 3 axes (i.e. all of them)
The rank-N-general rules look like so: for a rank-N array, an N-tuple gives a scalar, and each shorter (N-M)-tuple gives the appropriate rank-M view picked out by the tuple.
I can't explain the scalar[()] behavior, though. :-)
Another special case... It doesn't hit the normal indexing code, and so it can never hit the tuple of integers special case to get converted back to a scalar. But it also always gets converted to an array first. So many special cases that are hard to miss, there is a reason I started rewriting it ;). Still not sure if one should force views sometimes though, but I doubt it matters... - Sebastian
On Fri, Apr 19, 2013 at 10:21 AM, Robert Kern <robert.kern@gmail.com> wrote:
On Fri, Apr 19, 2013 at 8:45 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Given that numpy scalars do exist, and have their uses -- I found this wiki page to remind me:
http://projects.scipy.org/numpy/wiki/ZeroRankArray
It would be nice if the .npy format could support them. Would that be a major change? I'm trying to decide if this bugs me enough to work on that.
I think that is significant scope creep for the .npy format, and I would like to avoid it.
hmm -- maybe it's more work than we want, but it seems to me that numpy scalars are part and parcel of numpy -- so it makes sense for .npy to save them.
A case might be made for letting np.savez() simply pickle non-arrays that it is given. I have a vague recollection that that was discussed when savez() was designed and rejected as a moral hazard, but I could be wrong.
That could be a nice solution -- I'm not _so_ worried about moral hazards!
The .npy and .npz formats are intentionally limited by design. As soon as you feel constrained by those limitations, you should start using more full-fledged and standard file formats.
well, maybe -- in this case, I'm using it to cache a bunch of data on disk. The data are all in a dict of numpy arrays, so it was really natural and easy (and I presume fast) to use npz. All I want is to dump it to disk and get it back the same way. It worked great.
Then I needed a datetime stored with it all -- so I figured a datetime64 scalar would be perfect. It's not a huge deal to use a rank-zero array instead, but it would have been nicer to be able to store a scalar (I suppose one trick may be that there are numpy scalars, and there are regular old Python scalars...)
Anyway -- going to HDF, or netcdf, or roll-your-own really seems like overkill for this. I just need something fast and simple, and it doesn't need to interchange with anything else.
-Chris
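A sketch of the caching pattern Chris describes, assuming the cache is a flat dict of arrays plus one datetime64 value. save_cache() and load_cache() are hypothetical helper names; the only trick is indexing the reloaded rank-0 array with [()] to get the scalar back.

```python
import io

import numpy as np


def save_cache(file, arrays, timestamp):
    """Hypothetical helper: store a dict of arrays plus one datetime64 in .npz.
    Assumes no array in `arrays` is itself named "timestamp"."""
    np.savez(file, timestamp=timestamp, **arrays)


def load_cache(file):
    """Hypothetical helper: reload the cache and restore the scalar timestamp."""
    with np.load(file) as data:
        arrays = {name: data[name] for name in data.files if name != "timestamp"}
        # The timestamp was stored as a rank-0 array; [()] turns it back
        # into a numpy.datetime64 scalar.
        timestamp = data["timestamp"][()]
    return arrays, timestamp


buf = io.BytesIO()
save_cache(buf, {"u": np.zeros(3), "v": np.ones(3)},
           np.datetime64("2013-04-19T10:21:00", "s"))
buf.seek(0)
arrays, timestamp = load_cache(buf)
print(sorted(arrays), type(timestamp))
```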
On 19 Apr 2013 19:22, "Chris Barker - NOAA Federal" <chris.barker@noaa.gov> wrote:
Anyway -- going to HDF, or netcdf, or roll-your-own really seems like overkill for this. I just need something fast and simple and it doesn't need to interchange with anything else.
Just use pickle...? -n
On Fri, Apr 19, 2013 at 11:31 AM, Nathaniel Smith <njs@pobox.com> wrote:
On 19 Apr 2013 19:22, "Chris Barker - NOAA Federal" <chris.barker@noaa.gov> wrote:
Anyway -- going to HDF, or netcdf, or roll-your-own really seems like overkill for this. I just need something fast and simple and it doesn't need to interchange with anything else.
Just use pickle...?
hmm -- for some reason, I have always thought of pickle as unreliable and ill-suited to numpy arrays -- we developed savez for a reason... but maybe I just need to give it a shot and see how it works.
Thanks,
-Chris
On 19.04.2013 22:06, Chris Barker - NOAA Federal wrote:
On Fri, Apr 19, 2013 at 11:31 AM, Nathaniel Smith <njs@pobox.com> wrote:
On 19 Apr 2013 19:22, "Chris Barker - NOAA Federal" <chris.barker@noaa.gov> wrote:
Anyway -- going to HDF, or netcdf, or roll-your-own really seems like overkill for this. I just need something fast and simple and it doesn't need to interchange with anything else.
Just use pickle...?
hmm -- for some reason, I have always thought of pickle as unreliable and ill-suited to numpy arrays -- we developed savez for a reason... but maybe I just need to give it a shot and see how it works.
Use protocol=2 so that it doesn't needlessly ASCII-quote the data.
-- Pauli Virtanen
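A sketch of Nathaniel's and Pauli's suggestion: pickling the same kind of cache with protocol=2 rather than protocol 0, the text-based protocol that was Python 2's default. The dictionary contents are illustrative; unlike the .npz route, numpy scalars come back as scalars. On current Python any protocol from 2 up works, and pickle.HIGHEST_PROTOCOL is a reasonable choice.

```python
import pickle

import numpy as np

cache = {
    "u": np.arange(5, dtype=np.float64),
    "timestamp": np.datetime64("2013-04-19T10:21:00", "s"),
    "single_value": np.float32(2.0),
}

# protocol=2 is a binary pickle protocol; protocol 0 is the text-based one.
blob = pickle.dumps(cache, protocol=2)
restored = pickle.loads(blob)

# Unlike np.savez, pickle keeps numpy scalars as scalars.
print(type(restored["single_value"]), type(restored["timestamp"]))
```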
On Sat, Apr 20, 2013 at 12:36 AM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
On Fri, Apr 19, 2013 at 11:31 AM, Nathaniel Smith <njs@pobox.com> wrote:
On 19 Apr 2013 19:22, "Chris Barker - NOAA Federal" <chris.barker@noaa.gov> wrote:
Anyway -- going to HDF, or netcdf, or roll-your-own really seems like overkill for this. I just need something fast and simple and it doesn't need to interchange with anything else.
Just use pickle...?
hmm -- for some reason, I have always thought of pickle as unreliable and ill-suited to numpy arrays -- we developed savez for a reason... but maybe I just need to give it a shot and see how it works.
The rationale behind the .npy format is laid out here: https://github.com/numpy/numpy/blob/master/doc/neps/npy-format.txt
-- Robert Kern
participants (5)
- Chris Barker - NOAA Federal
- Nathaniel Smith
- Pauli Virtanen
- Robert Kern
- Sebastian Berg