New numpy functions: filled, filled_like
Hi all,
PR 2875 adds two new functions, that generalize zeros(), ones(), zeros_like(), ones_like(), by simply taking an arbitrary fill value:
  https://github.com/numpy/numpy/pull/2875
So
  np.ones((10, 10))
is the same as
  np.filled((10, 10), 1)
The implementations are trivial, but the API seems useful because it provides an idiomatic way of efficiently creating an array full of inf, or nan, or None, whatever funny value you need. All the alternatives are either inefficient (np.ones(...) * np.inf) or cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But there's a question of taste here; one could argue instead that these just add more clutter to the numpy namespace. So, before we merge, anyone want to chime in?
(Bonus, extra bike-sheddy survey: do people prefer
  np.filled((10, 10), np.nan)
  np.filled_like(my_arr, np.nan)
or
  np.filled(np.nan, (10, 10))
  np.filled_like(np.nan, my_arr)
?)
-n
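A minimal sketch of the three spellings compared in this message. The helpers filled() and filled_like() below are hypothetical stand-ins written for illustration; they are not the code from the PR. Defaulting dtype to float (to mirror np.ones and np.zeros) is an assumption.

```python
import numpy as np

def filled(shape, fill_value, dtype=float):
    """Hypothetical helper: new array of `shape` with every element set to fill_value.

    Assumption: dtype defaults to float, mirroring np.ones()/np.zeros().
    """
    a = np.empty(shape, dtype=dtype)
    a.fill(fill_value)
    return a

def filled_like(arr, fill_value):
    """Hypothetical helper: same shape and dtype as `arr`, filled with fill_value."""
    a = np.empty_like(arr)
    a.fill(fill_value)
    return a

# "Inefficient" spelling: allocate ones, then multiply into a second array.
via_ones = np.ones((10, 10)) * np.inf

# "Cumbersome" spelling: two statements.
via_fill = np.empty((10, 10))
via_fill.fill(np.inf)

# Proposed spelling, using the hypothetical helper above.
via_helper = filled((10, 10), np.inf)

# None needs an object array, so the dtype has to be given explicitly here.
nones = filled((2, 2), None, dtype=object)

my_arr = np.arange(6.0).reshape(2, 3)
nans = filled_like(my_arr, np.nan)
```

All three spellings give the same array; the difference is in the number of statements and temporary arrays.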
On Sun, Jan 13, 2013 at 12:27 PM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
PR 2875 adds two new functions, that generalize zeros(), ones(), zeros_like(), ones_like(), by simply taking an arbitrary fill value: https://github.com/numpy/numpy/pull/2875 So np.ones((10, 10)) is the same as np.filled((10, 10), 1)
The implementations are trivial, but the API seems useful because it provides an idiomatic way of efficiently creating an array full of inf, or nan, or None, whatever funny value you need. All the alternatives are either inefficient (np.ones(...) * np.inf) or cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But there's a question of taste here; one could argue instead that these just add more clutter to the numpy namespace. So, before we merge, anyone want to chime in?
+1
I find it useful. I do it the indirect way very often, or write matlab-style helper functions:
  def nanes: ....
One problem is dtype: inf and nan only make sense for float.
I don't think I have used many values besides those two.
(Bonus, extra bike-sheddy survey: do people prefer np.filled((10, 10), np.nan) np.filled_like(my_arr, np.nan)
+ 0.5
or np.filled(np.nan, (10, 10)) np.filled_like(np.nan, my_arr) ?)
Josef
-n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi, On Sun, Jan 13, 2013 at 7:27 PM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
PR 2875 adds two new functions, that generalize zeros(), ones(), zeros_like(), ones_like(), by simply taking an arbitrary fill value: https://github.com/numpy/numpy/pull/2875 So np.ones((10, 10)) is the same as np.filled((10, 10), 1)
The implementations are trivial, but the API seems useful because it provides an idiomatic way of efficiently creating an array full of inf, or nan, or None, whatever funny value you need. All the alternatives are either inefficient (np.ones(...) * np.inf) or cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But there's a question of taste here; one could argue instead that these just add more clutter to the numpy namespace. So, before we merge, anyone want to chime in?
(Bonus, extra bike-sheddy survey: do people prefer np.filled((10, 10), np.nan) np.filled_like(my_arr, np.nan)
+0
OTOH, it might also be handy to let val be an array as well, which is then repeated to fill the array.
My 2 cents.
-eat
or np.filled(np.nan, (10, 10)) np.filled_like(np.nan, my_arr) ?)
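One possible reading of the array-valued suggestion, assuming "repeated" means ordinary NumPy broadcasting of the value into the new array. This interpretation is an assumption; the thread does not spell out the intended semantics.

```python
import numpy as np

row = np.array([1.0, 2.0, 3.0])

# Allocate the target, then let broadcasting repeat `row` along the first axis.
a = np.empty((4, 3))
a[...] = row
```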
Hi, On Sun, Jan 13, 2013 at 5:27 PM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
PR 2875 adds two new functions, that generalize zeros(), ones(), zeros_like(), ones_like(), by simply taking an arbitrary fill value: https://github.com/numpy/numpy/pull/2875 So np.ones((10, 10)) is the same as np.filled((10, 10), 1)
The implementations are trivial, but the API seems useful because it provides an idiomatic way of efficiently creating an array full of inf, or nan, or None, whatever funny value you need. All the alternatives are either inefficient (np.ones(...) * np.inf) or cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But there's a question of taste here; one could argue instead that these just add more clutter to the numpy namespace. So, before we merge, anyone want to chime in?
(Bonus, extra bike-sheddy survey: do people prefer np.filled((10, 10), np.nan) np.filled_like(my_arr, np.nan) or np.filled(np.nan, (10, 10)) np.filled_like(np.nan, my_arr) ?)
I remember there has been a reluctance in the past to add functions that were two-liners. I guess the problem might be that the namespace fills up with many similar things. Is this a worry? Best, Matthew
On 2013/01/13 7:27 AM, Nathaniel Smith wrote:
Hi all,
PR 2875 adds two new functions, that generalize zeros(), ones(), zeros_like(), ones_like(), by simply taking an arbitrary fill value: https://github.com/numpy/numpy/pull/2875 So np.ones((10, 10)) is the same as np.filled((10, 10), 1)
The implementations are trivial, but the API seems useful because it provides an idiomatic way of efficiently creating an array full of inf, or nan, or None, whatever funny value you need. All the alternatives are either inefficient (np.ones(...) * np.inf) or cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But there's a question of taste here; one could argue instead that these just add more clutter to the numpy namespace. So, before we merge, anyone want to chime in?
I'm neutral to negative as to whether it is worth adding these to the namespace; I don't mind using the "cumbersome" alternative. Note also that there is already a numpy.ma.filled() function for quite a different purpose, so putting a filled() in numpy breaks the pattern that ma has masked versions of most numpy functions. This consideration actually tips me quite a bit toward the negative side. I don't think I am unique in relying heavily on masked arrays.
(Bonus, extra bike-sheddy survey: do people prefer np.filled((10, 10), np.nan) np.filled_like(my_arr, np.nan)
+1 for this form if you decide to do it despite the problem mentioned above.
or np.filled(np.nan, (10, 10)) np.filled_like(np.nan, my_arr)
This one is particularly bad for filled_like, therefore bad for both. Eric
?)
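For context on the name clash, a short sketch of what the existing numpy.ma.filled() does: it takes a masked array and returns a plain array with the masked entries replaced by a fill value. The sample data is made up for illustration.

```python
import numpy as np

# A masked array whose middle element is masked out.
m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# Replace masked entries with -1.0; the result is an ordinary ndarray.
plain = np.ma.filled(m, -1.0)
```

So np.ma.filled(array, value) fills holes in existing data, whereas the proposed np.filled(shape, value) would create a new array, which is the pattern break described above.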
On Sun, Jan 13, 2013 at 6:27 PM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
PR 2875 adds two new functions, that generalize zeros(), ones(), zeros_like(), ones_like(), by simply taking an arbitrary fill value: https://github.com/numpy/numpy/pull/2875 So np.ones((10, 10)) is the same as np.filled((10, 10), 1)
The implementations are trivial, but the API seems useful because it provides an idiomatic way of efficiently creating an array full of inf, or nan, or None, whatever funny value you need. All the alternatives are either inefficient (np.ones(...) * np.inf) or cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But there's a question of taste here; one could argue instead that these just add more clutter to the numpy namespace. So, before we merge, anyone want to chime in?
One alternative that does not expand the API with two-liners is to let the ndarray.fill() method return self:
  a = np.empty(...).fill(20.0)
--
Robert Kern
On Sun, Jan 13, 2013 at 11:24 PM, Robert Kern <robert.kern@gmail.com> wrote:
One alternative that does not expand the API with two-liners is to let the ndarray.fill() method return self:
a = np.empty(...).fill(20.0)
Nice. Matthew
On Sun, Jan 13, 2013 at 11:24 PM, Robert Kern <robert.kern@gmail.com> wrote:
One alternative that does not expand the API with two-liners is to let the ndarray.fill() method return self:
a = np.empty(...).fill(20.0)
This violates the convention that in-place operations never return self, to avoid confusion with out-of-place operations. E.g. ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus np.sort(), and in the broader Python world, list.sort() versus sorted(), list.reverse() versus reversed(). (This was an explicit reason given for list.sort to not return self, even.)
Maybe enabling this idiom is a good enough reason to break the convention ("Special cases aren't special enough to break the rules. / Although practicality beats purity"), but it at least makes me -0 on this...
(The nice thing about np.filled() is that it makes np.zeros() and np.ones() feel like clutter, rather than the reverse... not that I'm suggesting ever getting rid of them, but it makes the API conceptually feel smaller, not larger.)
-n
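A small sketch of the convention under discussion, using only existing behaviour: in-place methods return None, while their out-of-place counterparts return a new object. The variable names are illustrative.

```python
import numpy as np

# Plain Python: list.sort() works in place and returns None; sorted() returns a new list.
items = [3, 1, 2]
in_place_result = items.sort()
new_list = sorted([3, 1, 2])

# NumPy follows the same split: ndarray.sort() versus np.sort().
arr = np.array([3, 1, 2])
arr_result = arr.sort()
new_arr = np.sort(np.array([3, 1, 2]))

# ndarray.fill() is in place too, so chaining it onto np.empty() yields None today.
chained = np.empty(3).fill(20.0)
```

This is why the one-liner a = np.empty(...).fill(20.0) only works if fill() is changed to return self.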
On Sun, Jan 13, 2013 at 6:39 PM, Nathaniel Smith <njs@pobox.com> wrote:
This violates the convention that in-place operations never return self, to avoid confusion with out-of-place operations. E.g. ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus np.sort(), and in the broader Python world, list.sort() versus sorted(), list.reverse() versus reversed(). (This was an explicit reason given for list.sort to not return self, even.)
Maybe enabling this idiom is a good enough reason to break the convention ("Special cases aren't special enough to break the rules. / Although practicality beats purity"), but it at least makes me -0 on this...
I tend to agree with the notion that inplace operations shouldn't return self, but I don't know if it's just because I've been conditioned this way. Not returning self breaks the fluent interface pattern [1], as noted in a similar discussion on pandas [2], FWIW, though there's likely some way to have both worlds.
Skipper
[1] https://en.wikipedia.org/wiki/Fluent_interface
[2] https://github.com/pydata/pandas/issues/1893
On Sun, Jan 13, 2013 at 11:48 PM, Skipper Seabold <jsseabold@gmail.com> wrote:
I tend to agree with the notion that inplace operations shouldn't return self, but I don't know if it's just because I've been conditioned this way. Not returning self breaks the fluent interface pattern [1], as noted in a similar discussion on pandas [2], FWIW, though there's likely some way to have both worlds.
Ah-hah, here's the email where Guido officially proclaims that there shall be no "fluent interface" nonsense applied to in-place operators in Python, because it hurts readability (at least for Dutch people ;-)):
http://mail.python.org/pipermail/python-dev/2003-October/038855.html
-n
On Mon, Jan 14, 2013 at 1:04 AM, Nathaniel Smith <njs@pobox.com> wrote:
Ah-hah, here's the email where Guido officially proclaims that there shall be no "fluent interface" nonsense applied to in-place operators in Python, because it hurts readability (at least for Dutch people ;-)): http://mail.python.org/pipermail/python-dev/2003-October/038855.html
That's a statement about the policy for the stdlib, and just one person's opinion. You, and numpy, are permitted to have a different opinion.
In any case, I'm not strongly advocating for it. Its violation of principle ("no fluent interfaces") is roughly in the same ballpark as np.filled() ("not every two-liner needs its own function"), so I thought I would toss it out there for consideration.
--
Robert Kern
Robert Kern <robert.kern <at> gmail.com> writes:
That's a statement about the policy for the stdlib, and just one person's opinion. You, and numpy, are permitted to have a different opinion.
In any case, I'm not strongly advocating for it. Its violation of principle ("no fluent interfaces") is roughly in the same ballpark as np.filled() ("not every two-liner needs its own function"), so I thought I would toss it out there for consideration.
-- Robert Kern
FWIW I'm +1 on the idea. Perhaps because I just don't see many practical downsides to breaking the convention but I regularly see a big issue with there being no way to instantiate an array with a particular value.
The one obvious way to do it is use ones and multiply by the value you want. I work with a lot of inexperienced programmers and I see this idiom all the time. It takes a fair amount of numpy knowledge to know that you should do it in two lines by using empty and setting a slice.
In [1]: %timeit NaN*ones(10000)
1000 loops, best of 3: 1.74 ms per loop
In [2]: %%timeit
   ...: x = empty(10000, dtype=float)
   ...: x[:] = NaN
   ...:
10000 loops, best of 3: 28 us per loop
In [3]: 1.74e-3/28e-6
Out[3]: 62.142857142857146
Even when not in the mythical "tight loop" setting an array to one and then multiplying uses up a lot of cycles - it's nearly 2 orders of magnitude slower than what we know they *should* be doing.
I'm agnostic as to whether fill should be modified or new functions provided but I think numpy is currently missing this functionality and that providing it would save a lot of new users from shooting themselves in the foot performance-wise.
-Dave
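A sketch for rerunning this comparison outside IPython with the standard timeit module. The function names are made up for illustration, and the measured ratio will depend on the NumPy version and the machine, so no expected timings are given.

```python
import timeit

import numpy as np

N = 10000

def ones_times_value():
    # Allocates an array of ones, then multiplies it into a second array.
    return np.nan * np.ones(N)

def empty_then_assign():
    # Allocates uninitialised memory and writes the value once.
    x = np.empty(N, dtype=float)
    x[:] = np.nan
    return x

t_mul = timeit.timeit(ones_times_value, number=1000)
t_assign = timeit.timeit(empty_then_assign, number=1000)
print("ones * value  : %.6f s per 1000 calls" % t_mul)
print("empty + assign: %.6f s per 1000 calls" % t_assign)
```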
Hi, On Mon, Jan 14, 2013 at 9:02 AM, Dave Hirschfeld <dave.hirschfeld@gmail.com> wrote:
FWIW I'm +1 on the idea. Perhaps because I just don't see many practical downsides to breaking the convention but I regularly see a big issue with there being no way to instantiate an array with a particular value.
The one obvious way to do it is use ones and multiply by the value you want. I work with a lot of inexperienced programmers and I see this idiom all the time. It takes a fair amount of numpy knowledge to know that you should do it in two lines by using empty and setting a slice.
In [1]: %timeit NaN*ones(10000)
1000 loops, best of 3: 1.74 ms per loop
In [2]: %%timeit
   ...: x = empty(10000, dtype=float)
   ...: x[:] = NaN
   ...:
10000 loops, best of 3: 28 us per loop
In [3]: 1.74e-3/28e-6
Out[3]: 62.142857142857146
Even when not in the mythical "tight loop" setting an array to one and then multiplying uses up a lot of cycles - it's nearly 2 orders of magnitude slower than what we know they *should* be doing.
I'm agnostic as to whether fill should be modified or new functions provided but I think numpy is currently missing this functionality and that providing it would save a lot of new users from shooting themselves in the foot performance-wise.
Is this a fair summary?
=> fill(shape, val), fill_like(arr, val) - new functions, as proposed
For: readable, seems to fit a pattern often used, presence in namespace may clue people into using the 'fill' rather than * val or + val
Con: a very simple alias for a = ones(shape) ; a.fill(val), maybe cluttering already full namespace.
=> empty(shape).fill(val) - by allowing return value from arr.fill(val)
For: readable
Con: breaks guideline not to return anything from in-place operations, no presence in namespace means users may not find this pattern.
=> no new API
For: easy maintenance
Con: harder for users to discover fill pattern, filling a new array requires two lines instead of one.
So maybe the decision rests on:
How important is it that users see these function names in the namespace in order to discover the pattern "a = ones(shape) ; a.fill(val)"?
How important is it to obey guidelines for no-return-from-in-place?
How important is it to avoid expanding the namespace?
How common is this pattern?
On the last, I'd say that the only common use I have for this pattern is to fill an array with NaN.
Cheers,
Matthew
2013/1/14 Matthew Brett <matthew.brett@gmail.com>:
Hi,
Is this a fair summary?
=> fill(shape, val), fill_like(arr, val) - new functions, as proposed For: readable, seems to fit a pattern often used, presence in namespace may clue people into using the 'fill' rather than * val or + val Con: a very simple alias for a = ones(shape) ; a.fill(val), maybe cluttering already full namespace.
=> empty(shape).fill(val) - by allowing return value from arr.fill(val) For: readable Con: breaks guideline not to return anything from in-place operations, no presence in namespace means users may not find this pattern.
=> no new API For : easy maintenance Con : harder for users to discover fill pattern, filling a new array requires two lines instead of one.
So maybe the decision rests on:
How important is it that users see these function names in the namespace in order to discover the pattern "a = ones(shape) ; a.fill(val)"?
How important is it to obey guidelines for no-return-from-in-place?
How important is it to avoid expanding the namespace?
How common is this pattern?
On the last, I'd say that the only common use I have for this pattern is to fill an array with NaN.
My 2 cts from a user perspective:
- +1 to have such a function. I usually use numpy.ones * scalar because honestly, spending two lines of code for such a basic operation seems like a waste. Even if it's slower and potentially dangerous due to casting rules.
- I think having a noun rather than a verb makes more sense since we have numpy.ones and numpy.zeros (and I always read "numpy.empty" as "give me an empty array", not "empty an array").
- I agree the name collision with np.ma.filled is a problem. I have no better suggestion though at this point.
-=- Olivier
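One way the casting concern can show up, sketched under the assumption that it refers to dtype promotion in the multiply idiom. The thread does not say which casting behaviour was meant, so this is only an illustration.

```python
import numpy as np

# Multiply idiom: the integer array is silently promoted to float64.
ints = np.ones(3, dtype=np.int64)
scaled = ints * 2.5

# Fill idiom: the array keeps its dtype and the value is cast to it,
# so the fractional part of 2.5 is dropped.
target = np.empty(3, dtype=np.int64)
target.fill(2.5)
```

The two spellings therefore disagree on both the dtype and the stored values when the fill value does not match the requested dtype.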
On Mon, Jan 14, 2013 at 11:15 AM, Olivier Delalleau <shish@keba.be> wrote:
2013/1/14 Matthew Brett <matthew.brett@gmail.com>:
Hi,
On Mon, Jan 14, 2013 at 9:02 AM, Dave Hirschfeld <dave.hirschfeld@gmail.com> wrote:
Robert Kern <robert.kern <at> gmail.com> writes:
> > > > One alternative that does not expand the API with two-liners is to let > > the ndarray.fill() method return self: > > > > a = np.empty(...).fill(20.0) > > This violates the convention that in-place operations never return > self, to avoid confusion with out-of-place operations. E.g. > ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus > np.sort(), and in the broader Python world, list.sort() versus > sorted(), list.reverse() versus reversed(). (This was an explicit > reason given for list.sort to not return self, even.) > > Maybe enabling this idiom is a good enough reason to break the > convention ("Special cases aren't special enough to break the rules. / > Although practicality beats purity"), but it at least makes me -0 on > this... >
I tend to agree with the notion that inplace operations shouldn't return self, but I don't know if it's just because I've been conditioned this way. Not returning self breaks the fluid interface pattern [1], as noted in a similar discussion on pandas [2], FWIW, though there's likely some way to have both worlds.
Ah-hah, here's the email where Guide officially proclaims that there shall be no "fluent interface" nonsense applied to in-place operators in Python, because it hurts readability (at least for Dutch people ): http://mail.python.org/pipermail/python-dev/2003-October/038855.html
That's a statement about the policy for the stdlib, and just one person's opinion. You, and numpy, are permitted to have a different opinion.
In any case, I'm not strongly advocating for it. It's violation of principle ("no fluent interfaces") is roughly in the same ballpark as np.filled() ("not every two-liner needs its own function"), so I thought I would toss it out there for consideration.
-- Robert Kern
FWIW I'm +1 on the idea. Perhaps because I just don't see many practical downsides to breaking the convention but I regularly see a big issue with there being no way to instantiate an array with a particular value.
The one obvious way to do it is use ones and multiply by the value you want. I work with a lot of inexperienced programmers and I see this idiom all the time. It takes a fair amount of numpy knowledge to know that you should do it in two lines by using empty and setting a slice.
In [1]: %timeit NaN*ones(10000)
1000 loops, best of 3: 1.74 ms per loop

In [2]: %%timeit
   ...: x = empty(10000, dtype=float)
   ...: x[:] = NaN
   ...:
10000 loops, best of 3: 28 us per loop

In [3]: 1.74e-3/28e-6
Out[3]: 62.142857142857146
Even when not in the mythical "tight loop" setting an array to one and then multiplying uses up a lot of cycles - it's nearly 2 orders of magnitude slower than what we know they *should* be doing.
I'm agnostic as to whether fill should be modified or new functions provided, but I think numpy is currently missing this functionality and that providing it would save a lot of new users from shooting themselves in the foot performance-wise.
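The comparison above can be reproduced outside IPython with the standard timeit module. This is only a sketch: the function names are made up for illustration, and the measured ratio depends heavily on machine, NumPy version and array size, so no particular speed-up should be assumed.

```python
import timeit

import numpy as np

n = 10000


def via_multiply():
    # Allocates an array of ones, then a second array for the product.
    return np.nan * np.ones(n)


def via_empty_fill():
    # Allocates once and writes the fill value in place.
    x = np.empty(n, dtype=float)
    x.fill(np.nan)
    return x


t_mul = timeit.timeit(via_multiply, number=1000)
t_fill = timeit.timeit(via_empty_fill, number=1000)
print("ones * nan : %.6f s per 1000 calls" % t_mul)
print("empty+fill : %.6f s per 1000 calls" % t_fill)
```

Both functions produce the same array; the difference is only in how many passes over memory are made.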
Is this a fair summary?
=> fill(shape, val), fill_like(arr, val) - new functions, as proposed
For: readable, seems to fit a pattern often used, presence in namespace may clue people into using the 'fill' rather than * val or + val
Con: a very simple alias for a = ones(shape) ; a.fill(val), maybe cluttering already full namespace.

=> empty(shape).fill(val) - by allowing return value from arr.fill(val)
For: readable
Con: breaks guideline not to return anything from in-place operations, no presence in namespace means users may not find this pattern.

=> no new API
For: easy maintenance
Con: harder for users to discover fill pattern, filling a new array requires two lines instead of one.
So maybe the decision rests on:
How important is it that users see these function names in the namespace in order to discover the pattern "a = ones(shape) ; a.fill(val)"?
How important is it to obey guidelines for no-return-from-in-place?
How important is it to avoid expanding the namespace?
How common is this pattern?
On the last, I'd say that the only common use I have for this pattern is to fill an array with NaN.
My 2 cts from a user perspective:
- +1 to have such a function. I usually use numpy.ones * scalar because honestly, spending two lines of code on such a basic operation seems like a waste. Even if it's slower and potentially dangerous due to casting rules.
- I think having a noun rather than a verb makes more sense since we have numpy.ones and numpy.zeros (and I always read "numpy.empty" as "give me an empty array", not "empty an array").
- I agree the name collision with np.ma.filled is a problem. I have no better suggestion though at this point.
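To make the "casting rules" remark concrete, here is a small sketch of how the two idioms can differ in the dtype they produce. It illustrates general NumPy type promotion and is not something taken from the PR.

```python
import numpy as np

# ones() defaults to float64, so multiplying by nan behaves as expected.
a = np.ones(3) * np.nan

# With an integer array, the multiplication silently promotes to float64,
# so the result no longer has the dtype that was asked for.
b = np.ones(3, dtype=int) * 2.5

# empty() + fill() keeps the requested dtype; the value is cast to it.
c = np.empty(3, dtype=int)
c.fill(7)

print(a.dtype, b.dtype, c.dtype)
```

With the multiply idiom the result dtype is decided by promotion, whereas with allocate-then-fill it is decided by the dtype passed at allocation.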
np.array_filled(shape, value, dtype) ?
maybe more verbose, but unambiguous AFAICS

BTW GAUSS http://en.wikipedia.org/wiki/GAUSS_(software) also has zeros and ones. 1st release 1984

np.array_filled((100, 2), -999, int) ?

Josef
-=- Olivier _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Mon, Jan 14, 2013 at 11:22 AM, <josef.pktd@gmail.com> wrote:
On Mon, Jan 14, 2013 at 11:15 AM, Olivier Delalleau <shish@keba.be> wrote:
2013/1/14 Matthew Brett <matthew.brett@gmail.com>:
Hi,
On Mon, Jan 14, 2013 at 9:02 AM, Dave Hirschfeld <dave.hirschfeld@gmail.com> wrote:
Robert Kern <robert.kern <at> gmail.com> writes:
np.array_filled(shape, value, dtype) ? maybe more verbose, but unambiguous AFAICS
BTW GAUSS http://en.wikipedia.org/wiki/GAUSS_(software) also has zeros and ones. 1st release 1984
np.array_filled((100, 2), -999, int) ?
A quick check of the statsmodels source:

20 occurrences of np.nan * np.ones(...)
50 occurrences of np.empty
a few filled with values other than nan
many filled in a loop
(optimistically, more often used by new contributors)

It's just a two-liner, but if it's a function it hopefully produces better code. David's argument looks plausible to me.

Josef
-=- Olivier
Just changing the subject line so a good suggestion does not get lost ... Alan
On Mon, Jan 14, 2013 at 12:27 PM, Eric Firing <efiring@hawaii.edu> wrote:
On 2013/01/14 6:15 AM, Olivier Delalleau wrote:
- I agree the name collision with np.ma.filled is a problem. I have no better suggestion though at this point.
How about "initialized()"?
A verb! +1 from me! For those wondering, I have a personal rule that because functions *do* something, they really should have verbs for their names. I have to learn to read functions like "ones" and "empty" like "give me ones" or "give me an empty array". Ben Root
On 14/01/2013 18:33, Benjamin Root wrote:
How about "initialized()"?
A verb! +1 from me!
Shouldn't it be "initialize()" then? I'm not so fond of it though, because "initialize" is pretty broad in the field of programming.

What about "refurbishing" the already existing "tile()" function? As of now it almost does the job:

In [8]: tile(nan, (3,3)) # (it's a verb!)
Out[8]:
array([[ nan,  nan,  nan],
       [ nan,  nan,  nan],
       [ nan,  nan,  nan]])

though with two restrictions:

* tile doesn't have a dtype keyword. Could this be added?
* tile performance on my computer seems to be twice as bad as "ones() * val"

Best, Pierre
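As a sketch of the tile() approach and its dtype limitation: since tile() takes no dtype argument, one workaround is to give the input value the desired dtype before tiling. The variable names here are illustrative only.

```python
import numpy as np

# A 3x3 array of nan, as in the session above.
t = np.tile(np.nan, (3, 3))

# No dtype keyword on tile(), so set the dtype on the input instead.
seven32 = np.array(7, dtype=np.float32)
t32 = np.tile(seven32, (2, 4))

print(t.shape, t.dtype)
print(t32.shape, t32.dtype)
```

Whether this is faster or slower than the other idioms would need to be measured on the machine in question.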
On Mon, Jan 14, 2013 at 1:12 PM, Pierre Haessig <pierre.haessig@crans.org> wrote:
In [8]: tile(nan, (3,3)) # (it's a verb ! )
tile, in my opinion, is useful in some cases (for people who think in terms of repmat()) but not very NumPy-ish. What I'd like is a function that takes

- an initial array_like "a"
- a shape "s"
- optionally, a dtype (otherwise inherit from a)

and broadcasts "a" to the shape "s". In the case of scalars this is just a fill. In the case of, say, a (5,) vector and a (10, 5) shape, this broadcasts across rows, etc.

I don't think it's worth special-casing scalar fills (except perhaps as an implementation detail) when you have rich broadcasting semantics that are already a fundamental part of NumPy, allowing for a much handier primitive.

David
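A minimal sketch of the broadcasting primitive described above, assuming it can be built from np.empty plus a broadcasting assignment. The name `broadcast_fill` and its signature are hypothetical; nothing with this name exists in NumPy.

```python
import numpy as np


def broadcast_fill(a, shape, dtype=None):
    """Hypothetical helper: return a new array of `shape` holding `a` broadcast to it."""
    a = np.asarray(a)
    # Inherit the dtype from `a` unless one is given explicitly.
    out = np.empty(shape, dtype=a.dtype if dtype is None else dtype)
    # Assignment applies the usual broadcasting rules; it raises
    # ValueError if `a` cannot be broadcast to `shape`.
    out[...] = a
    return out


scalar_case = broadcast_fill(np.nan, (2, 3))       # plain scalar fill
row_case = broadcast_fill(np.arange(5), (10, 5))   # (5,) vector repeated across rows
print(scalar_case)
print(row_case[:2])
```

The scalar fill falls out as the special case where `a` is zero-dimensional.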
On Mon, Jan 14, 2013 at 1:56 PM, David Warde-Farley <d.warde.farley@gmail.com> wrote:
On Mon, Jan 14, 2013 at 1:12 PM, Pierre Haessig <pierre.haessig@crans.org> wrote:
In [8]: tile(nan, (3,3)) # (it's a verb ! )
tile, in my opinion, is useful in some cases (for people who think in terms of repmat()) but not very NumPy-ish. What I'd like is a function that takes
- an initial array_like "a" - a shape "s" - optionally, a dtype (otherwise inherit from a)
and broadcasts "a" to the shape "s". In the case of scalars this is just a fill. In the case of, say, a (5,) vector and a (10, 5) shape, this broadcasts across rows, etc.
I don't think it's worth special-casing scalar fills (except perhaps as an implementation detail) when you have rich broadcasting semantics that are already a fundamental part of NumPy, allowing for a much handier primitive.
I have similar problems with "tile". I learned it for a particular use in numpy, and it would be hard for me to see it for another (contextually) different use. I do like the way you are thinking in terms of the broadcasting semantics, but I wonder if that is a bit awkward. What I mean is, if one were to use broadcasting semantics for creating an array, wouldn't one have just simply used broadcasting anyway? The point of broadcasting is to _avoid_ the creation of unneeded arrays. But maybe I can be convinced with some examples. Ben Root
Hi, On 14/01/2013 20:05, Benjamin Root wrote:
I do like the way you are thinking in terms of the broadcasting semantics, but I wonder if that is a bit awkward. What I mean is, if one were to use broadcasting semantics for creating an array, wouldn't one have just simply used broadcasting anyway? The point of broadcasting is to _avoid_ the creation of unneeded arrays. But maybe I can be convinced with some examples.
I feel that one of the points of the discussion is: although a new (or not so new...) function to create a filled array would be more elegant than the existing pair of functions "np.zeros" and "np.ones", there are maybe not so many use cases for filled arrays *other than zero values*.

I can remember having initialized a non-zero array *some months ago*. For the anecdote, it was a vector of discretized vehicle speed values which I wanted to initialize with a predefined mean speed value prior to some optimization. In that use case, I really didn't care about the performance of this initialization step.

So my overall feeling after this thread is
- *yes* a single dedicated fill/init/someverb function would give a slightly better API,
- but *no* it's not important because np.empty and np.zeros cover 95% of use cases!

best, Pierre
On 2013/01/17 4:13 AM, Pierre Haessig wrote:
So my overall feeling after this thread is
- *yes* a single dedicated fill/init/someverb function would give a slightly better API,
- but *no* it's not important because np.empty and np.zeros cover 95% of use cases!
I agree with your summary and conclusion. Eric
best, Pierre
On Thu, Jan 17, 2013 at 5:04 PM, Eric Firing <efiring@hawaii.edu> wrote:
I agree with your summary and conclusion.
Eric
Can we at least have a np.nans() and np.infs() functions? This should cover an additional 4% of use-cases. Ben Root P.S. - I know they aren't verbs...
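For illustration, the two requested helpers could be written in user code today as thin wrappers around np.empty and fill. The names `nans` and `infs` follow the suggestion above and are hypothetical; they are not NumPy functions.

```python
import numpy as np


def nans(shape, dtype=float):
    """Hypothetical helper: new array of the given shape filled with nan."""
    out = np.empty(shape, dtype=dtype)
    out.fill(np.nan)
    return out


def infs(shape, dtype=float):
    """Hypothetical helper: new array of the given shape filled with +inf."""
    out = np.empty(shape, dtype=dtype)
    out.fill(np.inf)
    return out


print(nans((2, 2)))
print(infs(3))
```

As noted earlier in the thread, both values only make sense for floating-point (or complex) dtypes, which is why the default dtype here is float.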
Hi, On Thu, Jan 17, 2013 at 10:10 PM, Benjamin Root <ben.root@ou.edu> wrote:
Can we at least have a np.nans() and np.infs() functions? This should cover an additional 4% of use-cases.
I'm a -0.5 on the new functions, just because they only save one line of code, and the use case is fairly rare in my experience. Cheers, Matthew
On Thu, Jan 17, 2013 at 2:10 PM, Benjamin Root <ben.root@ou.edu> wrote:
Can we at least have a np.nans() and np.infs() functions? This should cover an additional 4% of use-cases.
Ben Root
P.S. - I know they aren't verbs...
Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling? np.empty((10, 10), fill=np.nan) np.empty_like(my_arr, fill=np.nan) -Mark
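A sketch of what the keyword-based proposal might look like, written as user-level wrappers because np.empty and np.empty_like have no `fill` keyword. The wrapper names and the sentinel are assumptions made for this example, not part of any NumPy API.

```python
import numpy as np

# Sentinel so that None remains a legal fill value (e.g. for object arrays).
_NO_FILL = object()


def empty_with_fill(shape, dtype=float, fill=_NO_FILL):
    """Hypothetical stand-in for np.empty(shape, fill=...)."""
    out = np.empty(shape, dtype=dtype)
    if fill is not _NO_FILL:
        out.fill(fill)
    return out


def empty_like_with_fill(arr, fill=_NO_FILL):
    """Hypothetical stand-in for np.empty_like(arr, fill=...)."""
    out = np.empty_like(arr)
    if fill is not _NO_FILL:
        out.fill(fill)
    return out


a = empty_with_fill((10, 10), fill=np.nan)
my_arr = np.zeros(4, dtype=int)
b = empty_like_with_fill(my_arr, fill=5)
print(a[0, :3], b)
```

When `fill` is omitted, the wrappers behave exactly like the functions they wrap, which is the backwards-compatible behaviour the keyword approach relies on.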
Hi, On Thu, Jan 17, 2013 at 10:27 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling?
np.empty((10, 10), fill=np.nan) np.empty_like(my_arr, fill=np.nan)
That sounds like a good idea to me. Someone wanting a fast way to fill an array will probably check out the 'empty' docstring first. See you, Matthew
2013/1/17 Matthew Brett <matthew.brett@gmail.com>:
Hi,
On Thu, Jan 17, 2013 at 10:27 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling?
np.empty((10, 10), fill=np.nan) np.empty_like(my_arr, fill=np.nan)
That sounds like a good idea to me. Someone wanting a fast way to fill an array will probably check out the 'empty' docstring first.
See you,
Matthew
+1 from me. Even though it *is* weird to have both "empty" and "fill" ;) -=- Olivier
On Jan 17, 2013 8:01 PM, "Olivier Delalleau" <shish@keba.be> wrote:
2013/1/17 Matthew Brett <matthew.brett@gmail.com>:
Hi,
On Thu, Jan 17, 2013 at 10:27 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling?
np.empty((10, 10), fill=np.nan) np.empty_like(my_arr, fill=np.nan)
That sounds like a good idea to me. Someone wanting a fast way to fill an array will probably check out the 'empty' docstring first.
See you,
Matthew
+1 from me. Even though it *is* weird to have both "empty" and "fill" ;)
I'd almost prefer such a keyword be added to np.ones() to avoid that weirdness. (something like "an array of ones where one equals X") Ray
Hi,
On 17/01/2013 23:31, Matthew Brett wrote:
> > Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling?
> >
> > np.empty((10, 10), fill=np.nan)
> > np.empty_like(my_arr, fill=np.nan)
>
> That sounds like a good idea to me. Someone wanting a fast way to fill an array will probably check out the 'empty' docstring first.

Oh, that sounds very good to me. There is indeed a bit of a contradiction between "empty" and "fill", but maybe not that strong if we think of "empty" as "void of actual information". (Especially true when the fill value is nan or inf, which, as Ben just mentioned, are probably the most commonly used fill values after zero.)
Maybe a keyword named "value" instead of "fill" may help soften the semantic opposition with "empty" ? best, Pierre
Hi, On Fri, Jan 18, 2013 at 1:48 PM, Pierre Haessig <pierre.haessig@crans.org> wrote:
Maybe a keyword named "value" instead of "fill" may help soften the semantic opposition with "empty" ?
I personally find 'fill' OK. I'd read: a = np.empty((10, 10), fill=np.nan) as "make an empty array shape (10, 10) and fill with nans" Which would indeed be what the code was doing :) So I doubt that the semantic clash would cause any long term problems, Best, Matthew
On Fri, Jan 18, 2013 at 2:22 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
I personally find 'fill' OK. I'd read:
a = np.empty((10, 10), fill=np.nan)
as
"make an empty array shape (10, 10) and fill with nans"
+1 simple, does the job, and doesn't bloat the API. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Fri, Jan 18, 2013 at 11:31 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
On Fri, Jan 18, 2013 at 2:22 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
I personally find 'fill' OK. I'd read:
a = np.empty((10, 10), fill=np.nan)
as
"make an empty array shape (10, 10) and fill with nans"
+1
simple, does the job, and doesn't bloat the API.
+1 from me too. Ralf
On Fri, Jan 18, 2013 at 2:22 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
I personally find 'fill' OK. I'd read:
a = np.empty((10, 10), fill=np.nan)
as
"make an empty array shape (10, 10) and fill with nans"
Which would indeed be what the code was doing :) So I doubt that the semantic clash would cause any long term problems,
+1, practicality beats purity...
On 17/01/2013 23:27, Mark Wiebe wrote:
Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling?
np.empty((10, 10), fill=np.nan) np.empty_like(my_arr, fill=np.nan)
Wouldn't it be more natural to extend the ndarray constructor? np.ndarray((10, 10), fill=np.nan) It looks more natural to me. In this way it is not possible to have the _like extension, but I don't see it as a major drawback. Cheers, Daniele
On Fri, Jan 18, 2013 at 3:44 AM, Daniele Nicolodi <daniele@grinta.net> wrote:
On 17/01/2013 23:27, Mark Wiebe wrote:
Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling?
np.empty((10, 10), fill=np.nan) np.empty_like(my_arr, fill=np.nan)
Wouldn't it be more natural to extend the ndarray constructor?
np.ndarray((10, 10), fill=np.nan)
It looks more natural to me. In this way it is not possible to have the _like extension, but I don't see it as a major drawback.
Cheers, Daniele
This isn't a bad idea. Although, I would wager that most people, like myself, use np.array() and np.array_like() instead of np.ndarray(). We should also double-check and see how well that would fit in with the other constructors like masked arrays and matrix objects. Ben Root
On 18/01/2013 15:19, Benjamin Root wrote:
On Fri, Jan 18, 2013 at 3:44 AM, Daniele Nicolodi <daniele@grinta.net> wrote:
On 17/01/2013 23:27, Mark Wiebe wrote:
> Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling?
>
> np.empty((10, 10), fill=np.nan)
> np.empty_like(my_arr, fill=np.nan)
Wouldn't it be more natural to extend the ndarray constructor?
np.ndarray((10, 10), fill=np.nan)
It looks more natural to me. In this way it is not possible to have the _like extension, but I don't see it as a major drawback.
Cheers, Daniele
This isn't a bad idea. Although, I would wager that most people, like myself, use np.array() and np.array_like() instead of np.ndarray(). We should also double-check and see how well that would fit in with the other constructors like masked arrays and matrix objects.
Hello Ben, I don't really get what you mean by this. np.array() constructs a numpy array from an array-like object, np.ndarray() accepts a dimensions tuple as its first parameter, and I don't see any np.array_like in the current numpy release. Cheers, Daniele
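The distinction drawn here can be seen in a few lines. This sketch only shows the existing behaviour of the three entry points; it says nothing about how a `fill` keyword would be added to any of them.

```python
import numpy as np

# np.array() builds an array from existing array-like data (copying it).
from_data = np.array([[1, 2], [3, 4]])

# np.ndarray() is the low-level constructor: its first argument is a shape,
# and the contents are whatever happened to be in memory (uninitialized).
raw = np.ndarray((10, 10))

# np.empty() is the usual way to ask for the same kind of uninitialized array.
same_kind = np.empty((10, 10))

print(from_data.shape, raw.shape, same_kind.shape)
print(raw.dtype, same_kind.dtype)
```

Both `raw` and `same_kind` must be written to before their values mean anything.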
On Fri, Jan 18, 2013 at 11:36 AM, Daniele Nicolodi <daniele@grinta.net> wrote:
On 18/01/2013 15:19, Benjamin Root wrote:
On Fri, Jan 18, 2013 at 3:44 AM, Daniele Nicolodi <daniele@grinta.net> wrote:
On 17/01/2013 23:27, Mark Wiebe wrote:
Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling?
np.empty((10, 10), fill=np.nan) np.empty_like(my_arr, fill=np.nan)
Wouldn't it be more natural to extend the ndarray constructor?
np.ndarray((10, 10), fill=np.nan)
It looks more natural to me. In this way it is not possible to have the _like extension, but I don't see it as a major drawback.
Cheers, Daniele
This isn't a bad idea, although I would wager that most people, like myself, use np.array() and np.array_like() instead of np.ndarray(). We should also double-check how well that would fit in with the other constructors, like masked arrays and matrix objects.
Hello Ben,
I don't really get what you mean by this. np.array() constructs a numpy array from an array-like object, while np.ndarray() accepts a dimensions tuple as its first parameter. I don't see any np.array_like in the current numpy release.
Cheers, Daniele
My bad, I had a brain-fart and got mixed up. I was thinking of np.empty(). In fact, I never use np.ndarray(), I use np.empty(). Besides np.ndarray() being the actual constructor, what is the difference between them? Ben Root
On 18/01/2013 17:46, Benjamin Root wrote:
On Fri, Jan 18, 2013 at 11:36 AM, Daniele Nicolodi <daniele@grinta.net> wrote:
On 18/01/2013 15:19, Benjamin Root wrote:
On Fri, Jan 18, 2013 at 3:44 AM, Daniele Nicolodi <daniele@grinta.net> wrote:
On 17/01/2013 23:27, Mark Wiebe wrote:
Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling?
np.empty((10, 10), fill=np.nan) np.empty_like(my_arr, fill=np.nan)
Wouldn't it be more natural to extend the ndarray constructor?
np.ndarray((10, 10), fill=np.nan)
It looks more natural to me. In this way it is not possible to have the _like extension, but I don't see it as a major drawback.
Cheers, Daniele
This isn't a bad idea, although I would wager that most people, like myself, use np.array() and np.array_like() instead of np.ndarray(). We should also double-check how well that would fit in with the other constructors, like masked arrays and matrix objects.
Hello Ben,
I don't really get what you mean by this. np.array() constructs a numpy array from an array-like object, while np.ndarray() accepts a dimensions tuple as its first parameter. I don't see any np.array_like in the current numpy release.
Cheers, Daniele
My bad, I had a brain-fart and got mixed up. I was thinking of np.empty(). In fact, I never use np.ndarray(), I use np.empty(). Besides np.ndarray() being the actual constructor, what is the difference between them?
I was also wondering what's the difference between np.ndarray() and np.empty(). I thought the second was a wrapper around the first, but it looks like both of them are actually implemented in C... Cheers, Daniele
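As a rough illustration of the question above, the sketch below calls both constructors side by side. It assumes only the documented positional `shape` argument and `dtype` keyword of np.ndarray() and np.empty(); the `fill=` keyword proposed earlier in this thread does not exist and is not used. Both calls hand back uninitialized memory, so an explicit fill is still needed in either case.

```python
import numpy as np

shape = (10, 10)

# Both calls allocate an array without initializing its contents.
a = np.ndarray(shape, dtype=float)
b = np.empty(shape, dtype=float)

# The contents are arbitrary until we fill them explicitly.
a.fill(np.nan)
b.fill(np.nan)

same_layout = (a.shape == b.shape) and (a.dtype == b.dtype)
all_nan = bool(np.isnan(a).all() and np.isnan(b).all())
```

For this simple use the two behave the same from the caller's point of view; np.ndarray() additionally accepts low-level arguments such as `buffer` and `strides`.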
Hi,
On 14/01/2013 20:17, Alan G Isaac wrote:
>>> a = np.tile(5,(1,2,3))
>>> a
array([[[5, 5, 5], [5, 5, 5]]])
>>> np.tile(1,a.shape)
array([[[1, 1, 1], [1, 1, 1]]])
I had not realized a scalar first argument was possible.
I didn't know either! I discovered this use in this thread. Just like Ben, I've almost never used "np.tile" nor its cousin "np.repeat"...
Now, in the process of rediscovering those two functions, I was just wondering whether it would make sense to repackage them in order to allow the simple functionality of initializing a non-empty array. In terms of choosing the name (or actually the verb), I prefer "repeat" because it's a more familiar concept than "tile". However, repeat may need more changes to make it work than tile. Indeed, we currently have:
>>> tile(nan, (3,3))  # works fine, but is pretty slow for that purpose, and doesn't accept a dtype arg
array([[ nan, nan, nan], [ nan, nan, nan], [ nan, nan, nan]])
Doesn't work for that purpose:
>>> repeat(nan, (3,3))
[...]
ValueError: a.shape[axis] != len(repeats)
So what do people think of this "green" approach of recycling existing API into a slightly different function (without breaking current behavior, of course)? Best, Pierre
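To make the tile/repeat comparison concrete, here is a minimal sketch of how each existing function can already produce a constant array. It relies only on np.tile, np.repeat, reshape and astype; since np.tile has no dtype argument, the dtype is converted afterwards, which costs an extra copy.

```python
import numpy as np

# np.tile accepts a scalar and a target shape directly.
tiled = np.tile(np.nan, (3, 3))

# np.repeat needs a flat repeat count, followed by a reshape.
repeated = np.repeat(np.nan, 9).reshape(3, 3)

# np.tile takes no dtype argument, so convert afterwards (extra copy).
tiled32 = np.tile(np.nan, (3, 3)).astype(np.float32)
```

Neither route is as direct as allocating with np.empty() and calling fill(), which is the motivation given for the proposed functions.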
On Sun, Jan 13, 2013 at 11:24 PM, Robert Kern <robert.kern@gmail.com> wrote:
One alternative that does not expand the API with two-liners is to let the ndarray.fill() method return self:
a = np.empty(...).fill(20.0)
On 1/13/2013 6:39 PM, Nathaniel Smith wrote:
This violates the convention that in-place operations never return self, to avoid confusion with out-of-place operations.
Strongly agree. It is not worth a violation to save two keystrokes: "\na". (Three or four for a longer name, given name completion.) Alan Isaac
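For reference, a short sketch of the current behavior under discussion: ndarray.fill() works in place and returns None, so the chained one-liner does not yield an array, and the two-step form is needed.

```python
import numpy as np

# Chaining does not work today: fill() returns None, not the array.
result = np.empty(3).fill(20.0)

# The two-step idiom keeps a reference to the array before filling it.
a = np.empty(3)
a.fill(20.0)
```

This mirrors list.sort(), which also mutates in place and returns None.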
Hi,
On 14/01/2013 00:39, Nathaniel Smith wrote:
(The nice thing about np.filled() is that it makes np.zeros() and np.ones() feel like clutter, rather than the reverse... not that I'm suggesting ever getting rid of them, but it makes the API conceptually feel smaller, not larger.)
Coming from the Matlab syntax, I feel that np.zeros and np.ones are in numpy for Matlab (and maybe others?) compatibility and are useful for that. Now that I've been "enlightened" by Python, I think that those functions (especially np.ones) are indeed clutter. Therefore I favor the introduction of these two new functions. However, I think Eric's remark about masked array API compatibility is important. I don't know what other names are possible? np.const? Or maybe np.tile is also useful for that same purpose? In that case adding a dtype argument to np.tile would be useful. Best, Pierre
On Mon, Jan 14, 2013 at 7:38 AM, Pierre Haessig <pierre.haessig@crans.org> wrote:
Hi,
On 14/01/2013 00:39, Nathaniel Smith wrote:
(The nice thing about np.filled() is that it makes np.zeros() and np.ones() feel like clutter, rather than the reverse... not that I'm suggesting ever getting rid of them, but it makes the API conceptually feel smaller, not larger.)
Coming from the Matlab syntax, I feel that np.zeros and np.ones are in numpy for Matlab (and maybe others?) compatibility and are useful for that. Now that I've been "enlightened" by Python, I think that those functions (especially np.ones) are indeed clutter. Therefore I favor the introduction of these two new functions.
However, I think Eric's remark about masked array API compatibility is important. I don't know what other names are possible ? np.const ?
Or maybe np.tile is also useful for that same purpose ? In that case adding a dtype argument to np.tile would be useful.
best, Pierre
I am also +1 on the idea of having a filled() and filled_like() function (I learned a long time ago to just do a = np.empty() and a.fill() rather than the multiplication trick I learned from Matlab). However, the collision with the masked array API is a non-starter for me. np.const() and np.const_like() probably make the most sense, but I would prefer a verb over a noun. Ben Root
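The name collision mentioned above can be seen in a small sketch. In numpy.ma, "filled" already means "replace the masked entries with a fill value and return a plain ndarray", which is a different operation from creating a new constant array. The example uses only np.ma.masked_array, the MaskedArray.filled() method and the np.ma.filled() function.

```python
import numpy as np

# A masked array whose middle element is masked out.
m = np.ma.masked_array([1, 2, 3], mask=[False, True, False])

# Existing meaning of "filled": substitute masked entries, return an ndarray.
plain = m.filled(-1)
also_plain = np.ma.filled(m, -1)
```

A top-level np.filled(shape, value) would therefore share a name with an operation that takes an array and a fill value, not a shape.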
Why not optimize NumPy to detect a mul of an ndarray by a scalar to call fill? That way, "np.empty * 2" will be as fast as "x=np.empty; x.fill(2)"? Fred On Mon, Jan 14, 2013 at 9:57 AM, Benjamin Root <ben.root@ou.edu> wrote:
On Mon, Jan 14, 2013 at 7:38 AM, Pierre Haessig <pierre.haessig@crans.org> wrote:
Hi,
On 14/01/2013 00:39, Nathaniel Smith wrote:
(The nice thing about np.filled() is that it makes np.zeros() and np.ones() feel like clutter, rather than the reverse... not that I'm suggesting ever getting rid of them, but it makes the API conceptually feel smaller, not larger.)
Coming from the Matlab syntax, I feel that np.zeros and np.ones are in numpy for Matlab (and maybe others?) compatibility and are useful for that. Now that I've been "enlightened" by Python, I think that those functions (especially np.ones) are indeed clutter. Therefore I favor the introduction of these two new functions.
However, I think Eric's remark about masked array API compatibility is important. I don't know what other names are possible ? np.const ?
Or maybe np.tile is also useful for that same purpose ? In that case adding a dtype argument to np.tile would be useful.
best, Pierre
I am also +1 on the idea of having a filled() and filled_like() function (I learned a long time ago to just do a = np.empty() and a.fill() rather than the multiplication trick I learned from Matlab). However, the collision with the masked array API is a non-starter for me. np.const() and np.const_like() probably make the most sense, but I would prefer a verb over a noun.
Ben Root
On Mon, Jan 14, 2013 at 4:12 PM, Frédéric Bastien <nouiz@nouiz.org> wrote:
Why not optimize NumPy to detect a mul of an ndarray by a scalar to call fill? That way, "np.empty * 2" will be as fast as "x=np.empty; x.fill(2)"?
In general, each element of an array will be different, so the result of the multiplication will be different, so fill can not be used. -- Robert Kern
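A tiny sketch of the point being made: multiplying an arbitrary array by a scalar gives a result that depends on every element, so it cannot be replaced by a fill. Only the special case of an all-ones array happens to coincide with filling.

```python
import numpy as np

x = np.arange(4.0)           # elements differ: 0, 1, 2, 3
product = x * 2              # result depends on each element: 0, 2, 4, 6

filled = np.empty(4)
filled.fill(2)               # every element is 2, regardless of prior contents

ones_times = np.ones(4) * 2  # coincides with fill only because the input is all ones
```

Detecting that special case would require inspecting the whole input array, which removes the speed advantage.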
On Mon, Jan 14, 2013 at 2:57 PM, Benjamin Root <ben.root@ou.edu> wrote:
I am also +1 on the idea of having a filled() and filled_like() function (I learned a long time ago to just do a = np.empty() and a.fill() rather than the multiplication trick I learned from Matlab). However, the collision with the masked array API is a non-starter for me. np.const() and np.const_like() probably make the most sense, but I would prefer a verb over a noun.
To get an array of 1's, you call np.ones(shape), to get an array of 0's you call np.zeros(shape) so to get an array of val's why not call np.vals(shape, val)? Cheers Robins
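For concreteness, here is a minimal sketch of what such a function might look like, built on the np.empty() plus fill() idiom discussed in this thread. The name `vals`, its argument order and its dtype default are hypothetical and taken only from the suggestion above; nothing like it is claimed to exist in numpy.

```python
import numpy as np

def vals(shape, val, dtype=None):
    """Hypothetical helper (not part of numpy): array of `shape` filled with `val`."""
    if dtype is None:
        # Assumption: infer the dtype from the fill value itself.
        dtype = np.asarray(val).dtype
    out = np.empty(shape, dtype=dtype)
    out.fill(val)
    return out

sevens = vals((2, 3), 7)
nans = vals((2,), np.nan)
```

Inferring the dtype from the value is one possible design choice; np.zeros() and np.ones() default to float instead, so a real implementation would have to pick one behavior.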
On Mon, Jan 14, 2013 at 9:57 AM, Benjamin Root <ben.root@ou.edu> wrote:
On Mon, Jan 14, 2013 at 7:38 AM, Pierre Haessig <pierre.haessig@crans.org> wrote:
Hi,
On 14/01/2013 00:39, Nathaniel Smith wrote:
(The nice thing about np.filled() is that it makes np.zeros() and np.ones() feel like clutter, rather than the reverse... not that I'm suggesting ever getting rid of them, but it makes the API conceptually feel smaller, not larger.)
Coming from the Matlab syntax, I feel that np.zeros and np.ones are in numpy for Matlab (and maybe others?) compatibility and are useful for that. Now that I've been "enlightened" by Python, I think that those functions (especially np.ones) are indeed clutter. Therefore I favor the introduction of these two new functions.
However, I think Eric's remark about masked array API compatibility is important. I don't know what other names are possible ? np.const ?
Or maybe np.tile is also useful for that same purpose ? In that case adding a dtype argument to np.tile would be useful.
best, Pierre
I am also +1 on the idea of having a filled() and filled_like() function (I learned a long time ago to just do a = np.empty() and a.fill() rather than the multiplication trick I learned from Matlab). However, the collision with the masked array API is a non-starter for me. np.const() and np.const_like() probably make the most sense, but I would prefer a verb over a noun.
Definitely -1 on const. Falsely implies immutability, to my mind. David
On Sun, Jan 13, 2013 at 4:24 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Sun, Jan 13, 2013 at 6:27 PM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
PR 2875 adds two new functions, that generalize zeros(), ones(), zeros_like(), ones_like(), by simply taking an arbitrary fill value: https://github.com/numpy/numpy/pull/2875 So np.ones((10, 10)) is the same as np.filled((10, 10), 1)
The implementations are trivial, but the API seems useful because it provides an idiomatic way of efficiently creating an array full of inf, or nan, or None, whatever funny value you need. All the alternatives are either inefficient (np.ones(...) * np.inf) or cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But there's a question of taste here; one could argue instead that these just add more clutter to the numpy namespace. So, before we merge, anyone want to chime in?
One alternative that does not expand the API with two-liners is to let the ndarray.fill() method return self:
a = np.empty(...).fill(20.0)
My thought also. Shades of the Python `.sort` method... Chuck
participants (22)
- Alan G Isaac
- Benjamin Root
- Charles R Harris
- Chris Barker - NOAA Federal
- Daniele Nicolodi
- Dave Hirschfeld
- David Warde-Farley
- eat
- Eric Firing
- Fernando Perez
- Frédéric Bastien
- josef.pktd@gmail.com
- Mark Wiebe
- Matthew Brett
- Nathaniel Smith
- Olivier Delalleau
- Pierre Haessig
- Ralf Gommers
- Robert Kern
- Robin
- Skipper Seabold
- Thouis (Ray) Jones