Re: [Numpy-discussion] Make np.bincount output same dtype as weights
Would it make sense to just make the output type large enough to hold the cumulative sum of the weights?
- Joseph Fox-Rabinovitz
------ Original message ------
From: Jaime Fernández del Río
Date: Sat, Mar 26, 2016 16:16
To: Discussion of Numerical Python
Subject: [Numpy-discussion] Make np.bincount output same dtype as weights
Hi all,
I have just submitted a PR (#7464) that fixes an enhancement request (#6854), making np.bincount return an array of the same type as the weights parameter. This is an important deviation from current behavior, which always casts weights to double, and always returns a double array, so I would like to hear what others think about the worthiness of this. Main discussion points:
np.bincount now works with complex weights (yay!), I guess this should be a pretty uncontroversial enhancement.
The return is of the same type as weights, which means that small integers are very likely to overflow. This is exactly what #6854 requested, but perhaps we should promote the output for integers to a long, as we do in np.sum?
Boolean arrays stay boolean, and OR, rather than sum, the weights. Is this what one would want? If we decide that integer promotion is the way to go, perhaps booleans should go in the same pack?
This new implementation currently supports all of the reasonable native types, but has no fallback for user defined types. I guess we should attempt to cast the array to double as before if no native loop can be found? It would be good to have a way of testing this though, any thoughts on how to go about this?
Does a behavior change like this require some deprecation period? What would that look like?
I have also added broadcasting of weights to the full size of list, so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without having to tile the single weight to the size of the bins list.
Any other thoughts are very welcome as well!
Jaime
-- (\__/) ( O.o) ( > <) This is Conejo. Copy Conejo into your signature and help him with his plans for world domination.
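For reference, the behavior described in the original message can be checked with a short script. This is only a sketch: the "same dtype" accumulation below is emulated with np.add.at and is not the code from PR #7464; the array values are made up to trigger the overflow that the message warns about.

```python
import numpy as np

x = np.array([0, 0, 1, 1, 1])
w = np.array([100, 100, 1, 2, 3], dtype=np.int8)

# Existing behavior as described in the thread: weights are cast to double
# and the result is a double array.
current = np.bincount(x, weights=w)
print(current, current.dtype)

# Illustration only (assumption, not the PR's implementation): accumulate in
# the dtype of the weights. 100 + 100 does not fit in int8 and wraps around.
same_dtype = np.zeros(x.max() + 1, dtype=w.dtype)
np.add.at(same_dtype, x, w)
print(same_dtype, same_dtype.dtype)
```

The first bin holds 200.0 in the double result but wraps to a negative value when accumulated as int8, which is the concern behind the integer promotion question.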
On Sat, Mar 26, 2016 at 9:54 PM, Joseph Fox-Rabinovitz wrote:
Would it make sense to just make the output type large enough to hold the cumulative sum of the weights?
- Joseph Fox-Rabinovitz
------ Original message------
From: Jaime Fernández del Río
Date: Sat, Mar 26, 2016 16:16
To: Discussion of Numerical Python;
Subject:[Numpy-discussion] Make np.bincount output same dtype as weights
Hi all,
I have just submitted a PR (#7464) that fixes an enhancement request (#6854), making np.bincount return an array of the same type as the weights parameter. This is an important deviation from current behavior, which always casts weights to double, and always returns a double array, so I would like to hear what others think about the worthiness of this. Main discussion points:
np.bincount now works with complex weights (yay!), I guess this should be a pretty uncontroversial enhancement.
The return is of the same type as weights, which means that small integers are very likely to overflow. This is exactly what #6854 requested, but perhaps we should promote the output for integers to a long, as we do in np.sum?
I always thought of bincount with weights just as a group-by sum. So it would be easier to remember and have fewer surprises if it matches the behavior of np.sum.
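The group-by-sum reading can be made concrete with a small comparison. The sketch below uses made-up labels and values; it computes the per-group totals once with a weighted np.bincount and once with an explicit np.sum over each group, so the two dtype behaviors can be seen side by side.

```python
import numpy as np

groups = np.array([0, 0, 1, 2, 2, 2])
values = np.array([10, 20, 30, 40, 50, 60], dtype=np.int16)

# Weighted bincount: one total per group label.
by_bincount = np.bincount(groups, weights=values)

# The same totals computed explicitly with np.sum over each group.
n_groups = groups.max() + 1
by_sum = np.array([values[groups == g].sum() for g in range(n_groups)])

print(by_bincount, by_bincount.dtype)
print(by_sum, by_sum.dtype)
```

The totals agree, but np.sum accumulates the int16 values in a wider integer type while the weighted bincount returns doubles.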
Boolean arrays stay boolean, and OR, rather than sum, the weights. Is this what one would want? If we decide that integer promotion is the way to go, perhaps booleans should go in the same pack?
Isn't this calculating the sum, i.e. count of True by group, already? Based on a quick example with numpy 1.9.2, I don't think I ever used bool weights before.
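A quick check along these lines might look as follows. It is a sketch with made-up data: the weighted bincount counts the True values per group, and the per-group OR is emulated with np.logical_or.at purely for comparison (an assumption about what "OR the weights" would produce, not the PR's code).

```python
import numpy as np

x = np.array([0, 0, 1, 1, 1, 2])
flags = np.array([True, True, False, True, True, False])

# Existing behavior: bool weights are cast to double, so this counts
# the True values in each group.
counts = np.bincount(x, weights=flags)
print(counts)

# Per-group logical OR, emulated for comparison.
any_true = np.zeros(x.max() + 1, dtype=bool)
np.logical_or.at(any_true, x, flags)
print(any_true)

# np.sum on a bool array also counts, returning an integer.
total = np.sum(flags)
print(total, total.dtype)
```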
This new implementation currently supports all of the reasonable native types, but has no fallback for user defined types. I guess we should attempt to cast the array to double as before if no native loop can be found? It would be good to have a way of testing this though, any thoughts on how to go about this?
Does a behavior change like this require some deprecation period? What would that look like?
I have also added broadcasting of weights to the full size of list, so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without having to tile the single weight to the size of the bins list.
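The broadcasting and complex-weight behavior proposed above can be approximated in pure Python. The helper name bincount_sketch is hypothetical and is not NumPy API or the PR's implementation; it is a sketch that broadcasts the weights to the shape of the bin indices and accumulates them in the dtype of the weights.

```python
import numpy as np

def bincount_sketch(x, weights, minlength=0):
    """Hypothetical helper (not NumPy API): group-by sum in the weights' dtype."""
    x = np.asarray(x, dtype=np.intp)
    # Broadcast a scalar weight to the shape of x instead of tiling it by hand.
    w = np.broadcast_to(np.asarray(weights), x.shape)
    n_from_data = int(x.max()) + 1 if x.size else 0
    out = np.zeros(max(n_from_data, minlength), dtype=w.dtype)
    # np.add.at is unbuffered, so repeated indices accumulate correctly.
    np.add.at(out, x, w)
    return out

result = bincount_sketch([1, 2, 3], weights=2j)
print(result, result.dtype)
```

Note that this sketch does no integer promotion, so it shares the overflow risk raised in the discussion points.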
Any other thoughts are very welcome as well!
(2-D weights ?)
Josef
Jaime
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Thanks for clarifying, Jaime, and fwiw I agree with Josef: I would expect
np.bincount to behave like np.sum with regards to promoting weights dtypes.
Including bool.
On Sun, Mar 27, 2016 at 1:58 PM, Josef wrote:
I always thought of bincount with weights just as a group-by sum. So it would be easier to remember and have fewer surprises if it matches the behavior of np.sum.
Isn't this calculating the sum, i.e. count of True by group, already? Based on a quick example with numpy 1.9.2, I don't think I ever used bool weights before.
Another +1 for Josef's interpretation from me. Consistency with np.sum
seems like the best option.
On Sat, Mar 26, 2016 at 11:12 PM, Juan Nunez-Iglesias wrote:
Thanks for clarifying, Jaime, and fwiw I agree with Josef: I would expect np.bincount to behave like np.sum with regards to promoting weights dtypes. Including bool.
Have modified the PR to do the "promote integers to at least long" we do in
np.sum.
Jaime
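The promotion rule that np.sum applies can be inspected without reading the PR. The helper below is hypothetical (its name is made up for this example): it asks np.sum which dtype it would return for an empty array of a given dtype. The exact integer widths printed depend on the platform and NumPy version, so no output is shown.

```python
import numpy as np

def sum_result_dtype(dtype):
    """Hypothetical helper: the dtype np.sum returns for an array of `dtype`."""
    return np.sum(np.zeros(0, dtype=dtype)).dtype

for name in ["bool", "int8", "uint8", "int64", "float32", "complex64"]:
    print(name, "->", sum_result_dtype(name))
```

Small signed integers and booleans come back as a wider signed integer, small unsigned integers as a wider unsigned integer, and floating-point and complex types are left unchanged.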
On Mon, Mar 28, 2016 at 9:55 PM, CJ Carey wrote:
Another +1 for Josef's interpretation from me. Consistency with np.sum seems like the best option.
participants (5)
- CJ Carey
- Jaime Fernández del Río
- josef.pktd@gmail.com
- Joseph Fox-Rabinovitz
- Juan Nunez-Iglesias