Re: [Numpy-discussion] Proposal - extend histograms api to allow uneven bins

Just a few thoughts re: the changes proposed in https://github.com/numpy/numpy/pull/14278

1. Though the PR is limited to the 'auto' kwarg, the issue of potential memory problems for the automated binning methods is a more general one (e.g. #15332 <https://github.com/numpy/numpy/issues/15332>).

2. The main concern that jumps out to me is downstream users who are relying on the implicit assumption of regular binning. This is of course bad practice and makes even less sense when using one of the bin estimators, so I'm not sure how big of a concern it is. However, there is likely downstream user code that relies on the regular binning assumption, especially since, as far as I know, NumPy has never implemented binning techniques that return irregular bins.

3. The astropy project has at least one estimator that returns irregular bins <https://docs.astropy.org/en/stable/visualization/histogram.html#>. I checked for issues <https://github.com/astropy/astropy/issues?utf8=%E2%9C%93&q=is%3Aissue+histogram> related to irregular binning: though they have many of the same problems with the automatic bin estimators (i.e. memory problems for inputs with outliers), I didn't see anything specifically related to irregular binning.

I just wanted to add my two cents. The binning-data-with-outliers problem is very common in high-resolution spectroscopy, and I have seen practitioners rely on the assumption of regular binning (e.g. dividing the `range` by the number of bins) to specify bin centers, even though this is not the right way to do things.

Thanks for taking the time to write up your work!

On Mon, Feb 10, 2020 at 10:53 PM <numpy-discussion-request@python.org> wrote:
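To illustrate the bin-center point from the reply above, here is a minimal sketch of how the regular-binning assumption breaks once the edges are uneven. The data, the hand-picked edges, and all variable names are made up for illustration; they are not taken from the PR.

```python
import numpy as np

# Synthetic data; the seed and sample size are arbitrary.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# Hand-picked irregular edges, standing in for whatever an estimator
# with uneven bins might return (an assumption for this example).
edges = np.array([-4.0, -1.0, -0.5, 0.0, 0.5, 1.0, 4.0])
counts, returned_edges = np.histogram(data, bins=edges)

# Fragile: divide the range by the number of bins, which silently
# assumes every bin has the same width.
n_bins = len(counts)
lo, hi = returned_edges[0], returned_edges[-1]
width = (hi - lo) / n_bins
assumed_centers = lo + width * (np.arange(n_bins) + 0.5)

# Robust: take the midpoints of the edges that np.histogram returned.
centers = 0.5 * (returned_edges[:-1] + returned_edges[1:])

# With regular bins the two approaches agree, which is why the
# assumption goes unnoticed in existing code.
_, regular_edges = np.histogram(data, bins=n_bins, range=(lo, hi))
regular_centers = 0.5 * (regular_edges[:-1] + regular_edges[1:])
```

Code that always derives centers (or widths, via `np.diff`) from the returned edges keeps working whether or not the bins are regular.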
participants (1)
-
Ross Barnowski