Greetings,

I have a PR that, according to @seberg, warrants discussion. See https://github.com/numpy/numpy/pull/14278.

It is an enhancement that fixes a bug. The original bug is that when the 'fd' (Freedman-Diaconis) estimator is applied to a dataset with a small interquartile range and large outliers, the estimated bin width is tiny relative to the data range, so the current codebase tries to create more bins than memory allows. There are several related bug reports (see #11879, #10297, #8203).
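To make the failure mode concrete, here is a minimal sketch of the Freedman-Diaconis arithmetic. It is not the NumPy source: `fd_bin_count` is a hypothetical helper written for this message, and the dataset is invented. It only computes how many bins would be requested, without allocating them.

```python
import numpy as np

def fd_bin_count(data):
    """Number of bins implied by the Freedman-Diaconis rule (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    q75, q25 = np.percentile(data, [75, 25])
    # FD bin width: 2 * IQR / n^(1/3)
    width = 2.0 * (q75 - q25) / data.size ** (1.0 / 3.0)
    if width == 0:
        # Degenerate IQR: no usable width, so fall back to a single bin here.
        return 1
    return int(np.ceil((data.max() - data.min()) / width))

# Tightly clustered values plus one large outlier (made-up data).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1e-6, 1000), [1e6]])

n_bins = fd_bin_count(data)
print(n_bins)  # on the order of 1e12 bins for a 1001-point dataset
```

The IQR is set by the tight cluster while the range is set by the outlier, so the ratio of the two blows up.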

In terms of scope, I restricted my changes to the conditions under which np.histogram(bins='auto') ends up using the 'fd' estimator. The fix itself extends the API: following a suggestion from @eric-wieser, it merges empty histogram bins, which makes the resulting bins uneven in width. In practice this solves the problem of the excessive bin count.
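As a rough illustration of the idea (not the code in the PR), the sketch below collapses each run of consecutive empty bins into one wide bin. `merge_empty_bins` is a hypothetical name, and for simplicity the sketch starts from an already-computed uniform histogram, which a real fix could not afford to do when the uniform bin count is too large for memory.

```python
import numpy as np

def merge_empty_bins(counts, edges):
    """Drop interior edges that separate two empty bins (hypothetical helper)."""
    counts = np.asarray(counts)
    edges = np.asarray(edges)
    nonempty = counts > 0
    keep = np.ones(edges.size, dtype=bool)
    # Interior edge k sits between bin k-1 and bin k; keep it only if
    # at least one of those two bins contains data.
    keep[1:-1] = nonempty[:-1] | nonempty[1:]
    return edges[keep]

data = np.array([0.0, 0.1, 0.2, 0.3, 10.0])
counts, edges = np.histogram(data, bins=100)

merged_edges = merge_empty_bins(counts, edges)
merged_counts, _ = np.histogram(data, bins=merged_edges)

print(edges.size, merged_edges.size)
print(merged_counts)
```

The total count is unchanged and the outer edges are preserved, but the bins are no longer of equal width.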

However, @seberg is concerned that extending the API in this way may not be the right approach. For example, if you use "auto" once and then reuse the resulting bins, the uneven bin widths may not be what you want.
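Here is one way the reuse concern could show up, as I understand it. The edges and data below are invented for illustration: when uneven edges are reused on evenly spread data, the raw counts mostly reflect bin width, and only density=True compensates for it.

```python
import numpy as np

# Uneven edges, as might come out of a histogram whose empty bins were merged
# (made-up values).
uneven_edges = np.array([0.0, 0.1, 0.2, 0.3, 10.0])

# A second, evenly spread dataset binned with the same edges.
second = np.linspace(0.0, 10.0, 1001)

counts, _ = np.histogram(second, bins=uneven_edges)
density, _ = np.histogram(second, bins=uneven_edges, density=True)

print(counts)   # the wide last bin dominates the raw counts
print(density)  # roughly flat once counts are divided by bin width
```

Anyone plotting or comparing the raw counts would need to know that the bins are uneven.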

Furthermore, @eric-wieser is concerned that there may be a floating-point devil in the details. He advocates using the Hypothesis (property-based testing) package to increase our confidence that the current implementation adequately handles corner cases.

I would like to do my part in improving the code base. I don't have strong opinions, but I have to admit that I would like to eventually land a PR that resolves these bugs. After all, this PR has been half a year in the making.

Thoughts?

-areeves87