[Numpy-discussion] Proposal - extend histograms api to allow uneven bins

Ross Barnowski rossbar15 at gmail.com
Tue Feb 11 16:12:13 EST 2020


Just a few thoughts re: the changes proposed in
https://github.com/numpy/numpy/pull/14278

1. Though the PR is limited to the 'auto' option of the `bins` kwarg, the
issue of potential memory problems for the automated binning methods is a
more general one (e.g. #15332 <https://github.com/numpy/numpy/issues/15332>).
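
To make the failure mode concrete, here is a minimal sketch (illustrative
data, not taken from any of the linked reports) of how the
Freedman-Diaconis rule implies an absurd bin count when the IQR is tiny
relative to the data range:

```python
import numpy as np

rng = np.random.default_rng(0)
# A narrow core distribution plus one extreme outlier.
data = np.concatenate([rng.normal(0.0, 1e-3, 1000), [1e6]])

# Freedman-Diaconis bin width: 2 * IQR / n**(1/3).
q75, q25 = np.percentile(data, [75, 25])
width = 2 * (q75 - q25) / len(data) ** (1 / 3)

# The equal-width bins must span the full data range, so a single
# outlier drives the implied bin count into the billions.
nbins = int(np.ceil((data.max() - data.min()) / width))
print(nbins)
```

Calling np.histogram(data, bins='fd') on such input tries to materialize
edges for all of those bins, which is where the memory blow-up comes from.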

2. The main concern that jumps out at me is downstream users who rely on
the implicit assumption of regular binning. This is of course bad
practice, and it makes even less sense when using one of the bin
estimators, so I'm not sure how big a concern it is. Still, there is
likely downstream user code that relies on that assumption, especially
since, as far as I know, NumPy has never shipped a binning technique that
returns irregular bins.
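
For what it's worth, *consuming* uneven bins has always been supported; it
is only an estimator that *produces* them that would be new. A quick
sketch (made-up edges, purely illustrative) of the kind of output such
downstream code would need to handle:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# Explicitly uneven edges have always been a legal `bins` argument.
edges = np.array([-4.0, -1.0, -0.5, 0.0, 0.5, 1.0, 4.0])
counts, out_edges = np.histogram(data, bins=edges)

# The bin widths are not constant, so nothing downstream may assume
# width == range / nbins.
widths = np.diff(out_edges)
```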

3. The astropy project has at least one estimator that returns irregular
bins <https://docs.astropy.org/en/stable/visualization/histogram.html#>.  I
checked for issues
<https://github.com/astropy/astropy/issues?utf8=%E2%9C%93&q=is%3Aissue+histogram>
related to irregular binning: though they hit many of the same problems
with the automatic bin estimators (i.e. memory problems for inputs with
outliers), I didn't see anything specifically related to irregular binning.

I just wanted to add my two cents. The binning-data-with-outliers problem
is very common in high-resolution spectroscopy, and I have seen
practitioners rely on the assumption of regular binning (e.g. dividing the
`range` by the number of bins) to compute bin centers, even though this is
not the right way to do things.
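
The robust version of that calculation is trivial, which is what makes the
shortcut so unnecessary. A sketch (hypothetical data) contrasting the two:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(size=1000)
counts, edges = np.histogram(data, bins=10)

# Correct for any binning, regular or irregular: use the returned edges.
centers = (edges[:-1] + edges[1:]) / 2

# Fragile shortcut that assumes regular bins; the two agree here only
# because the bins happen to be uniform, and the shortcut silently
# breaks the moment an estimator hands back uneven edges.
step = (data.max() - data.min()) / 10
shortcut = data.min() + step * (np.arange(10) + 0.5)
```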

Thanks for taking the time to write up your work!

On Mon, Feb 10, 2020 at 10:53 PM <numpy-discussion-request at python.org>
wrote:

>
> Today's Topics:
>
>    1. Proposal - extend histograms api to allow uneven bins
>       (Alexander Reeves)
>    2. Re: NEP 38 - Universal SIMD intrinsics (Ralf Gommers)
>    3. Re: NEP 38 - Universal SIMD intrinsics (Matti Picus)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 10 Feb 2020 18:07:40 -0800
> From: Alexander Reeves <lxndr.rvs at gmail.com>
> To: numpy-discussion at python.org
> Subject: [Numpy-discussion] Proposal - extend histograms api to allow
>         uneven bins
>
> Greetings,
>
> I have a PR that warrants discussion according to @seberg. See
> https://github.com/numpy/numpy/pull/14278.
>
> It is an enhancement that fixes a bug. The original bug is that when using
> the fd estimator on a dataset with small inter-quartile range and large
> outliers, the current codebase produces more bins than memory allows. There
> are several related bug reports (see #11879, #10297, #8203).
>
> In terms of scope, I restricted my changes to conditions where
> np.histogram(bins='auto') defaults to 'fd'.  For the actual fix, I
> enhanced the API, following a suggestion from @eric-wieser to merge
> empty histogram bins. In practice this solves the outsized-bins issue.
>
> However @seberg is concerned that extending the API in this way may not be
> the way to go. For example, if you use "auto" once, and then re-use the
> bins, the uneven bins may not be what you want.
>
> Furthermore @eric-wieser is concerned that there may be a floating-point
> devil in the details. He advocates using the `hypothesis` property-based
> testing package to increase our confidence that the current implementation
> adequately handles corner cases.
>
> I would like to do my part in improving the code base. I don't have strong
> opinions but I have to admit that I would like to eventually make a PR that
> resolves these bugs. This has been a PR half a year in the making after
> all.
>
> Thoughts?
>
> -areeves87
>
> ------------------------------
>
> Message: 2
> Date: Mon, 10 Feb 2020 23:16:44 -0600
> From: Ralf Gommers <ralf.gommers at gmail.com>
> To: Discussion of Numerical Python <numpy-discussion at python.org>
> Subject: Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics
>
> On Tue, Feb 4, 2020 at 2:00 PM Hameer Abbasi <einstein.edison at gmail.com>
> wrote:
>
> > …snip…
> >
> > > 1) Once NumPy adds the framework and initial set of Universal
> Intrinsic,
> > if contributors want to leverage a new architecture specific SIMD
> > instruction, will they be expected to add software implementation of this
> > instruction for all other architectures too?
> >
> > In my opinion, if the instructions are lower-level, then yes. For example,
> > one cannot add AVX-512 without also adding, for example, AVX-256 and
> > AVX-128 and SSE*.  However, I would not expect one person or team to be an
> > expert in all assemblies, so intrinsics for one architecture can be
> > developed independently of another.
> >
>
> I think this doesn't quite answer the question. If I understand correctly,
> it's about a single instruction (e.g. one needs "VEXP2PD" and it's missing
> from the supported AVX512 instructions in master). I think the answer is
> yes, it needs to be added for other architectures as well. Otherwise, if
> universal intrinsics are added ad-hoc and there's no guarantee that a
> universal instruction is available for all main supported platforms, then
> over time there won't be much that's "universal" about the framework.
>
> This is a different question though from adding a new ufunc implementation.
> I would expect accelerating ufuncs via intrinsics that are already
> supported to be much more common than having to add new intrinsics. Does
> that sound right?
>
>
> > > 2) On whom does the burden lie to ensure that new implementations are
> benchmarked and show benefits on every architecture? What happens if
> > optimizing an Ufunc leads to improving performance on one architecture
> and
> > worsens performance on another?
> >
>
> This is slightly hard to provide a recipe for. I suspect it may take a
> while before this becomes an issue, since we don't have much SIMD code to
> begin with. So adding new code with benchmarks will likely show
> improvements on all architectures (we should ensure benchmarks can be run
> via CI, otherwise it's too onerous). And if not and it's not easily
> fixable, the problematic platform could be skipped so performance there is
> unchanged.
>
> Only once there's existing universal intrinsics and then they're tweaked
> will we have to be much more careful I'd think.
>
> Cheers,
> Ralf
>
>
>
> >
> > I would look at this from a maintainability point of view. If we are
> > increasing the code size by 20% for a certain ufunc, there must be a
> > demonstrable 20% increase in performance on any CPU. That is to say,
> > micro-optimisation will be unwelcome, and code readability will be
> > preferable. Usually we ask the submitter of the PR to test the PR with a
> > machine they have on hand, and I would be inclined to keep this trend of
> > self-reporting. Of course, if someone else came along and reported a
> > performance regression of, say, 10%, then we have increased code by 20%,
> > with only a net 5% gain in performance, and the PR will have to be
> reverted.
> >
> > …snip…
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
> ------------------------------
>
> Message: 3
> Date: Tue, 11 Feb 2020 08:53:18 +0200
> From: Matti Picus <matti.picus at gmail.com>
> To: numpy-discussion at python.org
> Subject: Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics
>
>
> On 11/2/20 7:16 am, Ralf Gommers wrote:
> >
> >
> > On Tue, Feb 4, 2020 at 2:00 PM Hameer Abbasi
> > <einstein.edison at gmail.com <mailto:einstein.edison at gmail.com>> wrote:
> >
> >     …snip…
> >
> >     > 1) Once NumPy adds the framework and initial set of Universal
> Intrinsic, if
> >     contributors want to leverage a new architecture specific SIMD
> >     instruction, will they be expected to add software implementation
> >     of this instruction for all other architectures too?
> >
> >     In my opinion, if the instructions are lower-level, then yes. For
> >     example, one cannot add AVX-512 without also adding, for example,
> >     AVX-256 and AVX-128 and SSE*.  However, I would not expect one
> >     person or team to be an expert in all assemblies, so intrinsics
> >     for one architecture can be developed independently of another.
> >
> >
> > I think this doesn't quite answer the question. If I understand
> > correctly, it's about a single instruction (e.g. one needs
> > "VEXP2PD" and it's missing from the supported AVX512 instructions in
> > master). I think the answer is yes, it needs to be added for other
> > architectures as well. Otherwise, if universal intrinsics are added
> > ad-hoc and there's no guarantee that a universal instruction is
> > available for all main supported platforms, then over time there won't
> > be much that's "universal" about the framework.
> >
> > This is a different question though from adding a new ufunc
> > implementation. I would expect accelerating ufuncs via intrinsics that
> > are already supported to be much more common than having to add new
> > intrinsics. Does that sound right?
> >
> Yes. Universal intrinsics are cross-platform. However the NEP is open
> to the possibility that certain architectures may have SIMD intrinsics
> that cannot be expressed in terms of intrinsics for other platforms, and
> so there may be a use case for architecture-specific loops. This is
> explicitly stated in the latest PR to the NEP: "If the regression is
> not minimal, we may choose to keep the X86-specific code for that
> platform and use the universal intrinsic code for other platforms."
> >
> >
> >     > 2) On whom does the burden lie to ensure that new
>     implementations are benchmarked and show benefits on every
> >     architecture? What happens if optimizing an Ufunc leads to
> >     improving performance on one architecture and worsens performance
> >     on another?
> >
> >
> > This is slightly hard to provide a recipe for. I suspect it may take a
> > while before this becomes an issue, since we don't have much SIMD code
> > to begin with. So adding new code with benchmarks will likely show
> > improvements on all architectures (we should ensure benchmarks can be
> > run via CI, otherwise it's too onerous). And if not and it's not
> > easily fixable, the problematic platform could be skipped so
> > performance there is unchanged.
>
>
> On HEAD, out of the 89 ufuncs in
> numpy.core.code_generators.generate_umath.defdict, 34 have X86-specific
> simd loops:
>
>
>  >>> [x for x in defdict.keys() if any([td.simd for td in
> defdict[x].type_descriptions])]
> ['add', 'subtract', 'multiply', 'conjugate', 'square', 'reciprocal',
> 'absolute', 'negative', 'greater', 'greater_equal', 'less',
> 'less_equal', 'equal', 'not_equal', 'logical_and', 'logical_not',
> 'logical_or', 'maximum', 'minimum', 'bitwise_and', 'bitwise_or',
> 'bitwise_xor', 'invert', 'left_shift', 'right_shift', 'cos', 'sin',
> 'exp', 'log', 'sqrt', 'ceil', 'trunc', 'floor', 'rint']
>
>
> They would be the first targets for universal intrinsics. Of them I
> estimate that the ones with more than one loop for at least one dtype
> signature would be the most difficult, since these have different
> optimizations for avx2, fma, and/or avx512f:
>
>
> ['square', 'reciprocal', 'absolute', 'cos', 'sin', 'exp', 'log', 'sqrt',
> 'ceil', 'trunc', 'floor', 'rint']
>
>
> The other 55 ufuncs, for completeness, are
>
>
> ['floor_divide', 'true_divide', 'fmod', '_ones_like', 'power',
> 'float_power', '_arg', 'positive', 'sign', 'logical_xor', 'clip',
> 'fmax', 'fmin', 'logaddexp', 'logaddexp2', 'heaviside', 'degrees',
> 'rad2deg', 'radians', 'deg2rad', 'arccos', 'arccosh', 'arcsin',
> 'arcsinh', 'arctan', 'arctanh', 'tan', 'cosh', 'sinh', 'tanh', 'exp2',
> 'expm1', 'log2', 'log10', 'log1p', 'cbrt', 'fabs', 'arctan2',
> 'remainder', 'divmod', 'hypot', 'isnan', 'isnat', 'isinf', 'isfinite',
> 'signbit', 'copysign', 'nextafter', 'spacing', 'modf', 'ldexp', 'frexp',
> 'gcd', 'lcm', 'matmul']
>
>
> As for testing accuracy: we recently added a framework for testing ulp
> variation of ufuncs against "golden results" in
> numpy/core/tests/test_umath_accuracy. So far float32 is tested for exp,
> log, cos, sin. Others may be tested elsewhere by specific tests, for
> instance numpy/core/test/test_half.py has test_half_ufuncs.
>
>
> It is difficult to do benchmarking on CI: the machines that run CI vary
> too much. We would need to set aside a machine for this and carefully
> set it up to keep CPU speed and temperature constant. We do have
> benchmarks for ufuncs (they could always be improved). I think Pauli
> runs the benchmarks carefully on X86, and may even make the results
> public, but that resource is not really on PR reviewers' radar. We could
> run benchmarks on the gcc build farm machines for other architectures.
> Those machines are shared but not heavily utilized.
>
>
> > Only once there's existing universal intrinsics and then they're
> > tweaked will we have to be much more careful I'd think.
> >
> >
> >
> >     I would look at this from a maintainability point of view. If we
> >     are increasing the code size by 20% for a certain ufunc, there
>     must be a demonstrable 20% increase in performance on any CPU.
> >     That is to say, micro-optimisation will be unwelcome, and code
> >     readability will be preferable. Usually we ask the submitter of
> >     the PR to test the PR with a machine they have on hand, and I
> >     would be inclined to keep this trend of self-reporting. Of course,
> >     if someone else came along and reported a performance regression
> >     of, say, 10%, then we have increased code by 20%, with only a net
> >     5% gain in performance, and the PR will have to be reverted.
> >
> >     …snip…
> >
>
> I think we should be careful not to increase the reviewer burden, and
> try to automate as much as possible. It would be nice if we could at
> some point set up a set of bots that can be triggered to run benchmarks
> for us and report in the PR the results.
>
>
> Matti
>
>
>
> ------------------------------
>
> End of NumPy-Discussion Digest, Vol 161, Issue 10
> *************************************************
>

