[Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram

Eric Wieser wieser.eric+numpy at gmail.com
Fri Mar 16 01:09:52 EDT 2018


That sounds like a reasonable extension - but I think there still exist
cases where you want to treat the data as one uniform set when computing
bins (toggling between orthogonal subsets of data), so it isn't really a
useful replacement.

I suppose this becomes relevant when `density` is passed to the individual
histogram invocations. Does matplotlib handle that correctly for stacked
histograms?
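
For illustration, a minimal sketch of the distinction, assuming the new
function is exposed as `np.histogram_bin_edges` and that shared edges are
computed on the pooled data: with `density=True`, `np.histogram` normalizes
each data set by its own size, so each per-set histogram integrates to 1 on
its own; for a stacked plot one might instead want the stack as a whole to
integrate to 1, which means dividing by the pooled count.

    import numpy as np

    a = np.random.normal(0.0, 1.0, 1000)
    b = np.random.normal(3.0, 1.0, 250)

    # Shared edges computed on the pooled data (function name assumed to
    # be np.histogram_bin_edges, as proposed for the PR).
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins='auto')

    # density=True normalizes each data set by its *own* size, so each
    # per-set histogram integrates to 1 on its own ...
    dens_a, _ = np.histogram(a, bins=edges, density=True)
    dens_b, _ = np.histogram(b, bins=edges, density=True)

    # ... whereas a stack that integrates to 1 overall would divide the
    # summed counts by the pooled sample size and the bin widths.
    widths = np.diff(edges)
    counts_a, _ = np.histogram(a, bins=edges)
    counts_b, _ = np.histogram(b, bins=edges)
    stack_density = (counts_a + counts_b) / ((len(a) + len(b)) * widths)
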

On Thu, Mar 15, 2018, 20:14 Nathaniel Smith <njs at pobox.com> wrote:

> Instead of an nobs argument, maybe we should have a version that accepts
> multiple data sets, so that we have the full information and can improve
> the algorithm over time.
>
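
For illustration, a rough sketch of what such an interface could look like;
the name `histogram_bin_edges_multi` and its behaviour are assumptions made
up here, not anything from the PR. Today it could only fall back to pooling
the data, but a real multi-data-set version would also see the per-set
sizes and could refine the estimators over time, which is the point of
passing the sets separately.

    import numpy as np

    def histogram_bin_edges_multi(datasets, bins='auto', range=None):
        # Hypothetical multi-data-set variant (name and behaviour assumed).
        # For now it simply pools the data; a real implementation could
        # also exploit the per-set sizes, e.g. along the lines of the
        # nobs idea discussed below.
        pooled = np.concatenate([np.asarray(d).ravel() for d in datasets])
        return np.histogram_bin_edges(pooled, bins=bins, range=range)

    edges = histogram_bin_edges_multi(
        [np.random.normal(size=n) for n in (100, 500)])
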
> On Mar 15, 2018 7:57 PM, "Thomas Caswell" <tcaswell at gmail.com> wrote:
>
>> Yes, I like the name.
>>
>> The primary use-case for Matplotlib is that our `hist` method can take in
>> a list of arrays and produce N histograms in one shot. Currently, with
>> 'auto', we only use the first data set to sort out what the bins should be
>> and then re-use those for the rest of the data sets. This will let us get
>> the bins from the merged input, but I take Josef's point that this is not
>> actually what we want....
>>
>> Tom
>>
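
For illustration, a sketch of the Matplotlib-side use Tom describes,
assuming the new function is available as `np.histogram_bin_edges`: compute
the edges once on the merged input and pass them through explicitly, since
`plt.hist` accepts both a list of arrays and an explicit array of bin edges.

    import numpy as np
    import matplotlib.pyplot as plt

    x1 = np.random.normal(0.0, 1.0, 1000)
    x2 = np.random.normal(4.0, 2.0, 300)

    # Per Tom's description, bins='auto' currently derives the edges
    # from the first data set only:
    edges_first = np.histogram_bin_edges(x1, bins='auto')

    # Computing the edges on the merged input instead, and reusing them
    # for all of the data sets:
    edges_merged = np.histogram_bin_edges(np.concatenate([x1, x2]),
                                          bins='auto')
    plt.hist([x1, x2], bins=edges_merged, stacked=True)
    plt.show()
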
>> On Mon, Mar 12, 2018 at 11:35 PM <josef.pktd at gmail.com> wrote:
>>
>>> On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser
>>> <wieser.eric+numpy at gmail.com> wrote:
>>> >> Given that the bin selections are data driven, transferring them
>>> across datasets might not be so useful.
>>> >
>>> > The main application would be to compute bins across the union of all
>>> > datasets. This is already possible by using `np.histogram` and
>>> > discarding the first result, but that's super wasteful.
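
For illustration, a minimal sketch of the two routes being contrasted,
assuming the new function is exposed as `np.histogram_bin_edges`:

    import numpy as np

    datasets = [np.random.normal(size=n) for n in (1000, 250, 50)]
    pooled = np.concatenate(datasets)

    # The workaround available today: compute a full histogram of the
    # pooled data and discard the counts, keeping only the edges.
    _, edges = np.histogram(pooled, bins='auto')

    # With the proposed function the counting step is skipped entirely.
    edges = np.histogram_bin_edges(pooled, bins='auto')

    # Either way, the shared edges can then be reused for every data set.
    per_set_counts = [np.histogram(d, bins=edges)[0] for d in datasets]
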
>>>
>>> assuming "union" means a combined dataset.
>>>
>>> If you stack datasets, then the number of observations will not be
>>> correct for the individual datasets.
>>>
>>> In that case an additional keyword like nobs, or whatever name would
>>> be appropriate for numpy, would be useful, e.g. to use the average
>>> number of observations across datasets.
>>> Auxiliary statistics like std could then be computed on the total
>>> dataset (if that makes sense, which would not be the case if the
>>> variance across datasets is larger than the variance within datasets).
>>>
>>> Josef
>>>
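
To make the concern concrete: the width rules behind 'auto' scale with the
number of observations, so pooling K comparable data sets of size n feeds
the estimator K*n observations instead of n. A rough sketch using the
Freedman-Diaconis rule (one of the estimators behind 'auto'); the `nobs`
argument shown here is only an illustration of the suggestion above, not an
existing keyword, and the pooled IQR is only meaningful when the between-set
spread is not much larger than the within-set spread.

    import numpy as np

    def fd_width(x, nobs=None):
        # Freedman-Diaconis bin width: 2 * IQR * n ** (-1/3).
        # `nobs` is a hypothetical override for the observation count,
        # illustrating the keyword sketched above.
        n = len(x) if nobs is None else nobs
        iqr = np.subtract(*np.percentile(x, [75, 25]))
        return 2.0 * iqr / n ** (1.0 / 3.0)

    sets = [np.random.normal(size=500) for _ in range(4)]
    pooled = np.concatenate(sets)

    w_pooled = fd_width(pooled)              # n = 2000: narrower than the
                                             # per-set rule would suggest
    w_adjusted = fd_width(pooled, nobs=500)  # pooled IQR, per-set sample size
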