[Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram

josef.pktd at gmail.com
Fri Mar 16 09:43:41 EDT 2018


Passing a list of arrays would be useful (aside from the problem of
discriminating between a list and an array_like).
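
A minimal sketch of what accepting a list of arrays could look like.
`shared_bin_edges` is a hypothetical name used only for illustration and
is not a numpy function. It pools the datasets and keeps only the edges
returned by `np.histogram`, i.e. the wasteful workaround mentioned further
down in this thread:

```python
import numpy as np

def shared_bin_edges(datasets, bins="auto"):
    """Hypothetical helper: bin edges computed on the pooled data."""
    pooled = np.concatenate([np.ravel(d) for d in datasets])
    # np.histogram returns (counts, edges); the counts are discarded here.
    _, edges = np.histogram(pooled, bins=bins)
    return edges

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=200)
b = rng.normal(5.0, 1.0, size=300)

edges = shared_bin_edges([a, b])
# Re-use the same edges for every dataset so the histograms are comparable.
counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)
```

Note that the 'auto' estimator here sees 500 pooled observations, not the
200 or 300 of the individual datasets.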

In that case I would add a keyword like "within=True" to compute the auxiliary
statistics like std or iqr on the group-demeaned data.
This would remove the effect of (mean-)shifted datasets on those
auxiliary statistics.
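
Roughly what I have in mind, as a sketch only: "within" is just my
suggested keyword and does not exist in numpy, and `pooled_within_std` is
a made-up helper name. No degrees-of-freedom correction is applied.

```python
import numpy as np

def pooled_within_std(datasets):
    """Std of the pooled data after subtracting each dataset's own mean."""
    demeaned = np.concatenate([np.ravel(d) - np.mean(d) for d in datasets])
    return demeaned.std()

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=500)
b = rng.normal(10.0, 1.0, size=500)

# Std of the stacked data is inflated by the shift between the two means.
total_std = np.concatenate([a, b]).std()
# Std of the group-demeaned data stays close to the per-dataset spread.
within_std = pooled_within_std([a, b])
```

A bin width rule based on total_std would give bins that are much too wide
for either dataset here, while within_std stays near the per-dataset scale.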

Aside: an alternative to using a list of arrays would be to include a
"groups" indicator as a keyword and, if it is not None, to compute the
auxiliary statistics based on averages across groups or on pooled
within-group statistics.
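
A sketch of the "groups" variant, assuming one flat data array plus a
label array of the same length. The function name and signature are
hypothetical and only illustrate the idea:

```python
import numpy as np

def within_std_by_groups(x, groups):
    """Pooled within-group std from flat data plus a group label per value."""
    x = np.asarray(x, dtype=float).ravel()
    # inverse maps each observation to the index of its group
    _, inverse = np.unique(np.ravel(groups), return_inverse=True)
    group_means = np.bincount(inverse, weights=x) / np.bincount(inverse)
    # Subtract each observation's own group mean, then pool.
    return (x - group_means[inverse]).std()

x = np.array([1.0, 2.0, 3.0, 11.0, 12.0, 13.0])
groups = np.array([0, 0, 0, 1, 1, 1])
result = within_std_by_groups(x, groups)
```

This avoids the list versus array_like ambiguity, at the cost of requiring
the caller to build the label array.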


Josef



On Fri, Mar 16, 2018 at 3:06 AM, Nathaniel Smith <njs at pobox.com> wrote:
> Oh sure, I'm not suggesting it be impossible to calculate for a single data
> set. If nothing else, if we had a version that accepted a list of data sets,
> then you could always pass in a single-element list :-).
>
> On Mar 15, 2018 22:10, "Eric Wieser" <wieser.eric+numpy at gmail.com> wrote:
>>
>> That sounds like a reasonable extension - but I think there still exist
>> cases where you want to treat the data as one uniform set when computing
>> bins (toggling between orthogonal subsets of data), so it isn't really a useful
>> replacement.
>>
>> I suppose this becomes relevant when `density` is passed to the individual
>> histogram invocations. Does matplotlib handle that correctly for stacked
>> histograms?
>>
>> On Thu, Mar 15, 2018, 20:14 Nathaniel Smith <njs at pobox.com> wrote:
>>>
>>> Instead of an nobs argument, maybe we should have a version that accepts
>>> multiple data sets, so that we have the full information and can improve the
>>> algorithm over time.
>>>
>>> On Mar 15, 2018 7:57 PM, "Thomas Caswell" <tcaswell at gmail.com> wrote:
>>>>
>>>> Yes I like the name.
>>>>
>>>> The primary use-case for Matplotlib is that our `hist` method can take
>>>> in a list of arrays and produce N histograms in one shot. Currently with
>>>> 'auto' we only use the first data set to sort out what the bins should be
>>>> and then re-use those for the rest of the data sets.  This will let us get
>>>> the bins on the merged input, but I take Josef's point that this is not
>>>> actually what we want....
>>>>
>>>> Tom
>>>>
>>>> On Mon, Mar 12, 2018 at 11:35 PM <josef.pktd at gmail.com> wrote:
>>>>>
>>>>> On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser
>>>>> <wieser.eric+numpy at gmail.com> wrote:
>>>>> >> Given that the bin selections are data driven, transferring them
>>>>> >> across datasets might not be so useful.
>>>>> >
>>>>> > The main application would be to compute bins across the union of all
>>>>> > datasets. This is already possible by using `np.histogram` and
>>>>> > discarding the first result, but that's super wasteful.
>>>>>
>>>>> assuming "union" means a combined dataset.
>>>>>
>>>>> If you stack datasets, then the number of observations will not be
>>>>> correct for individual datasets.
>>>>>
>>>>> In that case an additional keyword like nobs, or whatever name would
>>>>> be appropriate for numpy, would be useful, e.g. use the average number
>>>>> of observations across datasets.
>>>>> Auxiliary statistics like std could then be computed on the total
>>>>> dataset (if that makes sense, which would not be the case if the
>>>>> variance across datasets is larger than the variance within datasets).
>>>>>
>>>>> Josef
>>>>>
>>>>> > _______________________________________________
>>>>> > NumPy-Discussion mailing list
>>>>> > NumPy-Discussion at python.org
>>>>> > https://mail.python.org/mailman/listinfo/numpy-discussion

