[Numpy-discussion] is __array_ufunc__ ready for prime-time?

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Nov 2 13:01:03 EDT 2017


On Thu, Nov 2, 2017 at 12:46 PM, Ryan May <rmay31 at gmail.com> wrote:

> On Thu, Nov 2, 2017 at 6:56 AM, <josef.pktd at gmail.com> wrote:
>
>> On Thu, Nov 2, 2017 at 8:46 AM, <josef.pktd at gmail.com> wrote:
>>
>>> On Wed, Nov 1, 2017 at 6:55 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>>> wrote:
>>>
>>>> I think the biggest issues could be resolved if __array_concatenate__
>>>> were finished. Unfortunately I don't feel like I can take that on right now.
>>>>
>>>> See Ryan May's talk at scipy about using an ndarray subclass for units
>>>> and the issues he's run into:
>>>>
>>>> https://www.youtube.com/watch?v=qCo9bkT9sow
>>>>
>>>
>>>
>>> Interesting talk, but I don't see how general library code should know
>>> what units the output has.
>>> For example, if the units are some flow per unit of time and we average,
>>> sum, or integrate over time, then what are the new units? (e.g. pandas
>>> time aggregation)
>>> What are units of covariance or correlation between two variables with
>>> the same units, and what are they between variables with different units?
>>>
>>> How do you concatenate and operate arrays with different units?
>>>
>>> Interpolation or prediction would work using the existing units.
>>>
>>> partially related:
>>> statsmodels uses a wrapper for pandas Series and DataFrames and tries to
>>> preserve the index when possible and make up a new DataFrame or Series if
>>> the existing index doesn't apply.
>>> E.g. predicted values and residuals are in terms of the original
>>> provided index, and could also get original units assigned. That would also
>>> be possible with prediction confidence intervals. But for the rest, see
>>> above.
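[Editor's note: as a sketch of how a units library answers the covariance/correlation question above -- a toy illustration, not pint's actual implementation -- multiplying quantities adds their unit exponents, so a covariance between meters and seconds carries meter * second, while a correlation, being a ratio of same-unit quantities, comes out dimensionless.]

```python
# Toy value-with-units pair -- an illustration of unit propagation,
# NOT pint's real implementation.
class Q:
    def __init__(self, value, units):
        # units is a dict of exponents, e.g. {"meter": 1, "second": -1}
        self.value = value
        self.units = {u: p for u, p in units.items() if p != 0}

    def __mul__(self, other):
        # Multiplication adds unit exponents.
        units = dict(self.units)
        for u, p in other.units.items():
            units[u] = units.get(u, 0) + p
        return Q(self.value * other.value, units)

    def __truediv__(self, other):
        # Division subtracts unit exponents.
        units = dict(self.units)
        for u, p in other.units.items():
            units[u] = units.get(u, 0) - p
        return Q(self.value / other.value, units)


x = Q(3.0, {"meter": 1})
t = Q(2.0, {"second": 1})

cov_like = x * t       # units multiply: a covariance of meters and
                       # seconds carries {'meter': 1, 'second': 1}
corr_like = x / x      # same-unit ratio: a correlation is
                       # dimensionless -> {}
```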
>>>
>>
>> using pint
>>
>> >>> x
>> <Quantity([0 1 2 3 4], 'meter')>
>> >>> x / x
>> <Quantity([ nan   1.   1.   1.   1.], 'dimensionless')>
>>
>> >>> x / (1 + x)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py",
>> line 669, in __add__
>>     return self._add_sub(other, operator.add)
>>   File "C:\...\python-3.4.4.amd64\lib\site-packages\pint\quantity.py",
>> line 580, in _add_sub
>>     raise DimensionalityError(self._units, 'dimensionless')
>> pint.errors.DimensionalityError: Cannot convert from 'meter' to
>> 'dimensionless'
>>
>
> I'm not sure why you have a problem with that result. You tried to take a
> number in meters and add a dimensionless value to that--that's not a
> defined operation. That's like saying: "I have a distance of 12 meters and
> added 1 to it." 1 what? 1 meter? Great. 1 centimeter? I need to convert,
> but I can do that operation. 1 second? That makes no sense.
>
> If you add units to the 1 then it's a defined operation:
>
> >>> ureg = pint.UnitRegistry()
> >>> x / (1 * ureg.meters + x)
> <Quantity([ 0.          0.5         0.66666667  0.75        0.8       ],
> 'dimensionless')>
>
>
>> np.exp(x)
>> raises
>> pint.errors.DimensionalityError: Cannot convert from 'meter' ([length])
>> to 'dimensionless' (dimensionless)
>>
>
> Well, the Taylor series for exp (around a=0) is:
>
> exp(x) = 1 + x + x**2 / 2 + x**3 / 6 + ...
>
> so for that to properly add up, x needs to be dimensionless. It should be
> noted, though, that I've *never* seen a formula, theoretically derived or
> empirically fit, require directly taking exp(x) where x is a physical
> quantity with units. Instead, you have:
>
> f = a * exp(kx)
>
> Properly calculated values for a and k will have appropriate units attached
> to them that allow the calculation to proceed without error.
>
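[Editor's note: a minimal numeric sketch of that point; the values and units here are invented for illustration. If x is in meters, a fitted k carries units of 1/meter, so the exponent k * x is a pure number and the result f takes a's units.]

```python
import math

# Hypothetical fitted parameters for f = a * exp(k * x); the numbers
# and units are made up for illustration.
x_m = 12.0        # x in meters
k_per_m = 0.25    # k in 1/meter, so k * x is dimensionless
a_volt = 5.0      # a in volts (the output unit)

exponent = k_per_m * x_m              # (1/meter) * meter -> pure number
f_volt = a_volt * math.exp(exponent)  # result is in volts, like a
```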

I was thinking of a simple logit model to predict whether it rains tomorrow.
The logistic (inverse logit) transformation for the probability is
exp(k x) / (1 + exp(k x)), where k is a parameter to search for in the
optimization.

x is a matrix with all predictors or explanatory variables which could all
have different units.

So it sounds to me like, if we drop asarray, we either get exceptions or
possibly strange results, or we have to introduce a unit that matches
everything (like a wildcard) for any constants that we are using.
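[Editor's note: a plain-numpy sketch of that logit setup, with made-up numbers; the units live only in the comments, which is exactly the issue under discussion. Each column of X has its own unit, and each coefficient in k implicitly carries the reciprocal unit, so X @ k is dimensionless before it hits exp. A units-aware array would need some such convention, or a matches-everything unit for the constant 1.]

```python
import numpy as np

# Two observations, two predictors with different (implicit) units:
# column 0 in meters, column 1 in kelvin -- invented example values.
X = np.array([[2.0, 300.0],
              [5.0, 280.0]])

# Coefficients implicitly carry the reciprocal units (1/meter, 1/kelvin),
# so the linear predictor z = X @ k is dimensionless.
k = np.array([0.1, 0.01])

z = X @ k
p = np.exp(z) / (1 + np.exp(z))   # probability that it rains tomorrow
```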

Josef



>
> Ryan
>
> --
> Ryan May
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>