[Numpy-discussion] (Value Based Promotion) Current Behaviour

Thu Jun 13 01:30:24 EDT 2019

Hi Sebastian,

One way to avoid an ugly lookup table and special cases is to store the number of sign bits, the number of integer/mantissa bits, and the number of exponent bits for each numeric type. A safe cast can only happen if the target type equals or exceeds the source type in all three. Just a thought.
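
A minimal sketch of that three-counter idea, in plain Python. The names `BitLayout`, `LAYOUTS` and `is_safe_cast` are made up for illustration and are not NumPy APIs; the bit counts are the usual fixed-width integer and IEEE 754 binary widths, and counting the implicit leading bit as part of the mantissa is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitLayout:
    """Bit budget of one numeric type (hypothetical helper, not a NumPy API)."""
    sign: int       # number of sign bits (0 or 1)
    precision: int  # integer bits, or significand bits including the implicit one
    exponent: int   # exponent bits (0 for integers)


# Standard widths for a few fixed-size integer and IEEE 754 binary types.
LAYOUTS = {
    "uint8": BitLayout(0, 8, 0),
    "int8": BitLayout(1, 7, 0),
    "uint16": BitLayout(0, 16, 0),
    "int16": BitLayout(1, 15, 0),
    "int32": BitLayout(1, 31, 0),
    "int64": BitLayout(1, 63, 0),
    "float16": BitLayout(1, 11, 5),
    "float32": BitLayout(1, 24, 8),
    "float64": BitLayout(1, 53, 11),
}


def is_safe_cast(src: str, dst: str) -> bool:
    """Safe only if dst has at least as many bits as src in all three fields."""
    a, b = LAYOUTS[src], LAYOUTS[dst]
    return (b.sign >= a.sign
            and b.precision >= a.precision
            and b.exponent >= a.exponent)


print(is_safe_cast("int8", "float16"))   # True
print(is_safe_cast("int16", "float16"))  # False
print(is_safe_cast("int64", "float64"))  # False
```

Note that a strict rule like this reports `int64 -> float64` as unsafe, so it would not reproduce the one exception described in the quoted document below.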

Best Regards,
Hameer Abbasi

> On Wednesday, Jun 12, 2019 at 9:50 PM, Sebastian Berg <sebastian at sipsolutions.net> wrote:
> On Wed, 2019-06-12 at 12:03 -0500, Sebastian Berg wrote:
> > On Tue, 2019-06-11 at 22:08 -0400, Marten van Kerkwijk wrote:
> > > Hi Sebastian,
> > >
> > > Thanks for the overview! In the value-based casting, what perhaps
> > > surprises me most is that it is done within a kind; it would seem
> > > an
> > > improvement to check whether a given integer scalar is exactly
> > > representable in a given float (your example of 1024 in `float16`).
> > > If we switch to the python-only scalar values idea, I would suggest
> > > to abandon this. That might make dealing with things like `Decimal`
> > > or `Fraction` easier as well.
> > >
> >
> > Yeah, one can argue that since we have this "safe casting" based
> > approach, we should go all the way for the value based logic. I think
> > I
> > tend to agree, but I am not quite sure right now to be honest.
>
> Just realized, one issue with this is that you get much more "special
> cases" if you think of it in terms of "minimal dtype". Because
> suddenly, not just the unsigned/signed integers such as "< 128" are
> special, but even more values require special handling. An int16
> "minimal dtype" may or may not be castable to float16.
>
> For `can_cast` that does not matter much, but if we use the same logic
> for promotion things may get uglier. Although, maybe it just gets
> uglier implementation wise and is fairly logical on the user side...
>
> - Sebastian
>
>
> >
> > Fractions and Decimals are very interesting in that they raise the
> > question what happens to user dtypes [0]. Although, you would still
> > need a "no lower category" rule, since you do not want 1024. or 12/3 to
> > be demoted to an integer.
> >
> > For me right now, what is most interesting is what we should do with
> > ufunc calls, and if we can simplify them. I feel right now we have two
> > types of ufuncs:
> >
> > 1. Ufuncs which use a "common type", where we can find the minimal
> > type
> > before dispatching.
> >
> > 2. More complex ufuncs, for which finding the minimal type is
> > trickier
> > [1]. And while I could not find any weird enough ufunc, I am not sure
> > that blind promotion is a good idea for general ufuncs.
> >
> > Best,
> >
> > Sebastian
> >
> >
> > [0] A python fraction could be converted to int64/int64 or
> > int32/int32,
> > etc. depending on the value, in principle. If we want such things to
> > work in principle, we need machinery (although I expect one could tag
> > that on later).
> > [1] It is not impossible, but we need to insert non-existing types
> > into
> > the type hierarchy.
> >
> >
> >
> > PS: Another interesting issue is that if we try to move away from
> > value
> > based casting for numpy scalars, that initial `np.asarray(...)` call
> > may lose the information that a python integer was passed in. So to
> > support such things, we might need a whole new machinery.
> >
> >
> >
> >
> > > All the best,
> > >
> > > Marten
> > >
> > > On Tue, Jun 11, 2019 at 8:46 PM Sebastian Berg <
> > > sebastian at sipsolutions.net> wrote:
> > > > Hi all,
> > > >
> > > > strange, something went wrong sending that email, but in any
> > > > case...
> > > >
> > > > I tried to "summarize" the current behaviour of promotion and
> > > > value
> > > > based promotion in numpy (correcting a small error in what I
> > > > wrote
> > > > earlier). Since it got a bit long, you can find it here (also
> > > > copy
> > > > pasted at the end):
> > > >
> > > > https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA
> > > >
> > > > Allan's document which I link in there is also very interesting.
> > > > One
> > > > thing I had not really thought about before was the problem of
> > > > commutativity.
> > > >
> > > > I do not have any specific points I want to discuss based on it
> > > > (but
> > > > those are likely to come up again later).
> > > >
> > > > All the Best,
> > > >
> > > > Sebastian
> > > >
> > > >
> > > > -----------------------------
> > > >
> > > > PS: Below a copy of what I wrote:
> > > >
> > > > ---
> > > > title: Numpy Value Based Promotion Rules
> > > > author: Sebastian Berg
> > > > ---
> > > >
> > > >
> > > >
> > > > NumPy Value Based Scalar Casting and Promotion
> > > > ==============================================
> > > >
> > > > This document reviews some of the behaviours of the promotion
> > > > rules
> > > > within numpy. This is especially with respect to the promotion of
> > > > scalars and 0D arrays which inspect the value to decide casting
> > > > and
> > > > promotion.
> > > >
> > > > Other documents discussing these things:
> > > >
> > > > * `from numpy.testing import print_coercion_tables` prints the
> > > > current promotion tables including value based promotion for
> > > > small
> > > > positive/negative scalars.
> > > > * Allan Haldane's thoughts on changing casting/promotion to be
> > > > more
> > > > C-like and discussing things such as here:
> > > >
> > > > https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
> > > > * Discussion around the problem of uint64 and int64 being
> > > > promoted to
> > > > float64: https://github.com/numpy/numpy/issues/12525 (lists many
> > > > related issues).
> > > >
> > > >
> > > > Nomenclature and Definitions
> > > > ----------------------------
> > > >
> > > > * **dtype/type**: The data type of an array or scalar: `float32`,
> > > > `float64`, `int8`, …
> > > >
> > > > * **Category**: A category to which the data type belongs, in
> > > > this
> > > > context these are:
> > > > 1. boolean
> > > > 2. integer (unsigned and signed are not split up here, but are
> > > > different "kinds")
> > > > 3. floating point and complex (not split up here but are
> > > > different
> > > > "kinds")
> > > > 4. All others
> > > >
> > > > * **Casting**: converting from one dtype to another. The rules of
> > > > casting relevant here are:
> > > > 1. *"safe"* casting: All values are representable in the new data
> > > > type, i.e. no information is lost during the conversion.
> > > > 2. *"same kind"* casting: Data loss may occur, but only within the
> > > > same "kind". For example, a float64 can be converted to float32 and
> > > > an int64 can be converted to int16 using "same kind" rules, even
> > > > though both conversions may lose precision or even produce incorrect
> > > > values. Note that "kind" is different from "category" in that it
> > > > distinguishes between signed and unsigned integers.
> > > > 3. *"unsafe"* casting: Any conversion which can be defined, e.g.
> > > > floating point to integer. For promotion this is fairly unimportant.
> > > > (Some conversions, such as string to integer, which may not even
> > > > work, fall in this category, but could also be called coercions or
> > > > conversions.)
> > > >
> > > > * **Promotion**: The general process of finding a new dtype for
> > > > multiple input dtypes. Will be used here to also denote any kind
> > > > of
> > > > casting/promotion done before a specific function is called. This
> > > > can be more complex, because in rare cases a function can, for
> > > > example, take floating point numbers and integers as input at the
> > > > same time (e.g. `np.ldexp`).
> > > >
> > > > * **Common dtype**: A dtype which can represent all input data.
> > > > In
> > > > general this means that all inputs can be safely cast to this
> > > > dtype.
> > > > Within numpy this is the normal and simplest form of promotion.
> > > >
> > > > * **`type1, type2 -> type3`**: Defines a promotion or signature.
> > > > For
> > > > example adding two integers: `np.int32(5) + np.int32(3)` gives
> > > > `np.int32(8)`. The dtype signature for that example would be:
> > > > `int32,
> > > > int32 -> int32`. A short form for this is also `ii->i` using C-
> > > > like
> > > > type codes, this can be found for example in `np.ldexp.types`
> > > > (and
> > > > any
> > > > numpy ufunc).
> > > >
> > > > * **Scalar**: A numpy or python scalar or a **0-D array**. It is
> > > > important to remember that zero dimensional arrays are treated
> > > > just
> > > > like scalars with respect to casting and promotion.
> > > >
> > > >
> > > > Current Situation in Numpy
> > > > --------------------------
> > > >
> > > > The current situation can be understood mostly in terms of safe
> > > > casting
> > > > which is defined based on the type hierarchy and is sensitive to
> > > > values
> > > > for scalars.
> > > >
> > > > This safe casting based approach is in contrast for example to
> > > > promotion within C or Julia, which work based on category first.
> > > > For
> > > > example `int32` cannot be safely cast to `float32`, but C or
> > > > Julia
> > > > will
> > > > use `int32, float32 -> float32` as the common type/promotion rule
> > > > for
> > > > example to decide on the output dtype for addition.
> > > >
> > > >
> > > > ### Python Integers and Floats
> > > >
> > > > Note that python integers are handled exactly like numpy ones.
> > > > They
> > > > are, however, special in that they do not have a dtype associated
> > > > with
> > > > them explicitly. Value based logic, as described here, seems
> > > > useful
> > > > for
> > > > python integers and floats to allow:
> > > > ```
> > > > arr = np.arange(10, dtype=np.int8)
> > > > arr += 1
> > > > # or:
> > > > res = arr + 1
> > > > res.dtype == np.int8
> > > > ```
> > > > which ensures that no upcast (for example with higher memory
> > > > usage)
> > > > occurs.
> > > >
> > > >
> > > > ### Safe Casting
> > > >
> > > > Most safe casting is clearly defined based on whether or not any
> > > > possible value is representable in the output dtype. Within numpy
> > > > there
> > > > is currently a single exception to this rule:
> > > > `np.can_cast(np.int64,
> > > > np.float64, casting="safe")` is considered to be true although
> > > > float64
> > > > cannot represent some large integer values exactly. In contrast,
> > > > `np.can_cast(np.int32, np.float32, casting="safe")` is `False`
> > > > and
> > > > `np.float64` would have to be used if a "safe" cast is desired.
> > > >
> > > > This exception may be one thing that should be changed, however,
> > > > concurrently the promotion rules have to be adapted to keep doing
> > > > the
> > > > same thing, or a larger behaviour change decided.
> > > >
> > > >
> > > > #### Scalar based rules
> > > >
> > > > Unlike arrays, where inspection of all values is not feasible,
> > > > for
> > > > scalars (and 0-D arrays) the value is inspected. The casting
> > > > becomes a
> > > > two step process:
> > > > 1. The minimal dtype capable of holding the value is found.
> > > > 2. The normal casting rules are applied to the new dtype.
> > > >
> > > > The first step uses the following rules by finding the minimal
> > > > dtype
> > > > within its category:
> > > >
> > > > * Boolean: Dtype is already minimal
> > > >
> > > > * Integers:
> > > > Casting is possible if output can hold the value. This
> > > > includes
> > > > uint8(127) casting to an int8.
> > > >
> > > > * Floats and Complex
> > > > Scalars can be demoted based on value, roughly this avoids
> > > > overflows:
> > > > ```
> > > > float16: -65000 < value < 65000
> > > > float32: -3.4e38 < value < 3.4e38
> > > > float64: -1.7e308 < value < 1.7e308
> > > > float128 (largest type, does not apply).
> > > > ```
> > > > For complex, the logic is simply applied to both real and
> > > > imaginary
> > > > part. Complex numbers cannot be downcast to floating point.
> > > >
> > > > * Others: Dtype is not modified.
> > > >
> > > >
> > > > This two step process means that `np.can_cast(np.int16(1024),
> > > > np.float16)` is `False` even though float16 is capable of exactly
> > > > representing the value 1024, since value based "demotion" to a
> > > > lower
> > > > dtype is used only within each category.
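
To make the two steps concrete, here is a rough emulation using only dtype-level calls. It is a sketch of the description above, not NumPy's internal implementation: `value_based_can_cast` is a hypothetical helper name, and it ignores the special case where a small non-negative value may count as either signed or unsigned.

```python
import numpy as np


def value_based_can_cast(value, to_dtype):
    """Emulate the two-step check (hypothetical helper, not a NumPy API)."""
    # Step 1: find the minimal dtype (within the category) that holds the value.
    minimal = np.min_scalar_type(value)
    # Step 2: apply the normal dtype-to-dtype "safe" casting rule.
    return np.can_cast(minimal, to_dtype, casting="safe")


# 1024 needs 16 bits as an integer, and 16-bit integers are not safe for float16.
print(np.min_scalar_type(np.int16(1024)))                # uint16
print(value_based_can_cast(np.int16(1024), np.float16))  # False
# ...even though float16 can represent 1024 exactly.
print(float(np.float16(1024)) == 1024.0)                 # True
# A small value fits in 8 bits, which is safe for float16.
print(value_based_can_cast(np.int16(4), np.float16))     # True
```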
> > > >
> > > >
> > > >
> > > > ### Common Type Promotion
> > > >
> > > > For most operations in numpy the output type is just the common
> > > > type of
> > > > the inputs, this holds for example for concatenation, as well as
> > > > almost
> > > > all math functions (e.g. addition and multiplication have two
> > > > identical
> > > > inputs and need one output dtype). This operation is exposed as
> > > > `np.result_type` which includes value based logic, and
> > > > `np.promote_types` which only accepts dtypes as input.
> > > >
> > > > Normal type promotion without value based/scalar logic finds the
> > > > smallest type which both inputs can cast to safely. This will be
> > > > the
> > > > largest "kind" (bool < unsigned < integer < float < complex <
> > > > other).
> > > >
> > > > Note that type promotion is handled in a "reduce" manner from
> > > > left
> > > > to
> > > > right. In rare cases this means it is not associative:
> > > > `float32,
> > > > uint16, int16 -> float32`, but `float32, (uint16, int16) ->
> > > > float64`.
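
The non-associative case can be reproduced with `np.promote_types`, which works on dtypes only, so no value-based logic is involved. The left-to-right fold via `functools.reduce` is just an illustration of the "reduce" manner described above.

```python
from functools import reduce

import numpy as np

dtypes = [np.float32, np.uint16, np.int16]

# Left to right: (float32, uint16) -> float32, then (float32, int16) -> float32.
left_to_right = reduce(np.promote_types, dtypes)

# Grouping the integers first: (uint16, int16) -> int32, and int32 cannot be
# cast safely to float32, so the final result is float64.
integers_first = np.promote_types(np.uint16, np.int16)
grouped = np.promote_types(np.float32, integers_first)

print(left_to_right)   # float32
print(integers_first)  # int32
print(grouped)         # float64
```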
> > > >
> > > > #### Scalar based rule
> > > >
> > > > When there is a mix of scalars and arrays, numpy will usually
> > > > allow
> > > > the
> > > > scalars to be handled in the same fashion as for "safe" casting
> > > > rules.
> > > >
> > > > The rules are as follows:
> > > >
> > > > 1. Value based logic is only applied if the "category" of any
> > > > array
> > > > is
> > > > larger or equal to the category of all scalars. If this is not
> > > > the
> > > > case, the typical rules are used.
> > > > * Specifically, this means: `np.array([1, 2, 3],
> > > > dtype=np.uint8) +
> > > > np.float64(12.)` gives a `float64` result, because the
> > > > `np.float64(12.)` is not considered for being demoted.
> > > >
> > > > 2. Promotion is applied as normal, however, instead of the
> > > > original
> > > > dtype, the minimal dtype is used. In the case where the minimal
> > > > data
> > > > type is unsigned (say uint8) but the value is small enough, the
> > > > minimal
> > > > type may in fact be either `uint8` or `int8` (127 can be both).
> > > > This
> > > > promotion is also applied in pairs (reduction-like) from left to
> > > > right.
> > > >
> > > >
> > > > ### General Promotion during Function Execution
> > > >
> > > > General functions (read "ufuncs" such as `np.add`) may have a
> > > > specific
> > > > dtype signature which is (for most dtypes) stored e.g. as
> > > > `np.add.types`. For many of these functions the common type
> > > > promotion
> > > > is used unchanged.
> > > >
> > > > However, some functions will employ a slightly different method
> > > > (which
> > > > should be equivalent in most cases). They will loop through all
> > > > loops
> > > > listed in `np.add.types` in order and find the first one to which
> > > > all
> > > > inputs can be safely cast:
> > > > ```
> > > > np.divide.types = ['ee->e', 'ff->f', 'dd->d', ...]
> > > > ```
> > > > Thus, `np.divide(np.int16(4), np.float16(3))` will refuse the
> > > > first
> > > > `float16, float16 -> float16` (`'ee->e'`) loop because `int16`
> > > > cannot
> > > > be cast safely, and then pick the float32 (`'ff->f'`) one.
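
A small sketch of that loop search, under stated assumptions: `first_safe_loop` is a hypothetical helper, the list of loops is the truncated one quoted above rather than the full `np.divide.types`, and only dtype-to-dtype safe casting is checked (no value-based logic).

```python
import numpy as np


def first_safe_loop(loops, input_dtypes):
    """Return the first 'in->out' loop all inputs can be safely cast to."""
    for loop in loops:
        in_chars, _, _out_chars = loop.partition("->")
        if len(in_chars) != len(input_dtypes):
            continue
        if all(np.can_cast(given, np.dtype(char), casting="safe")
               for given, char in zip(input_dtypes, in_chars)):
            return loop
    return None  # no matching loop found


# Only the first three floating point loops, as in the excerpt above.
loops = ["ee->e", "ff->f", "dd->d"]

print(first_safe_loop(loops, [np.int16, np.float16]))  # ff->f
print(first_safe_loop(loops, [np.int8, np.float16]))   # ee->e
print(first_safe_loop(loops, [np.int32, np.float16]))  # dd->d
```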
> > > >
> > > > For simple functions, which commonly have two identical inputs,
> > > > this
> > > > should be identical, since normally a clear order exists for the
> > > > dtypes
> > > > (it does require checking int8 before uint8, etc.).
> > > >
> > > > #### Scalar based rule
> > > >
> > > > When scalars are involved, the "safe" cast logic based on values
> > > > is
> > > > applied *if and only if* rule 1. applies as before: That is there
> > > > must
> > > > be an array with a higher or equal category as all of the
> > > > scalars.
> > > >
> > > > In the above `np.divide` example, this means that
> > > > `np.divide(np.int16(4), np.array([3], dtype=np.float16))` *will*
> > > > use
> > > > the `'ee->e'` loop, because the scalar `4` is of a lower or equal
> > > > category than the array (integer <= float or complex). While
> > > > checking,
> > > > 4 is found to be safely castable to float16, since `(u)int8` is
> > > > sufficient to hold 4 and that can be safely cast to `float16`.
> > > > However, `np.divide(np.int16(4), np.int16(3))` would use
> > > > `float32`
> > > > because both are scalars and thus value based logic is not used
> > > > (Note
> > > > that in reality numpy forces double output for an all integer
> > > > input
> > > > in
> > > > divide).
> > > >
> > > > It is possible for ufuncs to have mixed type signatures (this is
> > > > very rare within numpy) and arbitrary inputs. In this case, in
> > > > principle, the question is whether or not a clear ordering exists
> > > > and
> > > > if the rule of using value based logic is always clear. This is
> > > > rather
> > > > academic (I could not find any such function in numpy or
> > > > `scipy.special` [^scipy-ufuncs]). But consider:
> > > > ```
> > > > imaginary_ufunc.types:
> > > > int32, float32 -> int32, float32
> > > > int64, float32 -> int64, float32
> > > > ...
> > > > ```
> > > > it is not clear that `np.int64(5) + np.float32(3.)` should be
> > > > able
> > > > to
> > > > demote the `5`. This is very theoretical of course.
> > > >
> > > >
> > > >
> > > >
> > > > Footnotes
> > > > ---------
> > > >
> > > > [^scipy-ufuncs]: See for example these functions:
> > > > ```python
> > > > import numpy as np
> > > > import scipy.special
> > > >
> > > > for n, func in scipy.special.__dict__.items():
> > > >     if not isinstance(func, np.ufunc):
> > > >         continue
> > > >
> > > >     if func.nin == 1:
> > > >         # a single input is not interesting
> > > >         continue
> > > >
> > > >     # check if the signature is not uniform
> > > >     for types in func.types:
> > > >         if len(set(types[:func.nin])) != 1:
> > > >             break
> > > >     else:
> > > >         # every loop has identical input types: skip this ufunc
> > > >         continue
> > > >     print(func, func.types)
> > > > ```
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion