Re: New feature: binary (arbitrary base) rounding

On 8 Nov 2022, at 15:32, numpy-discussion-request@python.org wrote:

Thanks for the proposal. I don't have much of an opinion on this, and right now I am mainly wondering whether there is prior art which can inform us that this is relatively widely useful?

Base-2 (bit) rounding is implemented in Numcodecs <https://numcodecs.readthedocs.io/en/stable/bitround.html#module-numcodecs.bi...> in the context of data compression. As pointed out by M. Klöwer et al. in <https://doi.org/10.1038/s43588-021-00156-2>: "Rounding bits without real information to zero facilitates lossless compression algorithms and encodes the uncertainty within the data itself."

I'm not an expert, but I have never encountered rounding of floating-point numbers in bases other than 2 and 10.

Stefano
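
For readers unfamiliar with bit rounding, here is a minimal sketch of the idea for float32 (a simplification, not Numcodecs' actual implementation, which handles ties and edge cases more carefully):

    import numpy as np

    def bitround(x, keepbits):
        # Keep `keepbits` of float32's 23 mantissa bits and zero the rest,
        # rounding to nearest.  Ties, Inf/NaN and carries into the exponent
        # are not handled carefully in this sketch.
        x = np.asarray(x, dtype=np.float32)
        bits = x.view(np.uint32)
        drop = 23 - keepbits
        half = np.uint32(1 << (drop - 1))    # rounding offset
        mask = ~np.uint32((1 << drop) - 1)   # clears the dropped bits
        return ((bits + half) & mask).view(np.float32)

    print(bitround(np.array([3.14159], dtype=np.float32), 8))  # [3.140625]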

I'm not an expert, but I have never encountered rounding of floating-point numbers in bases other than 2 and 10.

I agree that this is probably not very common; it is more of a possibility if one were to supply a base argument to around. However, it is worth noting that Matlab has the quant function, https://www.mathworks.com/help/deeplearning/ref/quant.html, which basically supports arbitrary bases (as a special case of an even more general approach). So there may be other use cases (although the example there basically just implements around(x, 1)).

BR Oscar Gustafsson
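
As an illustration, a base argument to around might look like the following minimal sketch (around_base is a hypothetical name; Matlab's quant similarly rounds to the nearest multiple of a quantization step):

    import numpy as np

    def around_base(x, decimals=0, base=10):
        # Hypothetical generalization of np.around: round x to `decimals`
        # fractional digits in the given base (base=2 rounds to bits).
        scale = float(base) ** decimals
        return np.around(x * scale) / scale

    print(around_base(0.8125, 2, base=2))  # 0.75, i.e. two fractional bits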

On Thu, 2022-11-10 at 11:08 +0100, Oscar Gustafsson wrote:
To be honest, hearing hardware design and data compression does make me lean towards this not being mainstream enough that inclusion in NumPy really makes sense. But I am happy to hear opposing opinions.

It would be nice to have more of a culture around ufuncs that do not live in NumPy. (I suppose at some point it was more difficult to write C extensions, but that was many years ago.)

- Sebastian

On Thu, 10 Nov 2022 at 13:10, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Here I can easily argue that "all" computations are limited by finite word length, and as soon as you want to see the effect of any type of format not supported out of the box, it will be beneficial. (Strictly, it makes more sense to quantize to a given number of bits than to a given number of decimal digits, as we cannot represent most of the latter exactly.) But I may not push that argument.
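
The parenthetical point is easy to demonstrate: a decimally rounded value is usually not exactly representable in binary floating point, while a value quantized to a few bits is:

    import numpy as np

    # rounding to one decimal digit cannot be exact in binary...
    print(f"{np.around(0.1, 1):.20f}")  # 0.10000000000000000555
    # ...while rounding to fractional bits is exact (0.125 == 2**-3)
    print(f"{0.125:.20f}")              # 0.12500000000000000000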
I do agree with this, though. And this got me realizing that maybe what I actually would like to do is to create an array library with fully customizable (numeric) data types instead. That is, sort of, the proper way to do it, although the proposed approach is indeed simpler and in most cases will work well enough.

(Am I right in believing that it is not that easy to piggy-back custom data types onto NumPy arrays? Something different from using object as the dtype, or the "struct-like" custom approach using the existing scalar types.)

BR Oscar Gustafsson
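
For reference, the "struct-like" approach presumably looks something like the sketch below: a structured dtype over existing scalar types stores the data, but NumPy has no idea how to do arithmetic on it (the fixed-point interpretation is purely illustrative):

    import numpy as np

    fixpoint = np.dtype([('int_part', np.int8), ('frac', np.uint8)])
    a = np.zeros(3, dtype=fixpoint)
    a['int_part'] = [1, 2, 3]
    a['frac'] = [128, 64, 0]   # read each element as int_part + frac/256
    # a + a  # fails: NumPy defines no arithmetic for structured dtypes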

On Thu, 2022-11-10 at 14:55 +0100, Oscar Gustafsson wrote:
NumPy is pretty much fully customizable (beyond just numeric data types). Admittedly, to avoid weird edge cases and get more power you have to use the new API (NEP 41-43 [1]), which is "experimental" and may have some holes. "Experimental" doesn't mean it is expected to change significantly, just that you can't really ship your stuff broadly yet.

The holes may matter for some complicated dtypes (custom memory allocation, parametric dtypes, ...). But at this point many of them should be rather fixable, so before you roll your own, give NumPy a chance?

- Sebastian

[1] https://numpy.org/neps/nep-0041-improved-dtype-support.html

Thanks! That does indeed look like a promising approach! And for sure it would be better to avoid having to reimplement the whole array part and to focus only on the data types. (If successful, my idea of a project would basically cover all the custom numerical types discussed: bfloat16, int2, int4, etc.)

I understand that the following is probably a hard question to answer, but is it expected that there will be work done on this in the "near" future to fill any holes and possibly become more stable? For context, the current plan on my side is to propose this as a student project for the spring, so I am primarily asking in order to plan and describe the project a bit better.

BR Oscar

On Thu, 10 Nov 2022 at 15:13, Sebastian Berg <sebastian@sipsolutions.net> wrote:

On Fri, 2022-11-11 at 14:55 +0100, Oscar Gustafsson wrote:
OK, more below. But unfortunately `int2` and `int4` *are* problematic, because the NumPy array uses a byte-sized strided layout, so you would have to store them in a full byte, which is probably not what you want. I keep thinking of adding a provision for this in the DTypes so that someone could use part of the NumPy machinery to make an array that can have non-byte-sized strides, but the NumPy array itself is ABI-incompatible with storing these packed :(.

(I.e. we could plug that "hole" to allow making an int4 DType in NumPy, but it would still have to take one byte of storage space when put into a NumPy array, so I am not sure there is much of a point.)
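
To make the storage point concrete, here is a rough sketch of what packed int4 storage amounts to, done by hand on top of uint8 (the byte-strided layout cannot address two values per byte itself):

    import numpy as np

    vals = np.array([3, 12, 7, 1], dtype=np.uint8)  # 4-bit values (0..15)
    packed = (vals[0::2] << 4) | vals[1::2]         # 2 bytes for 4 values
    unpacked = np.stack([packed >> 4, packed & 0x0F], axis=-1).ravel()
    assert (unpacked == vals).all()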
Well, it depends on what you need. With the exception above, I doubt the "holes" will matter much in practice unless you are aiming for a polished release rather than experimentation. But of course it may be that you run into something that is important for you but doesn't quite work yet.

I will note that just dealing with the Python/NumPy C-API can be a fairly steep learning curve, so you need someone comfortable diving in, and you should budget a good amount of time for that part. And yes, this is pretty new, so there may be stumbling blocks (which I am happy to discuss in NumPy issues or directly).

- Sebastian

(I.e. we could plug that "hole" to allow making an int4 DType in NumPy, but it would still have to take 1-byte storage space when put into a NumPy array, so I am not sure there is much of a point.)
I have also been curious about the new DTypes mechanism and whether we could do non-byte-sized DTypes with it. One use case I have specifically is reading and writing non-byte-aligned data [1]. So, this would work very well for that use case if the dtype knew how to read/write the proper bit size. For my use case I wouldn't care too much if NumPy internally needs to expand and store the data as full bytes, but being able to read a bitwise binary stream into NumPy-native dtypes for further processing would be useful, I think (without having to resort to unpackbits and do rearranging/packing to other types).

    # hypothetical bit-sized dtypes
    dtype = {'names': ('count0', 'count1'), 'formats': ('uint3', 'uint5')}
    # x would have two unsigned ints, but reading only one byte from the stream
    x = np.frombuffer(buffer, dtype)
    # it would be ideal for tobytes() to know how to pack a uint3+uint5
    # DType into a single byte as well
    x.tobytes()

Greg

[1] Specifically, this is for very low bandwidth satellite data where we try to pack as much information into the downlink and use every bit of space, but once on the ground I can expand the bit-size fields to byte-size fields without too much issue of worrying about space [puns intended].

On Fri, Nov 11, 2022 at 7:14 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
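
For comparison, here is a sketch of what works today: read whole bytes with np.frombuffer and split the fields with shifts and masks (assuming count0 sits in the high 3 bits and count1 in the low 5 bits of each byte; the real bit order depends on the downlink format):

    import numpy as np

    buffer = bytes([0b101_01101, 0b010_00011])
    raw = np.frombuffer(buffer, dtype=np.uint8)
    count0 = raw >> 5            # high 3 bits
    count1 = raw & 0b0001_1111   # low 5 bits
    print(count0, count1)        # [5 2] [13  3]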

On Fri, 2022-11-11 at 09:13 -0700, Greg Lucas wrote:
Unfortunately, I suspect the amount of expectations users would have from a full DType, and the fact that bit-sized will be a bit awkward in NumPy arrays for the foreseeable future, make me think dedicated conversion functions are probably more practical.

Yes, you could do a `MyInt(bits=5, offset=3)` DType, and at least you could view the same array also with `MyInt(bits=3, offset=0)`. (Maybe also a structured DType, but I am not certain that is advisable, and custom structured DTypes would require holes to be plugged.) A custom dtype that is "structured" might work (i.e. you could store two numbers in one byte, of course). Currently you cannot integrate deep enough into NumPy to build structured dtypes based on arbitrary other dtypes, but you could do it for your own bit DType. (I am not quite sure you can make `arr["count0"]` work; this is a hole that needs plugging.) This is probably not a small task, though.

Could `tobytes()` be made to compactify? Yes, but then it suddenly needs extra logic for bit-sized data and doesn't just expose memory. That is maybe fine, but it also seems a bit awkward? I would love to have a better answer, but dancing around the byte-strided ABI seems tricky...

Anyway, I am always available to discuss such possibilities; there are some corners w.r.t. such bit-sized thoughts which are still shrouded in fog.

- Sebastian

Hi all,

an advantage of sub-byte datatypes is the potential for accelerated computing. For GPUs, int4 is already happening. Or take int1, for example: two arrays of 64 elements each would take eight bytes each. Now, if one wanted to add those two arrays (mod 2), one could simply XOR them as a single uint64 (or as 8 uint8 XORs). However, I would rather limit sub-byte types to int1, (u)int2 and (u)int4, as they are the only ones that divide the byte evenly (or at least to begin with).

Considering single-element access: a single element in such an array could be accessed by dividing the index to find the containing byte and then shifting and ANDing with a mask. Probably uint8 would make sense for this. That would create some overhead of course, but the data is more compact (which is nice for CPU/GPU caches) and full-array ops are faster.

Striding could be done similarly to single-element access. This would be inefficient as well, but one could auto-generate some type-specific C code (for int1, (u)int2, (u)int4 and their combinations) that accelerates popular operators, so one would not need to actually loop over every entry with single-element access.

"Byte-size strided": isn't it possible to pre-process the strides and post-process the output as mentioned above? Like a wrapping class around a uint8 array.

What do you think? Am I missing out on something?

Best, Michael
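
A minimal sketch of the packed int1 idea on top of uint8 storage (the LSB-first bit order is an arbitrary choice here):

    import numpy as np

    # 64 one-bit values per array, packed into 8 bytes each
    a = np.random.randint(0, 256, size=8, dtype=np.uint8)
    b = np.random.randint(0, 256, size=8, dtype=np.uint8)

    # elementwise addition mod 2 of all 64 bits at once: one XOR per byte/word
    summed = a ^ b

    def get_bit(packed, i):
        # single-element access: i >> 3 selects the byte,
        # i & 7 the bit position within it
        return (packed[i >> 3] >> (i & 7)) & 1

    print(get_bit(summed, 10))  # 0 or 1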
