
Hello,

I wonder what the alternatives are to efficiently represent 128-bit integers. I found a GitHub issue proposing support for an int128 data type - https://github.com/numpy/numpy/issues/9992 - but it is closed now, and I was wondering what the options are in the meantime.

The context of my question: ultimately, I am looking for an efficient way to work with UUID values in Pandas (sorry, it comes down to that). Since I am not doing computations on this data, only using it as identifiers, I am mostly interested in efficient memory usage (unless there are other related reasons I could take a performance hit?).

With Python's regular integer type, a 128-bit integer uses 44 bytes (according to `sys.getsizeof`). Interestingly, a 16-byte bytes object in Python uses 49 bytes, so I'm still better off with an integer. I understand that even with a theoretical np.int128 scalar I'd get 40 bytes (24 + 16), but hey, it is less than 44 anyway. I could artificially split 128-bit integers into two int64 numbers (the high 64 bits and the low 64 bits) and treat the pair as a single value for all meaningful usages, but this is quite inconvenient and would actually use more space in the end.

So I wonder, are there other options? From the GitHub issue I understood that NumPy internally uses 128-bit integers for some purpose; where can I find out more about that?

Thank you!
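(For reference, the sizes quoted above can be checked directly. The exact numbers depend on the CPython version and platform; the values in the comments are what a 64-bit CPython 3.11 reports.)

```python
import sys

v = (1 << 127) | 1  # a 128-bit Python integer

# CPython ints: 24-byte object header plus 4 bytes per 30-bit digit,
# so a 128-bit value needs 5 digits -> 24 + 20 = 44 bytes.
print(sys.getsizeof(v))

# A 16-byte bytes object carries 33 bytes of object overhead -> 49 bytes.
print(sys.getsizeof(bytes(16)))

# Either way, well above the 16 bytes of actual payload.
```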

You could use a composite data type of two int64s: https://numpy.org/doc/stable/user/basics.rec.html. It would not work with arithmetic, but you said you don't care about that. You'd just need a helper function to convert the data to a UUID string.

Aaron Meurer
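A minimal sketch of that structured-dtype approach (the field names "hi"/"lo" and the helper functions are illustrative, not from the thread):

```python
import uuid
import numpy as np

# Structured dtype holding the high and low 64 bits of a 128-bit value.
# itemsize is exactly 16 bytes per element, with no per-element Python overhead.
uuid_dtype = np.dtype([("hi", np.uint64), ("lo", np.uint64)])

def uuid_to_record(u):
    """Split a uuid.UUID into a (high 64 bits, low 64 bits) tuple."""
    return (u.int >> 64, u.int & 0xFFFFFFFFFFFFFFFF)

def record_to_uuid(rec):
    """Reassemble a (hi, lo) record into a uuid.UUID."""
    return uuid.UUID(int=(int(rec["hi"]) << 64) | int(rec["lo"]))

ids = [uuid.uuid4() for _ in range(3)]
arr = np.array([uuid_to_record(u) for u in ids], dtype=uuid_dtype)

assert arr.itemsize == 16            # 16 bytes per UUID
assert record_to_uuid(arr[0]) == ids[0]  # round-trips losslessly
```

Equality comparisons and sorting still work on the structured array (fields compare in order, so hi-then-lo gives the same ordering as the full 128-bit integer), which is usually all you need for identifier columns.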
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
participants (2)
- Aaron Meurer
- Tim Candid