
Dear all,

There is a PR that adds a lookup-table approach to `unique`:

https://github.com/numpy/numpy/pull/21843

It can give up to ~16x speedup for large integer arrays, at the cost of potentially greater memory usage.

This is controlled by a new `kind` parameter, described below. The current approach corresponds to "sort", while the proposed change implements the "table" method. The default, None, automatically selects "table" when it is available and its memory usage is not too large.

```
kind : {None, 'sort', 'table'}, optional
    The algorithm to use. This will not affect the final result,
    but will affect the speed and memory use. The default, None,
    will select automatically based on memory considerations.

    * If 'sort', will use a mergesort-based approach.
    * If 'table', will use a lookup table approach similar
      to a counting sort. This is only available for boolean and
      integer arrays. This will have a memory usage of the
      size of `ar` plus the max-min value of `ar`.
      The options `return_index`, `return_inverse`, `axis`,
      and `equal_nan` are unavailable with this option.
    * If None, will automatically choose 'table' if possible
      and the required memory allocation is less than or equal to
      6 times the size of `ar`. Otherwise, will use 'sort'.
      This is done to not use a large amount of memory by
      default, even though 'table' may be faster in most cases.
```

The method and API are very similar to those merged last week for `isin`: https://github.com/numpy/numpy/pull/12065/. One difference is that `return_counts` required a slightly modified approach; using `bincount` seems to work well for this.

I am eager to hear your comments on this new PR.

Thanks!
Miles
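
P.S. For readers who want a feel for the idea without opening the PR, here is a minimal sketch of a lookup-table `unique` built on `bincount`. It is an illustration only, not the code in the PR: the function name `unique_table_sketch` is made up, and it assumes that all values fit in int64 and that the value range (max minus min) is small enough to allocate a table for.

```python
import numpy as np


def unique_table_sketch(ar, return_counts=False):
    """Illustrative lookup-table unique (hypothetical helper, not the PR's code)."""
    ar = np.asarray(ar).ravel()
    if ar.dtype != bool and not np.issubdtype(ar.dtype, np.integer):
        raise TypeError("table approach only handles boolean and integer arrays")
    if ar.size == 0:
        empty = ar.copy()
        return (empty, np.zeros(0, dtype=np.intp)) if return_counts else empty

    # Assumption: every value fits in int64 (not true for very large uint64).
    work = ar.astype(np.int64)
    ar_min = int(work.min())

    # Shift so the smallest value maps to slot 0, then count occurrences.
    # The table has (max - min + 1) slots, which is where the extra memory goes.
    counts = np.bincount(work - ar_min)

    # Non-empty slots are the unique values, already in sorted order.
    present = np.flatnonzero(counts)
    values = (present + ar_min).astype(ar.dtype)

    if return_counts:
        return values, counts[present]
    return values


# Usage: the result should agree with the existing sort-based np.unique.
sample = np.array([5, -2, 5, 7, -2, -2])
values, counts = unique_table_sketch(sample, return_counts=True)
expected_values, expected_counts = np.unique(sample, return_counts=True)
assert np.array_equal(values, expected_values)
assert np.array_equal(counts, expected_counts)
```

The table size depends on the value range rather than on the number of elements, which is why an array with a few widely spread values can cost far more memory than sorting would.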