Just calling argsort() on the whole array is faster (up to about 1e6 elements)
because it does all the work in C, whereas heapq treats the array as a general
Python container and creates a Python object for every element it touches.
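
For concreteness, here is a rough sketch of the two approaches for picking the
indices of the k largest elements. The function names, array size, and k are
just assumptions for illustration, and it assumes k > 0. I haven't included
timings; wrapping each call in timeit on your own data should show where the
crossover is on your machine.

```python
import heapq
import numpy as np

def top_k_argsort(a, k):
    # Sort the whole array in C, then keep the last k indices,
    # reversed so the largest element comes first. Assumes k > 0.
    return np.argsort(a)[-k:][::-1]

def top_k_heapq(a, k):
    # heapq walks the array one element at a time through the key
    # function, so every lookup produces a separate Python-level object.
    return heapq.nlargest(k, range(len(a)), key=a.__getitem__)

# Example data with distinct values so both methods agree exactly.
rng = np.random.default_rng(0)
a = rng.permutation(10_000).astype(float)
k = 5

idx_sort = top_k_argsort(a, k)
idx_heap = top_k_heapq(a, k)
```

Both return the same indices here; the difference is only in how much of the
work happens inside the interpreter.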



That's a good point. I wasn't thinking about the efficiency issue.