Just calling argsort() on the whole array is faster (up to about 1e6 elements) because all the work happens in C, whereas heapq creates a lot of Python objects because it treats the array as a generic Python container.
That's a good point. I wasn't thinking about the efficiency issue.
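The argsort-versus-heapq comparison above can be sketched as follows. This is a minimal sketch assuming NumPy is installed. The helper names `top_k_argsort` and `top_k_heapq`, the array size and the value of k are all hypothetical choices made for illustration. Timings depend on the machine, so the script prints them and makes no claim about where the crossover falls.

```python
import heapq
import timeit

import numpy as np


def top_k_argsort(a, k):
    # Hypothetical helper: full sort done in C by NumPy, then take the
    # indices of the k largest values, largest first.
    return np.argsort(a)[::-1][:k]


def top_k_heapq(a, k):
    # Hypothetical helper: heapq walks the indices one by one in Python,
    # calling a.__getitem__ for each, which creates a Python-level
    # object per element.
    return heapq.nlargest(k, range(len(a)), key=a.__getitem__)


rng = np.random.default_rng(0)
a = rng.random(100_000)  # assumed size, purely for illustration
k = 10

# Both approaches select the same indices (random floats are distinct here).
assert list(top_k_argsort(a, k)) == top_k_heapq(a, k)

for name, fn in [("argsort", top_k_argsort), ("heapq", top_k_heapq)]:
    t = timeit.timeit(lambda: fn(a, k), number=10)
    print(f"{name}: {t / 10 * 1e3:.2f} ms per call")
```

Varying the size of `a` in this sketch is one way to check the crossover point on a given machine.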