First of all, I really love the docs of the C API :) It's way above what I would expect!

I was reviewing the signature possibilities for generalized ufuncs, and have a question.

https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html

I am playing with a ufunc that scores rows and returns the top N, where N could be specified by the user. I.e., the user might do

get_most_similar(X, y, n=10)

You can imagine situations where this could happen in similarity functions, where we want to get some Top N rows of X most similar to y. But sometimes users will want 10, or 100, or need to page through results etc. For performance reasons, I wouldn't want to maintain an index of every row of X, I'd prefer to only have to care about the top 10 or so.
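To make the "only care about the top N" part concrete, here is a rough pure-Python sketch of the bookkeeping I have in mind for the inner loop. It is only a stand-in for the C code, and `top_n_indices` is a made-up name, not anything from NumPy:

```python
import heapq

def top_n_indices(scores, n=10):
    """Return indices of the n largest scores, best first.

    Hypothetical helper: only n candidates are held at any time,
    instead of an index over every row.
    """
    if n <= 0:
        return []
    heap = []  # min-heap of (score, index); heap[0] is the weakest kept candidate
    for i, s in enumerate(scores):
        if len(heap) < n:
            heapq.heappush(heap, (s, i))
        elif s > heap[0][0]:
            # New score beats the weakest kept candidate, so swap it in.
            heapq.heapreplace(heap, (s, i))
    # Order the survivors by descending score (ties broken by lower index).
    return [i for s, i in sorted(heap, key=lambda t: (-t[0], t[1]))]

scores = [0.1, 0.9, 0.4, 0.7, 0.3]
print(top_n_indices(scores, n=2))  # [1, 3]
```

The working set is O(N) rather than O(rows of X), which is why I would like N to be small and user-chosen.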

I wonder what the best way to do this is.

One thought I had was to always set the output dimension to 10 for now, and handle paging on the Python side, perhaps by also giving my function an offset parameter to window into the similar results.
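Roughly, the offset idea would behave like the following pure-Python sketch. The names `WINDOW` and `most_similar_page` are hypothetical, `scores` stands in for whatever the ufunc computes per row, and I am assuming `offset` is non-negative:

```python
import heapq

WINDOW = 10  # hypothetical fixed output size baked into the ufunc

def most_similar_page(scores, offset=0, window=WINDOW):
    """Return the row indices ranked offset .. offset+window-1 by descending score."""
    # Rank just enough candidates to reach the end of the requested window.
    ranked = heapq.nlargest(offset + window, range(len(scores)),
                            key=scores.__getitem__)
    return ranked[offset:offset + window]

scores = [0.5, 0.2, 0.9, 0.1, 0.7, 0.3]
print(most_similar_page(scores, offset=0, window=2))  # [2, 4]
print(most_similar_page(scores, offset=2, window=2))  # [0, 5]
```

One drawback I can see in this sketch: to return the page at `offset`, it still has to rank `offset + window` candidates, so deep pages get more expensive even though the output size stays fixed.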

The second thought I had was to just get 100 instead of 10, as that probably is enough for most use cases. And users can slice out what they need. It's a little annoying in terms of perf cost, but probably not a big deal.

But it would be convenient to just let the user specify the N they want.

Thanks for any insights!
-Doug