I am a MATLAB user moving to numpy and am bringing over some of my code. A core function I need estimates the probability distribution of a given set of data, i.e. the relative frequency of each value. These sets are sometimes large, so I would like to make it as efficient as possible. (The data are integers representing members of a discrete space.)

In MATLAB the best way I found was the "diff of find of diff" trick, which gives the completely vectorised solution below. Does it make sense to translate this into numpy? I don't yet have a feel for what is fast or slow: are the functions used below built in, and therefore quick? (I thought diff might not be.)

Is there a better/more pythonic way to do it?

--------

function Pr = prob(data, nR)
Pr = zeros(nR,1);
% diff of find of diff trick for counting number of elements
temp = sort(data(data>0)); % want vector excluding P(0)
dtemp = diff([temp; max(temp)+1]);
count = diff(find([1; dtemp]));
indx = temp(dtemp>0);
Pr(indx) = count ./ numel(data); % probability

--------
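
For comparison, a line-by-line numpy translation of the function above might look roughly like the sketch below. It is only a sketch under some assumptions: data is a 1-D array of integers in the range 0..nR, and the result is indexed from 0, so pr[k-1] holds P(k). The second function, prob_bincount, is a hypothetical name for a possible alternative that uses np.bincount instead of the diff trick; it makes the same assumptions and is not claimed to be faster without timing it.

```python
import numpy as np

def prob(data, nR):
    """Sketch of the 'diff of find of diff' trick in numpy.

    Assumes data holds integers in 0..nR; returns P(1)..P(nR).
    """
    data = np.asarray(data).ravel()
    pr = np.zeros(nR)
    temp = np.sort(data[data > 0])            # want vector excluding P(0)
    if temp.size == 0:                        # nothing to count
        return pr
    # Differences between neighbours; the appended sentinel closes the last run.
    dtemp = np.diff(np.append(temp, temp[-1] + 1))
    # Positions where a new run of equal values starts (the leading 1 marks the first).
    starts = np.flatnonzero(np.concatenate(([1], dtemp)))
    count = np.diff(starts)                   # length of each run
    indx = temp[dtemp > 0]                    # the value belonging to each run
    pr[indx - 1] = count / data.size          # shift to 0-based indexing
    return pr

def prob_bincount(data, nR):
    """Alternative sketch using np.bincount (same assumptions as prob)."""
    data = np.asarray(data).ravel()
    counts = np.bincount(data, minlength=nR + 1)[1:nR + 1]  # drop the count of 0
    return counts / data.size
```

With data = [0, 1, 1, 3, 2, 3, 3, 0] and nR = 3, both functions should return [0.25, 0.125, 0.375]. As in the MATLAB version, zeros are excluded from the result but still counted in the denominator.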

Thanks

Robin