Hello all,

I noticed that `np.triu_indices` takes quite a while, and discovered that it creates an upper triangular array and then uses `np.where`. This seems quite inefficient, and I was curious whether something like the following would be better:

"""
import numpy as np

def fast_triu_indices(dim, k=0):
    tmp_range = np.arange(dim - k)
    # Row i of the upper triangle contributes (dim - k - i) entries.
    rows = np.repeat(tmp_range, (tmp_range + 1)[::-1])

    # Column steps: +1 within a row, a negative jump at each row start.
    # Builtin int replaces np.int, which newer NumPy releases no longer provide.
    cols = np.ones(rows.shape[0], dtype=int)
    inds = np.cumsum(tmp_range[1:][::-1] + 1)
    np.put(cols, inds, np.arange(dim * -1 + 2 + k, 1))
    cols[0] = k
    np.cumsum(cols, out=cols)
    return (rows, cols)
"""

This is just a first pass at the function, and unfortunately it does not work for k<0. However, it does return the correct results for k>=0 and is between 2 and 8 times faster on my machine than `np.triu_indices`.

Any thoughts on this?

-Daniel
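
A minimal sketch of how the k>=0 claim could be checked against `np.triu_indices`. It repeats the function so that it runs on its own, with `dtype=int` in place of `np.int`. `matches_numpy` is a hypothetical helper name used only for illustration, and the ranges of `dim` and `k` are arbitrary choices, not taken from the original measurements.

```python
import numpy as np


def fast_triu_indices(dim, k=0):
    # Same approach as the proposed function, using the builtin int
    # because newer NumPy releases no longer provide np.int.
    tmp_range = np.arange(dim - k)
    rows = np.repeat(tmp_range, (tmp_range + 1)[::-1])
    cols = np.ones(rows.shape[0], dtype=int)
    inds = np.cumsum(tmp_range[1:][::-1] + 1)
    np.put(cols, inds, np.arange(dim * -1 + 2 + k, 1))
    cols[0] = k
    np.cumsum(cols, out=cols)
    return (rows, cols)


def matches_numpy(dim, k=0):
    # Hypothetical helper: True when both index arrays equal the reference.
    rows, cols = fast_triu_indices(dim, k)
    ref_rows, ref_cols = np.triu_indices(dim, k)
    return np.array_equal(rows, ref_rows) and np.array_equal(cols, ref_cols)


# Only non-negative offsets with 0 <= k < dim - 1 are exercised here;
# k < 0 is known not to work and is left out on purpose.
all_match = all(
    matches_numpy(dim, k) for dim in range(2, 30) for k in range(dim - 1)
)
print(all_match)
```

Timing could be compared in the same spirit with the standard `timeit` module; any speedup will depend on `dim`, `k`, and the NumPy version.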