[Numpy-discussion] calculating weighted majority using two 3D arrays
Gregory, Matthew
matt.gregory at oregonstate.edu
Thu Mar 6 13:37:02 EST 2008
Eads, Damian wrote:
> You may need to be a bit more specific by what you mean by
> weighted majority. What are the range of values for values
> and weights, specifically? This sounds a lot like pixel
> classification where each pixel is classified with a majority
> vote over its weights and values. Is that what you're trying to do?
>
> Many numpy functions (e.g. mean, max, min, sum) have an axis
> parameter, which specifies the axis along which the statistic
> is computed. Omitting the axis parameter causes the statistic
> to be computed over all values in the multidimensional array.
>
> Suppose the 'values' array contains floating point numbers in
> the range
> -1 to 1 and a larger absolute value gives a larger
> confidence. Also suppose the weights are floating point
> numbers between 0 and 1. The weighted majority vote for pixel
> i,j over 10 real-valued (confidenced) votes, each vote having
> a separate weight, is computed by
>
> w_vote = numpy.sign((values[:,i,j]*weights[:,i,j]).sum())
>
> This can be vectorized to give a weighted majority vote for
> each pixel by doing
>
> w_vote = numpy.sign((values*weights).sum(axis=0))
>
> The values*weights expression gives a weighted prediction.
> This also works if the 'values' are just predictions from the
> set {-1, 1}, i.e.
> there are ten classifiers, each one predicts either -1 and 1
> on each pixel.
Damian, thank you for the helpful response. I should have been a bit
more explicit about what I meant by weighted majority. In my case, I
need to find a discrete value (i.e. class) that occurs most often among
ten observations where weighting is pre-determined by an
inverse-distance calculation. Ignoring for a moment the
multidimensionality issue, my values and weights arrays might look like
this:
values = array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10])
weights = array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2])
My function to calculate the majority looks like this:
def weightedMajority(a, b):
# Put all the samples into a dictionary with weights summed for
# duplicate values
wDict = {}
for i in xrange(len(a)):
(value, weight) = (a[i], b[i])
if wDict.has_key(value):
wDict[value] += weight
else:
wDict[value] = weight
# Create arrays of the values and weights
values = numpy.array(wDict.keys())
weights = numpy.array(wDict.values())
# Return the index of the maximum value
index = numpy.argmax(weights)
# Return the majority value
return values[index]
In the above example:
>> maj = weightedMajority(values, weights)
>> maj
19
Correct me if I'm wrong, but I don't think that your example will work
when I am looking to return a discrete value from the values set, but
you may see something that I'm doing that is truly inefficient!
thanks, matt
More information about the NumPy-Discussion
mailing list