looking for "optimal weighting" algorithm

Bengt Richter bokr at oz.net
Fri Apr 11 14:40:23 EDT 2003


On Thu, 10 Apr 2003 15:13:11 +0200, Alex Martelli <aleaxit at yahoo.com> wrote:

>I _know_ there's a decent algorithm to solve the following problem,
>but I don't recall its name and thus can't rapidly google for the 
>details... can somebody help?
>
>
>The problem: I need to design a "decision criterion" to classify
>"observations".  For each observation I measure the values of a
>number N of features, x1, x2, ... xN; the desired design criterion
>is a set of weights w1, w2, ... wN such that for any observation I 
>will then just compute a weighted sum
>  S = w1*x1 + w2*x2 + ... + wN*xN
>and classify the observation as Black if S<=1, White if S>1.  To
>train my classifier I have a large corpus of observations already
>made and annotated with "ground-truth" data about whether each
>given observation should have been classified as B or W, and an
>error-cost value for each kind of classification (Ebw is the cost
>of erroneously classifying a feature as W when it should be B,
>Ewb is that of classifying it as B when it should be W).  So,
>what's the algorithm to estimate the weights given the corpus of
>observation and ground-truth data, and the error-costs?
>
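For concreteness, here is one way the decision rule above could be fitted: a cost-weighted perceptron-style update, where each mistake nudges the weights toward the correct side of the S=1 threshold, scaled by the cost of that kind of error. This is only a sketch under my own assumptions (the function name, learning rate, and epoch count are mine) — it may well not be the algorithm you half-remember, which could instead be something like Fisher's linear discriminant or cost-weighted logistic regression.

```python
def train_weights(corpus, Ebw, Ewb, rate=0.01, epochs=200):
    """Fit weights for the rule: S = sum(w[i]*x[i]); 'B' if S <= 1 else 'W'.

    corpus : list of (features, label) pairs, label 'B' or 'W'
    Ebw    : cost of classifying a true B as W
    Ewb    : cost of classifying a true W as B
    """
    n = len(corpus[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, truth in corpus:
            S = sum(wi * xi for wi, xi in zip(w, x))
            guess = 'B' if S <= 1 else 'W'
            if guess != truth:
                # Nudge S toward the correct side of the threshold,
                # scaled by how costly this particular error is.
                cost = Ebw if truth == 'B' else Ewb
                sign = -1.0 if truth == 'B' else 1.0
                for i in range(n):
                    w[i] += rate * cost * sign * x[i]
    return w
```

If the classes are linearly separable this converges to some separating set of weights; the cost scaling just makes the expensive kind of mistake pull harder during training.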
My first questions would be about the "observations": e.g., are the measured
features orthogonal/independent? Second, what kinds of distributions do they
have, and what kinds of joint distributions?

Is a simple linear combination really appropriate? It amounts to projecting
each multi-dimensional point onto a line and thresholding the result to make
the decision. What if your multi-dimensional points really fall into a bunch
of clusters, each of which should carry its own ground-truth label? I.e., if
there is no projection that avoids overlapping clusters with conflicting
ground-truth, then you have to classify new data by figuring out which
cluster it belongs to and looking up your result in a dictionary of clusters
identified and labeled according to your training data.
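One crude way to make that cluster-lookup idea concrete (the centroids, the labels, and nearest-centroid-by-squared-Euclidean-distance are all my assumptions — just a sketch, not a recommendation):

```python
def nearest_cluster_label(point, clusters):
    """clusters: dict mapping a cluster-centroid tuple to its
    ground-truth label; classify a new point by the label of the
    nearest centroid (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = min(clusters, key=lambda c: dist2(point, c))
    return clusters[nearest]
```

The dictionary here plays exactly the role described above: it is the lookup table of labeled clusters built from the training data.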

If there is no knowledge other than what is implicit in your training set,
what implicit assumptions are being made? Are there measurement-error
distributions as well as distributions of perturbations of the measured
things themselves (e.g., digital roundoff plus shaky hands plus measuring
water depth in a pool with some waves)? And is that worth thinking about,
versus just making a Gaussian or boxcar assumption?

Are the assumptions explicit? Is there knowledge about constraints on
relationships between observations (i.e., if they are not totally
independent) that could help? E.g., absolute mutual exclusions on the
concurrent existence of valid measurements would suggest partitioning the
problem algorithmically before applying (perhaps different) continuous math
models; contrast that with exploiting cross-correlations etc., or coding
relationships such as those between the bits of an error-correcting code.

Well, that was my reaction, FWIW. I'm sure I'm not the only one curious
as to what you are measuring and classifying ;-)

Regards,
Bengt Richter




More information about the Python-list mailing list