looking for "optimal weighting" algorithm

Duncan Smith buzzard at urubu.freeserve.co.uk
Fri Apr 11 01:57:14 CEST 2003


"Terry Reedy" <tjreedy at udel.edu> wrote in message
news:8c2dnXyBdvemeQijXTWcqw at comcast.com...
>
> "Duncan Smith" <buzzard at urubu.freeserve.co.uk> wrote in message
> news:b74ckl$itb$1 at newsg4.svr.pol.co.uk...
> > Yes, you do have the link function as well as the linear predictor,
> and the
> > cut-off would be 0.5 rather than 1.  But in terms of absolute
> efficiency
> > there is (according to the literature) little to choose between
> linear
> > discriminant analysis and logistic regression.  Changing the cut-off
> would
> > be a simple way of attempting to minimise Alex's 'error-cost'.
>
> For at least some types of data distributions, I can believe that.  My
> comment about stepwise construction of a feature set versus
> over-fitting with everything you can think of applies equally well to
> logistic regression.  As far as I know, logistic regression cannot be
> used for more than two mutually-exclusive groups.  I don't know
> whether Alex might ever need to.
>
> Terry J. Reedy
>

There is nominal logistic regression for handling more than two groups.  I've
never used it personally (that I can remember), but it is available in e.g.
Minitab.  There's no stepwise procedure in Minitab (other than the usual one
implemented by a sensible user, examining the output and refitting).
Even so, I'd still advocate leaving some data out of the fitting process, so
that there are observations to validate the model against, then refitting
the chosen model's parameters using all the data at the end.  (Personally
I'd favour a jackknife procedure because it also helps to identify unusual /
influential observations.)  That said, a naive Bayes classifier would be
much easier to code, and would probably have similar predictive performance
(it could be jackknifed to avoid overfitting).  And a simple cut-off
adjustment would still be possible.
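
To make the suggestion concrete, here is a minimal sketch of what I have in
mind: a Gaussian naive Bayes classifier, evaluated by leave-one-out
(jackknife) refitting, with an adjustable cut-off on the class-1 posterior
for trading off error costs.  The toy data, the function names, and the
independent-normal-features assumption are all illustrative, not anything
from the thread.

```python
import math

def fit(rows, labels):
    """Estimate per-class feature means, variances and priors."""
    params = {}
    for c in set(labels):
        sub = [r for r, l in zip(rows, labels) if l == c]
        n = len(sub)
        means = [sum(col) / n for col in zip(*sub)]
        # Floor the variance so a degenerate class can't divide by zero.
        varis = [max(sum((x - m) ** 2 for x in col) / n, 1e-9)
                 for col, m in zip(zip(*sub), means)]
        params[c] = (means, varis, n / len(rows))
    return params

def log_posterior(params, row, c):
    """Unnormalised log-posterior for class c (independent normals)."""
    means, varis, prior = params[c]
    lp = math.log(prior)
    for x, m, v in zip(row, means, varis):
        lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
    return lp

def classify(params, row, cutoff=0.5):
    """Return 1 if P(class 1 | row) exceeds the cut-off, else 0."""
    lp0 = log_posterior(params, row, 0)
    lp1 = log_posterior(params, row, 1)
    # Clamp the exponent to avoid overflow for well-separated points.
    p1 = 1.0 / (1.0 + math.exp(min(lp0 - lp1, 700.0)))
    return 1 if p1 > cutoff else 0

def loo_error_rate(rows, labels, cutoff=0.5):
    """Jackknife: refit with each observation left out in turn."""
    errors = 0
    for i in range(len(rows)):
        params = fit(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        if classify(params, rows[i], cutoff) != labels[i]:
            errors += 1
    return errors / len(rows)
```

Lowering the cut-off below 0.5 makes class-1 predictions cheaper to trigger,
which is the simple error-cost adjustment mentioned above; and the per-fold
misfits in loo_error_rate are exactly the unusual / influential observations
the jackknife helps you spot.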

Duncan
