getting a submatrix of all true

Terry Reedy tjreedy at udel.edu
Thu Jul 3 23:14:04 CEST 2003


"John Hunter" <jdhunter at ace.bsd.uchicago.edu> wrote in message
news:mailman.1057250709.8853.python-list at python.org...
> >>>>> "Terry" == Terry Reedy <tjreedy at udel.edu> writes:
>
>
>     Terry> Statisticians have tried a variety of approaches.  Googling
>     Terry> for 'statistics "missing data"' will give you some leads
>     Terry> if you want.
>
> I have done some searching.  I'm familiar with the common methods
> (delete every row that contains any missing, replace missing via mean
> or regression or something clever) but haven't seen any discussion of
> dropping variables and observations together to yield data sets with
> no missing values.  Have you seen something like this?

There are also calculation methods for at least some analyses that
allow for missing data.  One of the Google hits is for the book
Statistical Analysis with Missing Data.  I have not seen it, but it is
not the only book on the subject.

As I hinted, there are no really nice solutions to missing data.  I
have done both row and column deletion.  Sometimes I have done
multiple analyses with different deletion strategies: once with enough
vars deleted so all or most cases are complete, and again with enough
cases deleted so that all or most vars are complete.
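
The two deletion strategies can be sketched roughly as follows.  This
is only an illustration, not code I have run on real data: missing
values are assumed to be stored as None in a list of rows, and the
function names, the sample data and the thresholds are all made up for
the example.

```python
def drop_columns_first(rows, max_missing_per_col):
    """Delete vars with too many gaps, then keep only complete cases."""
    n_cols = len(rows[0])
    cols = [j for j in range(n_cols)
            if sum(row[j] is None for row in rows) <= max_missing_per_col]
    kept = [i for i, row in enumerate(rows)
            if all(row[j] is not None for j in cols)]
    return kept, cols, [[rows[i][j] for j in cols] for i in kept]


def drop_rows_first(rows, max_missing_per_row):
    """Delete cases with too many gaps, then keep only complete vars."""
    kept = [i for i, row in enumerate(rows)
            if sum(v is None for v in row) <= max_missing_per_row]
    n_cols = len(rows[0])
    cols = [j for j in range(n_cols)
            if all(rows[i][j] is not None for i in kept)]
    return kept, cols, [[rows[i][j] for j in cols] for i in kept]


# Hypothetical data: 5 cases (rows) by 4 vars (columns), None = missing.
data = [
    [1.0, 2.0, None, 4.0],
    [1.5, None, None, 4.5],
    [2.0, 3.0, 1.0, 5.0],
    [None, 3.5, None, 5.5],
    [3.0, 4.0, None, 6.0],
]

rows_a, cols_a, matrix_a = drop_columns_first(data, max_missing_per_col=1)
rows_b, cols_b, matrix_b = drop_rows_first(data, max_missing_per_row=0)
print(len(rows_a), "cases x", len(cols_a), "vars")
print(len(rows_b), "cases x", len(cols_b), "vars")
```

Running the analysis on both results shows how much the conclusions
depend on which way the deletion was done.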

I would start by counting (with a program) the number of missing
values for each row and then constructing the frequency distribution
thereof.  Then the same for the columns, with the addition of a
correlation table or tables.
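
A minimal sketch of that counting step, again assuming None marks a
missing value and using made-up data.  For the correlation table I am
assuming Pearson correlations between vars computed over the cases
where both are present; that is one reasonable reading, not the only
one, and all the function names here are invented for the example.

```python
from collections import Counter
from math import sqrt


def missing_per_row(rows):
    """Number of missing values in each case (row)."""
    return [sum(v is None for v in row) for row in rows]


def missing_per_column(rows):
    """Number of missing values in each var (column)."""
    return [sum(v is None for v in col) for col in zip(*rows)]


def pairwise_correlation(rows, a, b):
    """Pearson r of columns a and b over cases where both are present.

    Returns None when fewer than two such cases exist or when either
    column is constant over them.
    """
    pairs = [(r[a], r[b]) for r in rows
             if r[a] is not None and r[b] is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


# Hypothetical data: 5 cases by 4 vars, None = missing.
data = [
    [1.0, 2.0, None, 4.0],
    [1.5, None, None, 4.5],
    [2.0, 3.0, 1.0, 5.0],
    [None, 3.5, None, 5.5],
    [3.0, 4.0, None, 6.0],
]

row_counts = missing_per_row(data)
row_distribution = Counter(row_counts)   # missing count -> number of rows
col_counts = missing_per_column(data)
n_vars = len(data[0])
corr_table = [[pairwise_correlation(data, a, b) for b in range(n_vars)]
              for a in range(n_vars)]

print("rows:", sorted(row_distribution.items()))
print("cols:", col_counts)
```

A var with a high missing count that also correlates strongly with a
better-filled var is a natural candidate for deletion.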

One thing one can do with vars is to combine some to make a composite
measure.  For instance, if three variables more-or-less measure the
same thing, one can combine (perhaps by the mean of those present) to
make one variable that is present if any of the three are, so it is
only missing for cases (rows) that are missing all three.  This type
of work requires that you look at the variables and consider their
meaning, rather than just inputting them into a blind procedure that
considers all vars to be the same.
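
As a rough illustration of the composite idea, here is a sketch that
takes the mean of whichever of the chosen vars are present.  The
function name, the column choice and the data are hypothetical, and
None is assumed to mark a missing value.

```python
def composite(row, columns):
    """Mean of the non-missing values among the given columns.

    Returns None only when every one of those columns is missing.
    """
    present = [row[j] for j in columns if row[j] is not None]
    if not present:
        return None
    return sum(present) / len(present)


# Hypothetical data: three vars that more-or-less measure the same thing.
survey = [
    [4.0, None, 5.0],
    [None, None, 3.0],
    [None, None, None],
    [2.0, 3.0, 4.0],
]

combined = [composite(row, [0, 1, 2]) for row in survey]
print(combined)
```

Only the third case, which is missing all three vars, is still missing
in the composite.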

Terry J. Reedy





