Maximizing observations from Sparse matrices
PoulsenL at capanalysis.com
Mon Jul 15 18:14:21 EDT 2002
I am running a regression that gathers its data from a panel dataset
(cross-sectional time series). Because the dataset is sparse, I must
sometimes drop cross-sections, variables, or some combination of the
two to avoid a singular matrix. A quick, extremely simplified example:
obs  region  var1  var2  var3  Valid ob?
1         1     1     4     7          3
2         1     7    NA     9          0
3         1     7    NA     9          0
4         1     5    NA     7          0
5         1     7     7     5          3
1         2     7    NA     9          0
2         2     5    NA     6          0
3         2     7     4     5          3
4         2     9     8    NA          0
5         2     7     6     5          3
1         3    NA    58     4          0
2         3    NA    98    25          0
3         3    63    85    NA          0
4         3    74    NA    78          0
5         3    97    54    NA          0
1         4    NA    89     7          0
2         4    25    85    NA          0
3         4     5    NA     2          0
4         4    32    85    NA          0
5         4    45    12     3          3
Sum of valid obs                      15
In this example I have 4 regions and 3 variables. The "Valid ob?" column
counts the usable values in a row: the number of variables if none is NA,
otherwise 0. Region 3 has no valid observations and therefore cannot be
used as is. I have two choices: eliminate a variable or eliminate the
region.
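To make the counting rule concrete, here is a minimal sketch in plain Python. The tuple layout (obs, region, var1, var2, var3), the use of None for NA, and the name complete_rows_by_region are assumptions made for illustration; they are not part of any real package.

```python
# Toy panel from the table above. Assumed layout: (obs, region, var1, var2, var3).
# None stands in for NA.
ROWS = [
    (1, 1, 1, 4, 7), (2, 1, 7, None, 9), (3, 1, 7, None, 9),
    (4, 1, 5, None, 7), (5, 1, 7, 7, 5),
    (1, 2, 7, None, 9), (2, 2, 5, None, 6), (3, 2, 7, 4, 5),
    (4, 2, 9, 8, None), (5, 2, 7, 6, 5),
    (1, 3, None, 58, 4), (2, 3, None, 98, 25), (3, 3, 63, 85, None),
    (4, 3, 74, None, 78), (5, 3, 97, 54, None),
    (1, 4, None, 89, 7), (2, 4, 25, 85, None), (3, 4, 5, None, 2),
    (4, 4, 32, 85, None), (5, 4, 45, 12, 3),
]


def complete_rows_by_region(rows, n_id_cols=2):
    """Count, per region, the rows in which every variable is present."""
    counts = {}
    for row in rows:
        region = row[1]
        values = row[n_id_cols:]          # skip the obs and region columns
        counts.setdefault(region, 0)      # make sure empty regions show up as 0
        if all(v is not None for v in values):
            counts[region] += 1
    return counts


counts = complete_rows_by_region(ROWS)
n_vars = len(ROWS[0]) - 2
# "Sum of valid obs" as tallied in the table: complete rows times variables.
total_valid = sum(counts.values()) * n_vars
print(counts, total_valid)
```

A region whose count is 0 is one that would have to be dropped, or rescued by dropping a variable.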
If I eliminate a variable (var1 here), the regression will now run because
every remaining region has at least one valid observation:
obs  region  var2  var3  Valid ob?
1         1     4     7          2
2         1    NA     9          0
3         1    NA     9          0
4         1    NA     7          0
5         1     7     5          2
1         2    NA     9          0
2         2    NA     6          0
3         2     4     5          2
4         2     8    NA          0
5         2     6     5          2
1         3    58     4          2
2         3    98    25          2
3         3    85    NA          0
4         3    NA    78          0
5         3    54    NA          0
1         4    89     7          2
2         4    85    NA          0
3         4    NA     2          0
4         4    85    NA          0
5         4    12     3          2
Sum of valid obs                16
Alternatively, I can keep all three variables and eliminate a region
(region 3 here):
obs  region  var1  var2  var3  Valid ob?
1         1     1     4     7          3
2         1     7    NA     9          0
3         1     7    NA     9          0
4         1     5    NA     7          0
5         1     7     7     5          3
1         2     7    NA     9          0
2         2     5    NA     6          0
3         2     7     4     5          3
4         2     9     8    NA          0
5         2     7     6     5          3
1         4    NA    89     7          0
2         4    25    85    NA          0
3         4     5    NA     2          0
4         4    32    85    NA          0
5         4    45    12     3          3
Sum of valid obs                      15
In this quick example, eliminating the region leaves the count unchanged at
15 (versus 16 after eliminating var1), but there are configurations where
eliminating a region yields more obs than eliminating a variable.
The problem is that I have hundreds of variables and about 50 regions. Is
there an efficient way to choose which variables and regions to drop so as
to maximize the number of observations?
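For what it is worth, one way to frame the search is as a greedy heuristic: while some kept region has no complete row, try every single removal (one variable or one region) and keep whichever leaves the most valid values. The sketch below is only that, a heuristic under stated assumptions, and is not guaranteed to find the best combination. The names score and greedy_drop, the dict layout, and the use of None for NA are all hypothetical; variables are referred to by 0-based position (0 is var1).

```python
NA = None  # stands in for a missing value

# Toy panel from this post. Assumed layout: region -> list of (var1, var2, var3).
DATA = {
    1: [(1, 4, 7), (7, NA, 9), (7, NA, 9), (5, NA, 7), (7, 7, 5)],
    2: [(7, NA, 9), (5, NA, 6), (7, 4, 5), (9, 8, NA), (7, 6, 5)],
    3: [(NA, 58, 4), (NA, 98, 25), (63, 85, NA), (74, NA, 78), (97, 54, NA)],
    4: [(NA, 89, 7), (25, 85, NA), (5, NA, 2), (32, 85, NA), (45, 12, 3)],
}


def score(data, keep_vars, keep_regions):
    """Return (feasible, cells).

    feasible: every kept region has at least one complete row.
    cells: complete rows times kept variables (the "Sum of valid obs" tally).
    """
    per_region = {}
    for region in keep_regions:
        per_region[region] = sum(
            all(row[v] is not NA for v in keep_vars) for row in data[region]
        )
    feasible = all(n > 0 for n in per_region.values())
    return feasible, sum(per_region.values()) * len(keep_vars)


def greedy_drop(data, n_vars):
    """Drop one variable or region at a time until every kept region is usable."""
    keep_vars = set(range(n_vars))
    keep_regions = set(data)
    while True:
        feasible, cells = score(data, keep_vars, keep_regions)
        if feasible:
            return sorted(keep_vars), sorted(keep_regions), cells
        # Enumerate every single removal and keep the best-scoring one.
        # Tuples compare feasibility first, then the number of valid values.
        trials = [(keep_vars - {v}, keep_regions) for v in keep_vars]
        trials += [(keep_vars, keep_regions - {r}) for r in keep_regions]
        keep_vars, keep_regions = max(
            trials, key=lambda t: score(data, t[0], t[1])
        )


kept_vars, kept_regions, cells = greedy_drop(DATA, 3)
print(kept_vars, kept_regions, cells)
```

Each pass scans the data once per candidate removal, so with hundreds of variables and about 50 regions it stays cheap, but because it commits to one removal at a time it can miss a better combination that needs two removals together.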
Thanks for any help
Loren