[Tutor] reading variables in a data set?

Emile van Sebille emile at fenx.com
Sun Jul 5 02:39:27 CEST 2009

On 7/4/2009 9:09 AM Steven Buck said...
> Dear Python Tutor,
> I'm doing econometric work and am a new user of Python. I have read 
> several of the tutorials, but haven't found them useful for a newbie 
> problem I've encountered.
> I've used a module (StataTools) from (http://presbrey.mit.edu/PyDTA ) to 
> get a Stata ".dta" file into Python. In Stata the data set is an N x K 
> matrix where N is the number of observations (households) and K is the 
> number of variables. 
> I gather it's now a list where each element of the list is an 
> observation (a vector) for one household.  The name of my list is 
> "data"; I gather Python recognizes the first observation by: data[1] . 
> Example,
> data = [X_1, X_2, X_3, . . . . , X_N]  where each X_i, for all i, is a 
> vector of household characteristics, e.g. X_1 = (age_1, wage_1, . . . , 
> residence_1).
> I also have a list for variable names called "varname"; although I'm not 
> sure the module I used to extract the ".dta" into Python also created a 
> correspondence between the varname list and the data list--the python 
> interpreter won't print anything when I type one of the variable names, 
> I was hoping it would print out a vector of ages or the like. 

Assuming you're working in the Python console and more or less following 
the example on the PyDTA source website:

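from PyDTA import Reader
# open the Stata file in binary mode and hand it to the reader
dta = Reader(open('input.dta', 'rb'))
# one '%s' placeholder per variable, joined with commas
fields = ','.join(['%s'] * len(dta.variables()))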

... you might start with dir(dta.variables) or help(dta.variables).

I didn't look, but the sources are available as well.

> In any case, I'd like to make a scatter plot in pylab, 

I think I'd use dictionaries along these lines:

   wages = { age_1: [ X_1, X_15, X_3, ... ],
             age_2: [ X_2, X_5, ... ],
             ... }
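
One way to read that dictionary is "wages grouped by age". A minimal sketch of building it, assuming each observation can be reduced to an (age, wage) pair; the `rows` list here is made up for illustration and is not what PyDTA returns:

```python
from collections import defaultdict

# Hypothetical (age, wage) pairs, one per household.
rows = [(34, 21.5), (51, 30.0), (34, 18.25)]

# Map each age to the list of wages observed at that age.
wages_by_age = defaultdict(list)
for age, wage in rows:
    wages_by_age[age].append(wage)

print(dict(wages_by_age))
```

A defaultdict saves checking whether an age has been seen before appending to its list.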

> but don't know how 
> to  identify a variable in "data" (i.e.  I'd like a vector listing the 
> ages and another vector listing the wages of  households).  

I think poking into dta.variables will answer this one.
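
If the reader does hand back a list of per-household rows, building the per-variable lists is just a transpose. A minimal sketch, assuming `data` is a list of tuples and `varname` holds the names in the same column order; both are stand-ins made up here, not actual PyDTA output. Note that Python lists are zero-indexed, so the first observation is data[0], not data[1]:

```python
# Hypothetical stand-ins for the variable names and rows read from the .dta file.
varname = ['age', 'wage', 'residence']
data = [
    (34, 21.5, 'urban'),
    (51, 30.0, 'rural'),
    (29, 18.25, 'urban'),
]

# zip(*data) turns the list of rows into a sequence of columns;
# pairing each column with its name gives a name -> column lookup.
columns = dict(zip(varname, zip(*data)))

age = list(columns['age'])
wage = list(columns['wage'])

print(data[0])  # first household's observation
print(age)
print(wage)
```

From there, pylab.scatter(age, wage) followed by pylab.show() should give the scatter plot.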



> Perhaps I 
> need to run a subroutine to collect each relevant data point to create a 
> new list which I define as my variable of interest?  From the above 
> example, I'd like to create a list such as: age = [age_1, age_2, . . . , 
> age_N] and likewise for wages.
> Any help you could offer would be very much appreciated.  Also, this is 
> my first time using the python tutor, so let me know if I've used it 
> appropriately or if I should change/narrow the structure of my question.
> Thanks
> Steve
> -- 
> Steven Buck
> Ph.D. Student
> Department of Agricultural and Resource Economics
> University of California, Berkeley
