# [Tutor] reading variables in a data set?

Emile van Sebille emile at fenx.com
Sun Jul 5 02:39:27 CEST 2009

```
On 7/4/2009 9:09 AM Steven Buck said...
> Dear Python Tutor,
> I'm doing econometric work and am a new user of Python. I have read
> several of the tutorials, but haven't found them useful for a newbie
> problem I've encountered.
> I've used a module (StataTools) from (http://presbrey.mit.edu/PyDTA ) to
> get a Stata ".dta" file into Python. In Stata the data set is an NXK
> matrix where N is the number of observations (households) and K is the
> number of variables.
> I gather it's now a list where each element of the list is an
> observation (a vector) for one household.  The name of my list is
> "data"; I gather Python recognizes the first observation by: data[1] .
> Example,
> data = [X_1, X_2, X_3, . . . . , X_N]  where each X_i for all i, is a
> vector of household characteristics, eg X_1 = (age_1, wage_1, . . . ,
> residence_1).
>
> I also have a list for variable names called "varname"; although I'm not
> sure the module I used to extract the ".dta" into Python also created a
> correspondence between the varname list and the data list--the python
> interpreter won't print anything when I type one of the variable names,
> I was hoping it would print out a vector of ages or the like.

Assuming you're working in the python console somewhat from the example
on the source website for PyDTA:

from PyDTA import Reader
# dta is assumed here to be a Reader instance already opened on your .dta file
fields = ','.join(['%s']*len(dta.variables()))

... you might try starting with dir(dta.variables) or help(dta.variables)

I didn't look, but the sources are available as well.

>
> In any case, I'd like to make a scatter plot in pylab,

I think I'd use dictionaries along these lines:

wages = { age_1: [ X_1, X_15, X_3...],
          age_2: [ X_2, X_5... ],
        }

> but don't know how
> to  identify a variable in "data" (i.e.  I'd like a vector listing the
> ages and another vector listing the wages of  households).

I think poking into dta.variables will answer this one.

HTH,

Emile

> Perhaps, I
> need to run a subroutine to collect each relevant data point to create a
> new list which I define as my variable of interest?  From the above
> example, I'd like to create a list such as: age = [age_1, age_2, . . . ,
> age_N] and likewise for wages.
>
> Any help you could offer would be very much appreciated.  Also, this is
> my first time using the python tutor, so let me know if I've used it
> appropriately or if I should change/narrow the structure of my question.
>
> Thanks
> Steve
>
> --
> Steven Buck
> Ph.D. Student
> Department of Agricultural and Resource Economics
> University of California, Berkeley
>

```
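The question in the thread, how to get one list per variable out of a list of per-household observations, can be sketched in plain Python without relying on any particular PyDTA call. This is only an illustration under stated assumptions: `varname` and `data` below are small made-up stand-ins for the poster's lists, and the sketch assumes the names in `varname` appear in the same order as the fields in each observation.

```python
from collections import defaultdict

# Hypothetical stand-ins for the poster's "varname" and "data" lists;
# the values are invented purely for illustration.
varname = ["age", "wage", "residence"]
data = [
    (34, 15.50, "urban"),
    (51, 22.00, "rural"),
    (34, 18.25, "rural"),
]

# Python lists are zero-indexed, so the first observation is data[0].
first_household = data[0]

# Transpose the rows into columns, then key each column by its variable name.
columns = dict(zip(varname, zip(*data)))
age = list(columns["age"])
wage = list(columns["wage"])

# Group wages by age, in the spirit of the dictionary suggested in the reply.
wages_by_age = defaultdict(list)
for a, w in zip(age, wage):
    wages_by_age[a].append(w)
```

With `age` and `wage` as two equal-length lists, they could then be passed to pylab's `scatter(age, wage)` for the plot the poster wants. Whether the real `varname` list lines up with the fields of each observation is something to confirm against the module itself.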