[Tutor] reading variables in a data set?

Kent Johnson kent37 at tds.net
Sun Jul 5 17:02:12 CEST 2009


On Sat, Jul 4, 2009 at 12:09 PM, Steven Buck<buckstec at gmail.com> wrote:

> I've used a module (StataTools) from (http://presbrey.mit.edu/PyDTA ) to get
> a Stata ".dta" file into Python. In Stata the data set is an NXK matrix
> where N is the number of observations (households) and K is the number of
> variables.
> I gather it's now a list where each element of the list is an observation (a
> vector) for one household.  The name of my list is "data"; I gather Python
> recognizes the first observation by: data[1] .
> Example,
> data = [X_1, X_2, X_3, . . . . , X_N]  where each X_i for all i, is vector
> of household characteristics, eg X_1 = (age_1, wage_1, . . . , residence_1).
>
> I also have a list for variable names called "varname"; although I'm not
> sure the module I used to extract the ".dta" into Python also created a
> correspondence between the varname list and the data list--the python
> interpreter won't print anything when I type one of the variable names, I
> was hoping it would print out a vector of ages or the like.

varname is probably just a list of strings without any direct
connection to the data.
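
If the names in varname happen to be in the same order as the values
in each observation (I haven't checked that this is what the module
gives you, so treat it as an assumption), you could build the
connection yourself with the list's index() method. A rough sketch,
using made-up values and a hypothetical helper called column():

```python
# Made-up sample data standing in for what the module returned.
# Assumption: varname is in the same order as the values in each observation.
varname = ['age', 'wage', 'residence']
data = [
    (34, 12.5, 'urban'),
    (51, 20.0, 'rural'),
    (27, 9.75, 'urban'),
]

def column(name):
    """Return the values of the named variable, one per observation."""
    i = varname.index(name)  # position of this name in the list of names
    return [observation[i] for observation in data]

ages = column('age')
wages = column('wage')
print(ages)
print(wages)
```

index() raises ValueError if the name is not in varname, which is a
quick way to find out whether a variable name is spelled the way you
expect.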

> In any case, I'd like to make a scatter plot in pylab, but don't know how to
> identify a variable in "data" (i.e.  I'd like a vector listing the ages and
> another vector listing the wages of  households).  Perhaps, I need to run a
> subroutine to collect each relevant data point to create a new list which I
> define as my variable of interest?  From the above example, I'd like to
> create a list such as: age = [age_1, age_2, . . . , age_N] and likewise for
> wages.

You can use a list comprehension to collect columns from the data. If
age is the first element of each observation (index 0), and wages the
second (index 1), then
ages = [ observation[0] for observation in data ]
wages = [ observation[1] for observation in data ]
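
Here is how that looks with some invented numbers in place of your
real data (the values and the column order are assumptions for
illustration only). Note that Python counts from 0, so data[0] is the
first observation and data[1] is the second:

```python
# Invented sample: each observation is (age, wage, residence).
data = [
    (34, 12.5, 'urban'),
    (51, 20.0, 'rural'),
    (27, 9.75, 'urban'),
]

first = data[0]   # first household, not data[1]

# Collect one column at a time from the list of observations.
ages = [observation[0] for observation in data]
wages = [observation[1] for observation in data]

print(first)
print(ages)
print(wages)
```

The two lists can then be handed to pylab, for example
pylab.scatter(ages, wages) followed by pylab.show().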

> Any help you could offer would be very much appreciated.  Also, this is my
> first time using the python tutor, so let me know if I've used it
> appropriately or if I should change/narrow the structure of my question.

It's very helpful if you show us the code you have so far.

Kent

