[Tutor] Data frame packages
bjameshunter at gmail.com
Thu Mar 31 20:26:39 CEST 2011
I appreciate all the responses and apologize for not being more detailed. An
R data frame is a tightly grouped collection of vectors of the same length.
Each vector holds a single datatype, I believe, but different vectors can
hold different types, so you can read all types of data into the same
variable. The benefit is being able to quickly subset, stack and such (or
'melt' and 'cast' in R vernacular) according to any of your qualitative
variables (or 'factors'). As someone pretty familiar with R and quite a
newbie to Python, I'm wary of insulting anybody's intelligence by describing
what to me is effectively the default data format of my most familiar
language. The following is some brief R code if you're curious about how it
works.
d <- read.csv(filename, header = TRUE, sep = ',') # this reads the table.
# '<-' is the assignment operator
d[ , 'column.name'] # this references a column by name. The same syntax can be
# used to reference rows (the index goes left of the comma) and columns in
# the data frame.
The data frame then allows you to quickly declare new fields as functions of
existing ones:
newVar <- d[ ,'column.name'] + d[ ,'another.column']
d$newVar <- newVar # attaches newVar to the rightmost column of 'd'
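For comparison, here is a rough sketch of the same idea in plain Python 3,
assuming the table is held as a dict of equal-length lists (one list per
column). The sample values are made up for illustration; this is not how
pydataframe stores data, just a way to picture the R lines above.

```python
# Hypothetical stand-in for the data frame 'd': one list per column,
# all lists the same length.
d = {
    "column.name": [1, 2, 3],
    "another.column": [10, 20, 30],
}

# Rough equivalent of d[ , 'column.name']: look the column up by name.
col = d["column.name"]

# Rough equivalent of newVar <- d[ ,'column.name'] + d[ ,'another.column']:
# add the two columns element by element.
new_var = [a + b for a, b in zip(d["column.name"], d["another.column"])]

# Rough equivalent of d$newVar <- newVar: attach the result as a new column.
d["newVar"] = new_var
```

A real data frame adds a lot on top of this (factors, subsetting by
condition, reshaping), which is why I went looking for a package.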
At any rate, I finally got pydataframe to work, but had to go from Python
2.6 to 2.5. pydataframe has a bug on Windows that the author points out.
Line 127 in 'parsers.py' should be changed from:
columns = list(itertools.izip_longest(*split_lines ,fillvalue = na_text))
to:
columns = list(itertools.izip_longest(list(*split_lines),fillvalue =
I don't know exactly what I did, but the module would not load until I did
that. I know itertools.izip_longest requires 2 arguments before fillvalue,
so I guess that did it.
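For anyone curious what that kind of line does, here is a minimal sketch in
Python 3, where the function is called itertools.zip_longest (izip_longest
is the Python 2 name). The values of split_lines and na_text below are made
up; I am only assuming they are a list of split rows and a filler string. As
far as I can tell, the function accepts any number of iterables, and the '*'
passes each row as its own argument so the rows get turned into columns.

```python
from itertools import zip_longest  # izip_longest in Python 2.6 and later 2.x

# Hypothetical stand-ins for the parser's variables.
na_text = "NA"
split_lines = [
    ["a", "1", "x"],
    ["b", "2"],        # this row is one field short
    ["c"],             # this row is two fields short
]

# Unpack the rows as separate arguments; zip_longest then groups the
# first fields together, the second fields together, and so on, padding
# short rows with na_text.
columns = list(zip_longest(*split_lines, fillvalue=na_text))
```

Here columns ends up with one tuple per column, and the missing fields are
filled with "NA".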
It's a handy way to handle alpha-numeric data. My problem with the csv
module was that it interpreted all numbers as strings.
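For the record, a workaround with the csv module alone might look something
like the sketch below (Python 3). The convert helper and the sample data are
my own invention for illustration; the csv module itself always hands back
strings, so the conversion has to be done by hand.

```python
import csv
import io


def convert(value):
    """Try int, then float; fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


# Hypothetical sample data standing in for a CSV file on disk.
sample = io.StringIO("name,count,weight\nalpha,3,1.5\nbeta,7,2.25\n")

reader = csv.reader(sample)
header = next(reader)  # first row holds the column names
rows = [[convert(v) for v in row] for row in reader]
```

After this, rows holds real ints and floats next to the strings, which is
roughly what R's read.csv does on its own.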
On Thu, Mar 31, 2011 at 8:17 AM, James Reynolds <eire1130 at gmail.com> wrote:
> On Thu, Mar 31, 2011 at 11:10 AM, Blockheads Oi Oi <
> breamoreboy at yahoo.co.uk> wrote:
>> On 31/03/2011 09:38, Ben Hunter wrote:
>>> Is anybody out there familiar with data frame modules for python that
>>> will allow me to read a CSV in a similar way that R does? pydataframe
>>> and DataFrame have both befuddled me. One requires a special stripe of R
>>> that I don't think is available on windows and the other is either very
>>> buggy or I've put it in the wrong directory / installed incorrectly.
>>> Sorry for the vague question - just taking the pulse. I haven't seen any
>>> chatter about this on this mailing list.
>> What are you trying to achieve? Can you simply read the data with the
>> standard library csv module and manipulate it to your needs? What makes
>> you say that the code is buggy, have you examples of what you tried and
>> where it was wrong? Did you install with easy_install or run setup.py?
>>> Tutor maillist - Tutor at python.org
>>> To unsubscribe or change subscription options:
>> Mark L.
> I'm not familiar with it, but what about http://rpy.sourceforge.net/