[Numpy-discussion] Possible example application of the array interface
Tim Churches
tchur at optushome.com.au
Wed Apr 6 14:00:52 EDT 2005
Michael Sorich wrote:
> While RPy works well for small simple problems, there
> are data conversion limitations between R and Python.
> If one could efficiently convert between the major R
> data types and python scipy.base data types without
> loss of data, it would become possible to do most of
> the data manipulation in python and freely mix in R
> functions when required. This may encourage the use of
> python for the development of statistical routines.
That's exactly what we do in our project (http://www.netepi.org) which
uses NumPy, RPy and R. The Python<->R interface provided by RPy has a
few wrinkles but overall is remarkably seamless and remarkably robust.
> From my meager understanding of RPy:
>
> R vectors are converted to python lists. It may make
> more sense to convert them to an array (either stdlib
> or scipy.base version) - without copying data if
> possible.
RPy directly converts (by copying) NumPy arrays to R arrays and vice
versa. C code is used to do this and it is quite fast. No Python lists
are involved. You do need to have NumPy installed (including its header
files) when you compile RPy for this to work - otherwise RPy *does*
convert R arrays to Python lists.
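R stores arrays in column-major order, which is why `r.array([1,2,3,4,5,6],dim=[2,3])` in the quoted session that follows comes back as `[[1, 3, 5], [2, 4, 6]]`. Any copying converter has to account for that ordering when it builds the Python-side array. A minimal pure-Python sketch of the 2-D case, using the stdlib `array` module rather than NumPy; `r_matrix_to_rows` is a hypothetical name and this is not RPy's actual C conversion code:

```python
from array import array

def r_matrix_to_rows(data, nrow, ncol):
    """Copy an R-style column-major matrix into a list of row arrays.

    Hypothetical helper: element (i, j) of an R matrix sits at flat
    index i + j * nrow, so row i is every nrow-th element starting at i.
    """
    if len(data) != nrow * ncol:
        raise ValueError("data length does not match nrow * ncol")
    # Slicing an array.array returns a new array, i.e. a copy.
    return [data[i::nrow] for i in range(nrow)]

# The flat storage R uses for array(1:6, dim=c(2, 3)).
flat = array("l", [1, 2, 3, 4, 5, 6])
rows = r_matrix_to_rows(flat, 2, 3)
print([list(r) for r in rows])  # [[1, 3, 5], [2, 4, 6]]
```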
> R arrays and matrices are converted to Numeric arrays.
> Eg
>
> In [8]: r.array([1,2,3,4,5,6],dim=[2,3])
> Out[8]:
> array([[1, 3, 5],
>        [2, 4, 6]])
>
> However, column and row names (or dimnames for arrays
> with >2 dimensions) are lost in R->Py conversion. I do
> not know whether these conversions require copying of
> the data.
>
> R data-frames are currently converted to python
> dictionaries and I don’t think that there is any
> simple way to convert a python object to an R data
> frame. This is the biggest limitation of rpy in my
> opinion.
>
> In [16]:
> r.data_frame(col1=[1,2,3,4],col2=['one','two','three','four'])
> Out[16]: {'col2': ['one', 'two', 'three', 'four'],
> 'col1': [1, 2, 3, 4]}
>
> If it were possible to convert between an R data-frame
> and a scipy.base record array without copying or
> losing data, RPy would become more useful.
>
> I wish I understood C, scipy.base and R well enough to
> give this a go. However, this is Way over my head!
You can extend the conversion routines of RPy (in either direction)
through a very simple interface that needs only Python and R. No
knowledge of C is necessary. For example, if you want to convert an R
data.frame into a custom class which you have written in Python, it is
quite easy to add that to RPy. There is an example of doing this with
data.frames in the RPy documentation.
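As the quoted `r.data_frame(...)` session above shows, the default conversion yields a dict of columns, and a plain dict does not keep R's column order (`col2` comes out before `col1`). The class below is a hypothetical example of the kind of Python target a custom conversion could build from that dict. Registering it with RPy follows the mechanism described in the RPy documentation and is not shown; `DataFrame`, `row` and `rows` are illustrative names only:

```python
class DataFrame:
    """Hypothetical column-oriented container for a converted R data.frame."""

    def __init__(self, columns, names=None):
        # Keep R's column order if it is supplied; otherwise use dict order.
        self.names = list(names) if names is not None else list(columns)
        lengths = {len(columns[n]) for n in self.names}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {n: list(columns[n]) for n in self.names}
        self.nrow = lengths.pop() if lengths else 0

    def row(self, i):
        """Return row i as a tuple, in column order."""
        return tuple(self.columns[n][i] for n in self.names)

    def rows(self):
        """Return all rows as a list of tuples."""
        return [self.row(i) for i in range(self.nrow)]


# The dict RPy returned in the quoted session, plus R's column order.
converted = {'col2': ['one', 'two', 'three', 'four'], 'col1': [1, 2, 3, 4]}
df = DataFrame(converted, names=['col1', 'col2'])
print(df.rows())  # [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
```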
(More comments below).
> --- Magnus Lie Hetland <magnus at hetland.org> wrote:
>
>>I was just thinking about some experimental designs, and whether I
>>could, perhaps, do the statistics in Python. I remembered having used
>>RPy [1] briefly at some time (there may be other similar bindings out
>>there -- I don't remember)
There is also RSPython, which allows Python to be called from R as well
as R to be called from Python. However, it is far more experimental than
RPy, much harder to build and rather less robust, though more ambitious
in its scope. RPy only allows calling of R functions (almost everything
is done via functions in R) from Python, although as noted above it has
good facilities for converting R objects back into Python objects, and
also allows R objects to be returned to Python as native, unconverted R
objects - so you can store native R objects in a Python list or
dictionary if you wish. You can't see inside those native R objects with
Python, but you can use them as arguments to R functions called via RPy.
However, the default action in RPy is to do its best to convert R
objects into Python data structures when R functions called via RPy
return. That conversion is easily customisable as noted above.
>>and started thinking about whether I could, perhaps, combine it with
>>numpy in some way. My first thought was to reimplement the relevant
>>statistical functions; then I thought about how to convert data back
>>and forth -- but then it occurred to me that R also uses arrays
>>extensively, and that it could, perhaps, be possible to expose those
>>(through something like RPy) through the array interface/protocol!
It seems that the new NumPy array interface could indeed be used to
allow Python and R to share the same array data, rather than making
copies as happens at present (albeit very quickly).
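The difference between copying and sharing can be illustrated with Python's own buffer protocol, used here purely as a stand-in: a `memoryview` over a stdlib `array` is not the NumPy array interface, and it says nothing about how R's memory would actually be exposed. It only shows what "sharing the same data" buys compared with a copy:

```python
from array import array

# Stand-in for a block of doubles owned by one side (say, R).
owner = array("d", [1.0, 2.0, 3.0, 4.0])

# Copying, as RPy does at present: the copy has its own memory.
copied = owner[:]

# Sharing: a memoryview exposes the same buffer without copying it.
shared = memoryview(owner)

owner[0] = 99.0
print(copied[0])  # 1.0  (the copy is unaffected)
print(shared[0])  # 99.0 (the view sees the change)
```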
>>This would be (IMO) a good example of the benefits of the array
>>protocol; it's not a matter of "getting yet another array module". RPy
>>is an external library/language with *lots* of features that might be
>>useful to numpy users, many of which aren't likely to be implemented
>>in Python for quite a while, I'd guess (unless, perhaps, someone
>>writes a translator from R, which I'm sure is doable).
R is a massive project with a huge library of statistical routines - it
is several times larger in its extent than Python (that's a weakness as
well as a strength, as R tends to be sprawling and rather intimidating
in its size). R also has a very large community of top computational
statisticians behind it. Better to work with R than to try to compete
with it. That said, there is no reason not to port R libraries or
specific R functions to NumPy where that provides performance gains, or
where the data are large and already handled in NumPy. Our approach in
NetEpi (http://www.netepi.org) is to do the data selection and reduction
(usually summarisation) in NumPy (where we store data on disc as
memory-mapped NumPy arrays) and then pass the much smaller summarised
results to R for plotting or fitting complex statistical models.
However, we do calculation of elementary statistics (means, quantiles
and other measures of location, variance etc) in NumPy wherever possible
to avoid copying large amounts of data to R via RPy.
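The "reduce first, then hand to R" pattern can be sketched with the stdlib `statistics` module standing in for the NumPy code NetEpi actually uses. The data and the helper name `summarise_by_group` are hypothetical; the point is only that many raw rows collapse to a handful of summary rows before anything crosses over to R:

```python
import statistics
from collections import defaultdict

def summarise_by_group(groups, values):
    """Reduce paired (group, value) data to a small per-group summary.

    Hypothetical helper: only the returned summary, not the raw values,
    would then be passed to R. Each group needs at least two values,
    because statistics.variance() requires two data points.
    """
    buckets = defaultdict(list)
    for g, v in zip(groups, values):
        buckets[g].append(v)
    summary = {}
    for g in sorted(buckets):
        vs = buckets[g]
        q1, median, q3 = statistics.quantiles(vs, n=4)  # quartile cut points
        summary[g] = {
            "n": len(vs),
            "mean": statistics.mean(vs),
            "q1": q1,
            "median": median,
            "q3": q3,
            "variance": statistics.variance(vs),  # sample variance
        }
    return summary

# Hypothetical data: eight raw observations collapse to two summary rows.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
values = [1, 2, 3, 4, 10, 20, 30, 40]
summary = summarise_by_group(groups, values)
print(summary["a"]["mean"], summary["b"]["mean"])
```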
>>I don't know enough (at least yet ;) about the implementation of RPy
>>and the R library to say for sure whether this would even be possible,
>>but it does seem like it could be really useful...
>>
>>[1] rpy.sf.net
I have copied this message to the RPy list - hopefully some fruitful
discussion can ensue.
Tim C