[SoC2010-General] Proposal for porting RPy2 to Python 3

Arc Riley arcriley at gmail.com
Thu Apr 8 00:46:21 CEST 2010


You will need to submit your application to
http://socghop.appspot.com/ before the deadline (Friday at 1900 UTC).

Please include at least two ways to contact you with your application
(e.g. email, phone, IRC, XMPP); we don't have access to anything but your
name and the body of your proposal.



On Wed, Apr 7, 2010 at 6:21 PM, Grzegorz Slodkowicz <jergosh at gmail.com> wrote:

> Dear all,
>
> My name is Greg Slodkowicz and I would like to apply for the RPy2 porting
> project. I am currently a student pursuing an MSc in Systems Biology, but my
> background is mainly in Computer Science. I have been in touch with RPy2's
> author Laurent Gautier, and this application reflects our discussions about
> the scope of the project.
>
> == Abstract
> RPy2 is an interface between Python and the statistical package R. It
> gives Python code access to R's rich collection of libraries. RPy2 is
> mainly used in bioinformatics, geostatistics and finance, and has a
> substantial user base for a scientific project (~1k downloads per month). It
> is, however, currently only compatible with Python 2.x. This project aims to
> port the existing functionality of RPy2 to Python 3 and to improve
> integration and performance by taking advantage of new features in Python 3.
> It will be completed in three stages: porting the RPy2 package to match the
> functionality of the current release; improving functionality and
> performance using new features of Python 3 and its C API (MemoryViews,
> PyCapsules, ordered dictionaries); and finally implementing an R graphics
> device that can interface with Matplotlib.
>
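The Python 3 features named in the abstract can be illustrated from pure Python. The sketch below is a minimal illustration under stated assumptions, not RPy2 code: an array.array('d', ...) stands in for the memory behind an R numeric vector, and the helper to_pairlist is hypothetical, modelling an R argument list as ordered (tag, value) pairs with None marking an untagged positional argument. PyCapsules exist only at the C API level and are not shown.

```python
from array import array
from collections import OrderedDict


def view_vector(buffer_owner):
    """Return a zero-copy view of any object supporting the buffer protocol.

    Assumption: an array('d') stands in for the memory behind an R numeric
    vector; a C extension could expose R's data through the same protocol.
    """
    return memoryview(buffer_owner)


def to_pairlist(positional, named):
    """Hypothetical helper: model an R pairlist (LISTSXP) of call arguments.

    Positional arguments get the tag None; named arguments keep the insertion
    order of the OrderedDict, as R keeps argument order in a pairlist.
    """
    pairs = [(None, value) for value in positional]
    pairs.extend(named.items())
    return pairs


values = array("d", [1.0, 2.0, 3.0])
view = view_vector(values)
view[0] = 10.0   # writes through to the array's own storage
tail = view[1:]  # slicing a memoryview shares memory instead of copying
print(values[0], tail.tolist(), view.format, view.itemsize)

args = to_pairlist([values], OrderedDict([("na.rm", True), ("trim", 0.1)]))
print([tag for tag, _ in args])
```

One caveat on the analogy: R allows repeated argument names (for example c(a = 1, a = 2)), which a single OrderedDict cannot hold, so an ordered list of pairs is the closer model; the OrderedDict here only supplies the named part.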
> == Background
> RPy2 is a package providing a Python interface to the popular statistical
> package R, through which any of R's statistical modules can be accessed
> from Python. Since RPy2 is used in areas as varied as bioinformatics,
> geostatistics and finance, this GSoC project would help boost the adoption
> of Python 3.
>
> == Project schedule
> Preparation for the project (during the 'Community Bonding' period):
> * reading documentation (I have some experience writing C extensions for
> Python, but I am less familiar with R internals).
> * discussing design and implementation details of the graphics device
> (the final part of the project).
>
> I would then implement the project in the following stages:
> * 24.05-13.06 Matching the functionality of the current version of RPy2.
> This part would be easy if it were enough simply to replace calls to the
> old API with their Py3 equivalents. It is not clear, however, when R
> expects ASCII strings ('bytes' in Python 3) and when Unicode (the default
> str type in Python 3), and R will segfault when it gets the wrong kind.
> Fixing this will likely require a lot of debugging and detective work in
> the R source code. Other issues with interfacing R and Python may also crop
> up during implementation. (3 weeks total)
> * 14.06-11.07 Improving/optimising the integration using new features of
> the Py3 API.
> There are a few features in the new Python C API that would fit in well
> with R's internal data structures:
> - MemoryViews could be used to expose R arrays efficiently
> - PyCapsules would be a natural wrapper for R's SEXP data type
> - ordered dictionaries are very similar to the pairlist SEXP type
> (LISTSXP), which R uses for passing function arguments (4 weeks total)
> * 12.07-16.07 Testing, last-minute bug fixing and finalising documentation.
> * 16.07  Mid-term evaluation
> * 17.07-8.08 Graphics device connecting R and Matplotlib.
> Implementing an R graphics device that can interface with Matplotlib would
> tighten the integration between Python and R (RPy2 is already compatible
> with NumPy). (3 weeks)
> * 9.08-16.08 Testing, last-minute bug fixing and finalising documentation.
> * 16.08 Firm 'pencils down' date.
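The bytes/str distinction behind the first stage can be seen without R at all: in Python 3, ctypes.c_char_p (the ctypes type for a C char *) accepts bytes but rejects str. The sketch below assumes, purely for illustration, that a C function wants UTF-8 encoded bytes; the helper name as_c_string is hypothetical, and which encoding R actually expects at each call site is exactly what the first stage would have to establish.

```python
import ctypes


def as_c_string(value, encoding="utf-8"):
    """Hypothetical helper: normalise str/bytes before handing it to C.

    Python 3 keeps text (str) and raw bytes (bytes) strictly apart, so code
    that passed plain strings to char * parameters under Python 2 must now
    encode explicitly. The UTF-8 default is an assumption, not R's rule.
    """
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# ctypes enforces the distinction: a str is rejected for a char * slot.
try:
    ctypes.c_char_p("mean")
    rejected = False
except TypeError:
    rejected = True

ok = ctypes.c_char_p(as_c_string("mean"))
print(rejected, ok.value)
```

Converting at a single choke point like this keeps the encoding decision in one place, which should make it easier to adjust once the correct behaviour for each R entry point is known.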
>
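The graphics device stage of the schedule can be pictured as two halves: R calls a set of device callbacks, and something on the Python side turns those calls into Matplotlib drawing. The sketch below shows only a hypothetical Python half; the class RecordingDevice and its methods are illustrative names loosely mirroring callbacks such as newPage, line, rect and circle in R's DevDesc structure (R_ext/GraphicsDevice.h), and the C shim that would forward R's calls is assumed, not shown.

```python
class RecordingDevice:
    """Hypothetical Python-side half of an R graphics device.

    Each method stands for one R device callback; primitives are recorded
    per page so that a later step could replay them onto Matplotlib.
    """

    def __init__(self, width, height):
        self.width = width    # device size, units left unspecified here
        self.height = height
        self.pages = []       # one list of primitives per page

    def new_page(self):
        self.pages.append([])

    def _draw(self, primitive):
        if not self.pages:    # R opens a page first; this guard is leniency
            self.new_page()
        self.pages[-1].append(primitive)

    def line(self, x1, y1, x2, y2):
        self._draw(("line", (x1, y1), (x2, y2)))

    def rect(self, x0, y0, x1, y1):
        # R passes two opposite corners; store lower-left corner + size.
        corner = (min(x0, x1), min(y0, y1))
        self._draw(("rect", corner, abs(x1 - x0), abs(y1 - y0)))

    def circle(self, x, y, r):
        self._draw(("circle", (x, y), r))


dev = RecordingDevice(width=7, height=7)
dev.new_page()
dev.line(0, 0, 1, 1)
dev.rect(2, 3, 0, 1)
dev.circle(0.5, 0.5, 0.25)
print(dev.pages[0])
```

Storing rectangles as a corner plus width and height matches the (xy, width, height) arguments of matplotlib.patches.Rectangle, and circles map onto matplotlib.patches.Circle(xy, radius), so a replay step could translate each recorded tuple directly into a patch or an Axes.plot call.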
> == About me
> I completed the first two years of a Bachelor's degree in Computer
> Science at the Technical University of Lodz, Poland, after which I moved to
> Denmark in 2008 to study at the Technical University of Denmark (DTU). I
> have since shifted my focus to bioinformatics and biology. I worked as a
> student helper from November 2008 and, since August 2009, have been working
> as a scientific programmer at the Centre for Biological Sequence Analysis
> (part of the Department of Systems Biology at DTU). My work focuses
> primarily on developing software for data management and analysis.
>
> My first experience with programming was teaching myself C from K&R at
> the age of fourteen. By the time I started my studies, I had completed a
> few toy projects, including two years of game (MUD) development in LPC, a
> dialect of C. Around the same time I discovered Python, and it immediately
> became my language of choice. I used Python to develop an entry for a
> programming competition organised by a Polish social networking portal (I
> wrote a Python wrapper for their API and a small desktop notification app).
>
> After beginning my studies, I gained some experience in commercial software
> development:
> I completed a small project in Python for a company managing online orders
> for restaurants; the application converted Google Checkout orders into text
> messages, which were then dispatched to the appropriate restaurants.
> In the summer of 2008, I participated in a project at the Dublin Institute
> of Technology. Along with two other students, I implemented (in C++ using
> the Qt4 libraries) an application for managing simulations and for parsing,
> analysing and displaying their results in real time (the research area
> there was the quality of VoIP transmissions).
>
> During my studies, the focus was mainly on programming in C and C++
> (most of my courses were at an Electrical Engineering faculty). I excelled
> in courses involving the study and implementation of algorithms and data
> structures. My current work involves C++ development for performance-critical
> scientific applications, Python for general scripting and R for statistical
> data analysis.
>
> I also have 5 years of experience with Linux (mainly Debian and later
> Ubuntu), which I used as my main platform before switching to a Mac 1.5
> years ago.
>
> Thank you for reading my application. I would be happy to provide any
> additional details.
>
>
> Best regards,
> Greg Slodkowicz

