[AstroPy] Pandas vs Astropy tables.

Wolfgang Kerzendorf wkerzendorf at gmail.com
Thu Jun 20 14:09:17 EDT 2013


I think we're all forgetting one of the goals of astropy: fostering interoperability between Python astronomy (and science) packages.

I use pandas a lot in my radiative transfer code, as I need (among other things) to do group-by operations as fast as possible (and SQLite is too slow). However, when I read in the tables they first go through astropy tables, because astropy can handle units and convert them all to cgs. I think we should not try to compete with pandas but foster interoperability (just as we don't try to compete with numpy or scipy). Pandas will never care about writing tables as an AASTeX deluxetable, and that's where astropy shines (for example).

So if you need to read in CSV files (and speed is important, which it often is not), I have no problem recommending pandas; however, if you need to deal with DAOPHOT, CDS, etc. files, there's no alternative to astropy.table.

For now I don't see a reason for including pandas in astropy core, as we don't need its functionality there. But I think using it in affiliated packages (if needed) is a good idea.

My 2 cents,
   Wolfgang
On 2013-06-20, at 1:47 PM, Éric Depagne <eric at depagne.org> wrote:

> Just a quick question:
> 
> Since the discussion is about pandas, wouldn't it be possible to cc some
> pandas devs, to let them know?
> 
> Éric.
>> On Thu, Jun 20, 2013 at 12:50 PM, Chris Beaumont <beaumont at hawaii.edu> wrote:
>>> I thought I'd chime in on the pandas discussion :)
>>> 
>>> I'm starting to use pandas a bit more in my day-to-day work. The two
>>> features most useful to me are:
>>> 
>>> 1) Its file parsers are pretty robust and fast. I always try parsing CSV
>>> with pandas first.
>> 
>> I've wondered how hard it would be to incorporate some of the pandas fast
>> CSV reading functions for the easy cases. I'm assuming pandas is licensed
>> in a way that would make that an option.
>> 
>>> 2) For tables with lots of categorical data, the grouping
>>> functionality is very nice. For example, calculations like "the mean age
>>> of each spectral type of star in my catalog" are usually one-liners like
>>> df.groupby(['spectral_type']).age.mean(). I spend a lot of time on the
>>> "split-apply-combine" page on the pandas docs (
>>> http://pandas.pydata.org/pandas-docs/stable/groupby.html).
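The "mean age per spectral type" one-liner as a self-contained sketch, assuming a recent pandas. The catalog values are invented; note that `mean` has to be called for the aggregation to run.

```python
import pandas as pd

# Hypothetical catalog; column names follow the example in the text.
df = pd.DataFrame({
    "spectral_type": ["G", "K", "G", "M", "K"],
    "age": [4.6, 8.0, 5.0, 10.0, 7.0],  # Gyr, made-up values
})

# Split by spectral type, apply mean() to each group's ages, combine into a Series.
mean_age = df.groupby(["spectral_type"]).age.mean()
print(mean_age)
```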
>> 
>> Group-by and related functionality is at the top of my list of priorities
>> for astropy.table (in fact I see it every day in my Google Keep app...).
>> Join and merging are in master now. In my tests the astropy table join is
>> within a factor of 2 to 3 of pandas in speed, so in most use cases it
>> should be good enough.
>> 
>> It's probably worth pointing out to the community that it was not a
>> lightly taken decision to reject pandas for use as the base data storage
>> container. For the case of tables there is one show-stopper, which is that
>> pandas DataFrame does not support arbitrary multi-dimensional columns, i.e.
>> columns where each element is itself an N-d array. These occur often enough
>> in astronomy, and are supported by the FITS and VO standards, so the astropy
>> Table must be able to represent them. The lack of support for table and
>> column metadata is a smaller but still important issue.
>> 
>> Having said that, there is no question pandas has a ton of highly efficient
>> and useful machinery, and we are working on ways to improve
>> interoperability. This includes being able to convert between Table and
>> DataFrame easily. Suggestions and (especially) pull requests are welcome.
>> 
>>> I won't speculate about whether that's enough of an asset to warrant a
>>> dependency in astropy. I do agree that lots of other pandas features
>>> don't translate as well into astronomy use.
>>> 
>>> On Thu, Jun 20, 2013 at 12:34 PM, Erik Tollerud <erik.tollerud at gmail.com> wrote:
>>>> I'm of two minds about Traits UI: once you know it you can make
>>>> great GUIs with it, but I've spent a lot of time troubleshooting
>>>> people's Python installations to get traits to work. That is, in
>>>> general it can be tricky to install because of all the
>>>> dependencies. Maybe this has improved recently with Enthought's Canopy
>>>> (or other new Python distros), but that's been my past experience.
>>>> 
>>>> More generally, the view in the astropy core package is that we don't
>>>> want to put GUIs in the core because GUIs always carry lots of
>>>> dependencies, which we don't want to be forced to deal with.  But part
>>>> of the whole reason for affiliated packages was to get around this, so
>>>> we're happy to see GUI-based affiliated packages.
>>>> 
>>>> 
>>>> As for Pandas, to be totally honest, I don't see a huge amount to be
>>>> gained from adding a Pandas dependency to Astropy. It's honestly not
>>>> clear what it gives the astronomy community that numpy does not already have.
>>>> 
>>>> The following quote from the Pandas web site has guided me to that
>>>> conclusion: "*pandas* helps fill this gap, enabling you to carry out
>>>> your entire data analysis workflow in Python without having to switch to
>>>> a more domain specific language like R."
>>>> 
>>>> I have been carrying out my entire data analysis workflow for some time
>>>> now in Python without using Pandas. It looks to me like Pandas is a
>>>> tool that was written by and for statisticians who use R.  While we can
>>>> take lessons from this, it's not clear we get much out of it in an
>>>> astronomy context. For example, how would it make astropy's NDData,
>>>> Quantity, or Table better to use a Pandas DataFrame vs. a numpy array?
>>>> Most of what we are doing is building astronomy-convenient interfaces,
>>>> and I'm not sure what Pandas adds there, at the cost of a pretty
>>>> heavy-weight dependency.
>>>> 
>>>> It could just be that I don't know enough about Pandas, though.  So if
>>>> someone who knows Pandas better can speak to this, I'm all ears.
>>>> 
>>>> On Tue, Jun 18, 2013 at 3:35 PM, Thøger Rivera-Thorsen <trive at astro.su.se> wrote:
>>>>> Pandas is part of the newly defined SciPy stack, after all, so it
>>>>> would be part of any science-oriented distribution worth its salt. In
>>>>> fact, I think it could be a good idea for astropy in general to use it
>>>>> under the hood, but again, that could clash with the philosophy of the
>>>>> project and possibly also with maintainability.
>>>>> 
>>>>> As for offering my code or just my experience, I'll have to square it
>>>>> with my supervisor first, and I also think it depends on what direction
>>>>> the project in question will take. I'm positive about the idea (which
>>>>> is why I wrote in the first place), but supervisor might think it is a
>>>>> better idea to actually get my paper in the project wrapped up before
>>>>> sending the code out there. Will get back about that one!
>>>>> 
>>>>> /Emil
>>>>> 
>>>>> On 2013-06-18 20:53, Slavin, Jonathan wrote:
>>>>> 
>>>>> Hi Emil,
>>>>> 
>>>>> That looks very nice! I don't see Pandas as a big issue in terms of
>>>>> dependencies. I don't know that much about traits, etc. My thought
>>>>> about the GUI was just based on my experience with matplotlib, and the
>>>>> fact that it is widely used, though I would agree that too many
>>>>> dependencies can be a deterrent to people using something. Are you
>>>>> offering your code as a starting point for the project? It strikes me
>>>>> that many have gotten some sort of fitting package to a point of
>>>>> personal usability but no one has the time/interest/motivation to make
>>>>> a more generally usable package.
>>>>> 
>>>>> Jon
>>>>> 
>>>>> On Tue, Jun 18, 2013 at 2:34 PM, <astropy-request at scipy.org> wrote:
>>>>>> Date: Tue, 18 Jun 2013 20:39:55 +0200
>>>>>> From: Thøger Rivera-Thorsen <thoger.emil at gmail.com>
>>>>>> Subject: Re: [AstroPy] ESA Summer of Code in Space 2013
>>>>>> To: astropy at scipy.org
>>>>>> Message-ID: <51C0A97B.8090703 at gmail.com>
>>>>>> 
>>>>>> I have been working on a fitting GUI for a while, although it is made
>>>>>> with a specific task in mind.
>>>>>> However, it is not based on Matplotlib but on Traits/TraitsUI/Chaco
>>>>>> and Pandas. It is made for a specific project I'm working on, and as
>>>>>> such is not yet usable for more general cases, but it could be a
>>>>>> starting point, if the dependencies don't conflict with astropy
>>>>>> policies.
>>>>>> 
>>>>>> Especially, I am happy with the choice of Pandas for managing a quite
>>>>>> complex data structure (the fitted and/or guessed values of an
>>>>>> arbitrary number of transitions for an arbitrary number of rows or
>>>>>> collapsed rows of a spatially resolved spectrum), and also with
>>>>>> the Traits-based interactive interface to build complex line profiles
>>>>>> from single Gaussians, good for fitting by eye and giving good
>>>>>> initial guesses for fitting of complex line profiles. It hooks
>>>>>> directly up to a wrapper I've made for lmfit, but given the
>>>>>> modularity, it should be relatively easy to change to other backends.
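One way such a structure can be laid out in pandas is a DataFrame with a two-level index (spectrum row, transition) and one column per fit parameter. This is a hypothetical sketch, not Emil's actual code; the transition names, parameter columns and row count are invented, and it assumes a recent pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one row per (spectrum row, transition) pair,
# one column per fit parameter; NaN marks "not yet guessed or fitted".
index = pd.MultiIndex.from_product(
    [range(3), ["Halpha", "NII_6583"]], names=["row", "transition"]
)
params = pd.DataFrame(np.nan, index=index, columns=["amplitude", "center", "sigma"])

# Store a guess for one transition in one row.
params.loc[(1, "Halpha"), "sigma"] = 2.5

# All rows for a single transition, indexed by row number.
halpha = params.xs("Halpha", level="transition")
print(halpha)
```

Adding another transition or another spectrum row only extends the index, so the bookkeeping does not change as the number of parameters grows.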
>>>>>> 
>>>>>> It's still a work-in-progress, but there are some screenshots here:
>>>>>> http://flic.kr/s/aHsjGaEMGg .
>>>>>> I know the choice and number of dependencies may be prohibitive but it
>>>>>> saved a lot of work on the GUI, and Pandas means the difference
>>>>>> between sanity and madness when it comes to keeping track of so many
>>>>>> parameters.
>>>>>> 
>>>>>> Cheers,
>>>>>> Emil
>>>>>> 
>>>>> ________________________________________________________
>>>>> 
>>>>> Jonathan D. Slavin                 Harvard-Smithsonian CfA
>>>>> jslavin at cfa.harvard.edu       60 Garden Street, MS 83
>>>>> phone: (617) 496-7981       Cambridge, MA 02138-1516
>>>>> fax: (617) 496-7577            USA
>>>>> ________________________________________________________
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> AstroPy mailing list
>>>>> AstroPy at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/astropy
>>>> 
>>>> --
>>>> Erik
>>> 
>>> --
>>> ************************************
>>> Chris Beaumont
>>> Graduate Student
>>> Institute for Astronomy
>>> University of Hawaii at Manoa
>>> 2680 Woodlawn Drive
>>> Honolulu, HI 96822
>>> www.ifa.hawaii.edu/~beaumont
>>> ************************************
> An AZERTY keyboard is worth two
> ----------------------------------------------------------
> Éric Depagne                            eric at depagne.org
