[AstroPy] AstroPy Digest, Vol 81, Issue 12

Aldcroft, Thomas aldcroft at head.cfa.harvard.edu
Thu Jun 20 13:38:18 EDT 2013


On Thu, Jun 20, 2013 at 12:50 PM, Chris Beaumont <beaumont at hawaii.edu> wrote:

> I thought I'd chime in on the pandas discussion :)
>
> I'm starting to use pandas a bit more in my day-to-day work. The two
> features most useful to me are:
>
> 1) Its file parsers are pretty robust and fast. I always try parsing CSV
> with pandas first
>

I've wondered how hard it would be to incorporate some of the pandas fast
CSV reading functions for the easy cases.  I'm assuming its license makes
that an option.
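
For the easy cases, the idea might look roughly like the sketch below: let
pandas parse the CSV, then hand the result over as a NumPy record array.
The catalog contents and column names are made up for illustration, and
nothing here is an existing astropy interface; pandas.read_csv and
DataFrame.to_records are the only library calls used.

```python
import io

import pandas as pd

# Hypothetical catalog contents, standing in for a CSV file on disk.
csv_text = """name,spectral_type,age
HD1,G,4.6
HD2,K,6.1
HD3,G,5.0
"""

# pandas.read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Convert to a NumPy record array, dropping the pandas index; a table
# class could then be built on top of this array.
records = df.to_records(index=False)

print(records.dtype.names)
print(records["age"].mean())
```

Going through a record array keeps the parsing step separate from whatever
container ends up holding the data.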


>
> 2) For tables with lots of categorical data, the grouping
> functionality is very nice. For example, calculations like "the mean age of
> each spectral type of star in my catalog" are usually one-liners like
> df.groupby(['spectral_type']).age.mean(). I spend a lot of time on the
> "split-apply-combine" page on the pandas docs (
> http://pandas.pydata.org/pandas-docs/stable/groupby.html).
>

Group-by and related functionality is at the top of my list of priorities for
astropy.table (in fact I see it every day on my google keep app...).  Joining
and merging are in master now.  In my tests the astropy table join is
within a factor of 2 to 3 in speed relative to pandas, so in most use cases
it should be good enough.
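
As a point of reference, here is a minimal sketch of the pandas side of
both operations: a group-by mean and a join on a key column.  The catalog,
the second table, and all column names other than spectral_type and age
are hypothetical; no timing claims are implied by this example.

```python
import pandas as pd

# Hypothetical catalog; spectral_type and age follow the quoted example.
catalog = pd.DataFrame({
    "star": ["a", "b", "c", "d"],
    "spectral_type": ["G", "K", "G", "M"],
    "age": [4.0, 6.0, 5.0, 8.0],
})

# Split by spectral type, apply mean() to the ages, combine into a Series
# indexed by spectral type.
mean_age = catalog.groupby("spectral_type")["age"].mean()

# Hypothetical second table sharing the "star" key column.
distances = pd.DataFrame({
    "star": ["a", "b", "c"],
    "distance_pc": [10.0, 25.0, 40.0],
})

# Inner join: only stars present in both tables are kept.
joined = catalog.merge(distances, on="star", how="inner")

print(mean_age)
print(joined)
```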

It's probably worth pointing out to the community that it was not a
lightly-taken decision to reject pandas for use as the base data storage
container.  For the case of tables there is one show-stopper, which is that
pandas DataFrame does not support arbitrary multi-dimensional columns, i.e.
columns where each element is itself an N-d array.  These occur often enough in
astronomy and are supported by the FITS and VO standards, so the astropy Table
must be able to represent them.  The lack of support for table and column
metadata is a smaller but still important issue.
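
To make "multi-dimensional column" concrete, here is a small sketch using a
plain NumPy structured array, in which one field holds a 2x3 array per row.
The field names and shapes are invented for illustration, and this shows
only the NumPy layer, not the astropy Table interface itself.

```python
import numpy as np

# Hypothetical table: each row holds a source name and a 2x3 image cutout.
dtype = np.dtype([("name", "U8"), ("cutout", "f8", (2, 3))])
table = np.zeros(4, dtype=dtype)

table["name"] = ["src0", "src1", "src2", "src3"]
# Field access returns a view, so this writes into row 1 of the table.
table["cutout"][1] = np.arange(6.0).reshape(2, 3)

# The column as a whole is 3-d: (n_rows, 2, 3).
print(table["cutout"].shape)
# A single element of the column is itself a 2-d array.
print(table[1]["cutout"].shape)
```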

Having said that, there is no question pandas has a ton of highly-efficient
and useful machinery and we are working on ways to improve
inter-operability.  This includes being able to convert between Table and
DataFrame easily.  Suggestions and (especially) pull requests welcome.


>
> I won't speculate about whether that's enough of an asset to warrant a
> dependency in astropy. I do agree that lots of other pandas features don't
> translate as well into astronomy use.
>
>
>
> On Thu, Jun 20, 2013 at 12:34 PM, Erik Tollerud <erik.tollerud at gmail.com> wrote:
>
>> I'm of mixed minds about traits UI because once you know it you can make
>> great GUIs with it, but I've spent a lot of time troubleshooting people's
>> python installations to get traits to work.  That is, in general it can be
>> tricky to get installed because of all the dependencies.  Maybe this has
>> improved recently with Enthought's Canopy (or other new python distros),
>> but that's been my past experience.
>>
>> More generally, the view in the astropy core package is that we don't
>> want to put GUIs in the core because GUIs always carry lots of
>> dependencies, which we don't want to be forced to deal with.  But part of
>> the whole reason for affiliated packages was to get around this, so we're
>> happy to see GUI-based affiliated packages.
>>
>>
>> As for Pandas, to be totally honest, I don't see a huge amount to be
>> gained from adding a Pandas dependency to Astropy.  It's honestly not clear
>> what it gives the astronomy community that numpy does not already have.
>>  The following quote from the Pandas web site has guided me to that
>> conclusion: "*pandas* helps fill this gap, enabling you to carry out
>> your entire data analysis workflow in Python without having to switch to a
>> more domain specific language like R."
>>
>> I have been carrying out my entire data analysis workflow for some time
>> now in python without using Pandas.  It looks to me like Pandas is a tool
>> that was written by and for statisticians who use R.  While we can take
>> lessons from this, it's not clear we get much out of it in an astronomy
>> context. For example, how would it make astropy's NDData, Quantity, or
>> Table better to use a Pandas DataFrame vs. a numpy array? Most of what we
>> are doing is building astronomy-convenient interfaces, and I'm not sure
>> what Pandas adds there, at the cost of a pretty heavy-weight dependency.
>>
>> It could just be that I don't know enough about Pandas, though.  So if
>> someone who knows Pandas better can speak to this, I'm all ears.
>>
>>
>>
>>
>> On Tue, Jun 18, 2013 at 3:35 PM, Thøger Rivera-Thorsen <trive at astro.su.se> wrote:
>>
>>>  Pandas is a part of the newly-defined SciPy stack, after all, so it
>>> would be part of any science-oriented distribution worth its salt. In fact,
>>> I think it could be a good idea for astropy in general to use it under the
>>> hood, but again, that could clash with the philosophy of the project and
>>> possibly also with maintainability.
>>>
>>> As for offering my code or just my experience, I'll have to square it
>>> with my supervisor first, and I also think it depends on what direction the
>>> project in question will take. I'm positive about the idea (which is why I
>>> wrote in the first place), but my supervisor might think it is a better idea
>>> to actually get my paper in the project wrapped up before sending the code
>>> out there. Will get back about that one!
>>>
>>> /Emil
>>>
>>>
>>>
>>>
>>>
>>> On 2013-06-18 20:53, Slavin, Jonathan wrote:
>>>
>>> Hi Emil,
>>>
>>>  That looks very nice!  I don't see Pandas as a big issue in terms of
>>> dependencies.  I don't know that much about traits, etc.  My thought about
>>> the gui was just based on my experience with matplotlib, and the fact that
>>> it is widely used -- though I would agree that too many dependencies can be
>>> a deterrent to people using something.  Are you offering your code as a
>>> starting point for the project?  It strikes me that many have gotten some
>>> sort of fitting package to a point of personal usability but no one has the
>>> time/interest/motivation to make a more generally usable package.
>>>
>>>  Jon
>>>
>>>  On Tue, Jun 18, 2013 at 2:34 PM, <astropy-request at scipy.org> wrote:
>>>
>>>> Date: Tue, 18 Jun 2013 20:39:55 +0200
>>>> From: Thøger Rivera-Thorsen <thoger.emil at gmail.com>
>>>> Subject: Re: [AstroPy] ESA Summer of Code in Space 2013
>>>> To: astropy at scipy.org
>>>> Message-ID: <51C0A97B.8090703 at gmail.com>
>>>> Content-Type: text/plain; charset="iso-8859-1"
>>>>
>>>> I have been working on a fitting GUI for a while, although it is made
>>>> with a specific task in mind.
>>>> However, it is not based on Matplotlib but on Traits/Traitsui/Chaco and
>>>> Pandas. It is made for a specific project I'm working on and as such is not
>>>> yet usable for more general cases, but it could be a starting point, if
>>>> the dependencies don't conflict with astropy politics.
>>>>
>>>> I am especially happy about the choice of Pandas for managing a quite
>>>> complex data structure (the fitted and/or guessed values of an arbitrary
>>>> number of transitions for an arbitrary number of rows or collapsed rows
>>>> of a spatially resolved spectrum), but also with the Traits-based
>>>> interactive interface for building complex line profiles from single
>>>> Gaussians, which is good for fitting by eye and for giving good initial
>>>> guesses when fitting complex line profiles. It hooks directly up to a
>>>> wrapper I've made for lmfit, but given the modularity, it should be
>>>> relatively easy to change to other backends.
>>>>
>>>> It's still a work-in-progress, but there are some screenshots here:
>>>> http://flic.kr/s/aHsjGaEMGg .
>>>> I know the choice and number of dependencies may be prohibitive but it
>>>> saved a lot of work on the GUI, and Pandas means the difference between
>>>> sanity and madness when it comes to keeping track of so many parameters.
>>>>
>>>> Cheers,
>>>> Emil
>>>>
>>>
>>>
>>>
>>>  ________________________________________________________
>>> Jonathan D. Slavin                 Harvard-Smithsonian CfA
>>> jslavin at cfa.harvard.edu       60 Garden Street, MS 83
>>> phone: (617) 496-7981       Cambridge, MA 02138-1516
>>> fax: (617) 496-7577            USA
>>> ________________________________________________________
>>>
>>>
>>>
>>> _______________________________________________
>>> AstroPy mailing list
>>> AstroPy at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/astropy
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Erik
>>
>>
>>
>
>
> --
> ************************************
> Chris Beaumont
> Graduate Student
> Institute for Astronomy
> University of Hawaii at Manoa
> 2680 Woodlawn Drive
> Honolulu, HI 96822
> www.ifa.hawaii.edu/~beaumont
> ************************************
>
>
>