[AstroPy] ESA Summer of Code in Space 2013

Thøger Emil Rivera-Thorsen thoger.emil at gmail.com
Tue Jun 18 19:26:57 EDT 2013


On 18-06-2013 22:53, Joe Harrington wrote:
> Just to interject a few quick thoughts into this discussion...
>
> Knowing little of the situation on the ground WRT spectral line fitting,
> it does seem to me that it's been done a million times.  If any of those
> implementations is decent, OSS, and compiled, why not just wrap it?
It's been done a million times, but I don't know of any one solution 
that people are generally happy with.
Sherpa is maybe the best shot, but it comes either as part of the 
enormous CXC software package, or in a good but poorly supported 
stand-alone version with long-standing installation bugs etc. The GUI 
is limited to the interactive capabilities of Matplotlib, the 
procedural CLI interface is very easy to use from a console but not 
well suited to scripting/programming, and the OO interface is not too 
well documented and seemed generally obscure to me.
I've used it personally, but given the problems with installing new 
versions or installing on new computers, I have finally switched to 
LMfit. I have to do more myself, but I know that it works.

IRAF/PyRAF finally comes in a manageable package, but by IRAF 
standards, "manageable" means that it ships embedded in its own Python 
distribution; and IRAF/PyRAF is not exactly Pythonic in its usage and 
interface.

Another problem is that when you "just" wrap compiled routines in e.g. 
C or Fortran, compiler and library dependency issues usually show up in 
great numbers. The hard number crunching can sometimes be wrapped - and 
has been, e.g. in mpmath - but anything beyond that often relies on 
archaic libraries hard-coded to need old versions that are not in the 
standard package managers of the major OSes, etc.

I think the right way ahead is to build it on well-tried, 
well-supported, actively developed packages and libraries; that should 
protect against the old dependency walls and endless install failures.

> Probably the GUI would want to be in Python, but there's a lot of other
> stuff under the hood that would be time-consuming and likely slower to
> re-(re-re-re-)write.  At the very least, you might save some time by
> studying what existing codes do and what people like about them.

I agree insofar as, if something reliable already exists, by all means 
use it. But "sometimes frugality cheats wisdom", as the Swedish saying 
goes: putting in extra effort now can save a lot of work later.
As far as I can see, what makes sense to reuse is the number-crunching 
code. On top of this comes another layer of infrastructure, and yet 
another layer of (G)UI.

The infrastructure is the model classes that keep track of free, tied, 
and bound parameters; initial and best-fit values; and how they relate 
to each other.
This is where I think an extra effort could really pay off in the long 
run: this is the hard part, and if Done Right™ it can really
make a difference. That means the classes for this should either be 
existing astropy classes or inherit heavily from them,
and in general the astropy framework should be used with as little 
modification as possible.
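To make the idea of that infrastructure layer concrete, here is a toy sketch of model classes that track free, frozen, and tied parameters. Every class and attribute name here is invented for illustration; a real implementation should reuse or subclass astropy's own classes instead.

```python
# Hypothetical sketch of a parameter-tracking layer; all names are
# made up for illustration, not an existing astropy or lmfit API.

class Parameter:
    def __init__(self, name, value, frozen=False, tied_to=None):
        self.name = name
        self.value = value
        self.frozen = frozen      # frozen parameters are excluded from the fit
        self.tied_to = tied_to    # callable computing this value from another Parameter

    def resolve(self):
        """Return the effective value, following ties."""
        if self.tied_to is not None:
            return self.tied_to()
        return self.value


class Model:
    def __init__(self, **params):
        self.params = params

    def free_parameters(self):
        """Parameters the minimizer is allowed to vary."""
        return [p for p in self.params.values()
                if not p.frozen and p.tied_to is None]


# Example: two components with tied centroids and a frozen width.
center1 = Parameter('center1', 6563.0)
center2 = Parameter('center2', 0.0,
                    tied_to=lambda: center1.resolve() + 10.5)
sigma1 = Parameter('sigma1', 2.0, frozen=True)

model = Model(center1=center1, center2=center2, sigma1=sigma1)
print([p.name for p in model.free_parameters()])  # only 'center1' is free
print(center2.resolve())                          # 6573.5, follows the tie
```

The point of the sketch is only that tie/freeze bookkeeping lives in one place, so the fitting backend never needs to know about it.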

> Jon Slavin> - robust fitting routines that return error bars on fitted parameters
>
> Regarding the reporting of fits with errors, as many of us have
> painfully learned, minimizers don't give these, they only pretend to.
> Since that's often not good enough, it would be nice to see a relatively
> plug-n-play MCMC (e.g., DE-) put forward as a fitting package.  It would
> have to evaluate its own distribution, gave errors, and otherwise
> behaved a bit like lmfit.  Yes, there are subtleties to Markov Chains,
> but this is also true of minimizers.  Getting something out there and
> inviting people to improve it could produce something usable in a few
> iterations.  My group can contribute some DE-MCMC code that someone
> could adapt.

Sounds absolutely great. Sherpa has similar routines and they are 
really strong workhorses, so possibly some of their code could be 
studied too.
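To illustrate the point that an MCMC evaluates its own distribution and yields error bars directly, here is a deliberately minimal pure-Python Metropolis sampler for a single parameter. It is not DE-MCMC and not production code, just a sketch of the shape such a plug-n-play fitter could take.

```python
# Toy Metropolis sampler: the parameter "error bar" is simply the
# spread of the posterior samples, not a curvature estimate from a
# minimizer.  Entirely illustrative; a real package would use a
# well-tested sampler.
import math
import random

random.seed(42)

def log_posterior(mu, data, sigma=1.0):
    """Gaussian log-likelihood with a flat prior (toy example)."""
    return -0.5 * sum((x - mu) ** 2 for x in data) / sigma ** 2

def metropolis(data, n_steps=20000, step=0.5):
    """Minimal Metropolis random walk for one parameter."""
    mu = 0.0
    chain = []
    for _ in range(n_steps):
        prop = mu + random.gauss(0.0, step)
        if math.log(random.random()) < (log_posterior(prop, data)
                                        - log_posterior(mu, data)):
            mu = prop
        chain.append(mu)
    return chain[n_steps // 2:]  # discard the first half as burn-in

data = [0.9, 1.1, 1.3, 0.8, 1.0]
chain = metropolis(data)
mean = sum(chain) / len(chain)
std = (sum((m - mean) ** 2 for m in chain) / len(chain)) ** 0.5
print(f"mu = {mean:.2f} +/- {std:.2f}")
```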

> To Jon's list of requirements, I'd add:
>
> - Able to use a GUI for user-cue input OR take such input from a text
>    file
> - Able to write such a text file from the GUI user-cue input, for
>    subsequent runs (and add to it from a second run, etc.)

I personally think that the ideal GUI can, at any (or most) moments, 
take both command-line input and point-and-click input,
as I have tried to do in the GUI I wrote about earlier. And when it can 
be given commands at runtime, it can of course also be scripted.

> One thing that isn't clear to me from the discussion is whether the
> scope is merely to identify the center and width vs. pixel number to get
> a wavelength solution or calculate Doppler shifts, or whether to do the
> whole job of reading multiple line lists, broadening the lines, and
> fitting all the lines to the data, returning column densities and
> temperatures vs. depth.  In other words, will it calibrate the spectrum
> or reproduce it with a model?

I don't think there is general consensus on this, but in my view the 
question
must be which tasks make sense to do from the command line, and which 
ones will be made significantly easier
by means of a GUI, and then to implement them in that order. I always 
shy away from software that makes me set everything through
point-and-click, file-chooser dialogs, etc., when it could be done 
more easily from the CLI.
Reading a line list from a file should be a no-brainer from the 
command line, and a good
fitting package would ideally provide a few convenience functions for 
loading data and line lists into the appropriate data structures.
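As a sketch of what such a convenience function might look like, here is a minimal line-list reader. The two-column format (name, rest wavelength in Angstrom) and the function name are invented for illustration.

```python
# Hypothetical CLI convenience function: parse a plain-text line list
# of 'name wavelength' pairs, skipping comments and blank lines.
# The format and function name are made up for this sketch.
import io

def read_linelist(fileobj):
    """Return {line name: rest wavelength} from a simple ASCII list."""
    lines = {}
    for raw in fileobj:
        raw = raw.strip()
        if not raw or raw.startswith('#'):
            continue
        name, wave = raw.split()[:2]
        lines[name] = float(wave)
    return lines

linelist = io.StringIO("""\
# name  lambda_rest [AA]
Lya     1215.67
Ha      6562.80
OIII    5006.84
""")
parsed = read_linelist(linelist)
print(parsed)
```

A real implementation would more likely hand back an astropy Table, but the principle, one boring function call instead of a file dialog, is the same.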

I think the package should start with the simpler models first, and 
then move on to the more complicated stuff later.
Which is another reason why I think a well-designed infrastructure 
layer is so important: once it has been made properly
modular, it is easy to write more modules and plug them in. That way, 
if someone comes up with a smart way to implement a very 
sophisticated model, we can plug it into the existing infrastructure 
and offer it as an option. One example where this is relevant is ISM 
absorption lines: does one just want to fit them with quick but 
not-too-accurate Gaussians or Lorentzians, with a "geometrically" 
(width, depth, etc.) defined Voigt profile, or with a full-fledged 
Voigt profile in terms of N, b, and z? Proper modularity could mean 
that the latter could be plugged into the existing infrastructure 
relatively simply once it is built (I have made a half-finished, not 
yet successful attempt at this myself).
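One simple way to get that kind of pluggability is a registry mapping profile names to implementations, so the fitting layer only ever does a lookup and a physical Voigt profile can be registered later without touching the core. Everything below (registry name, decorator, parameter names) is illustrative, not an existing API.

```python
# Sketch of a profile plug-in registry; all names are invented.
import math

PROFILES = {}

def register_profile(name):
    """Decorator that registers a profile function under a name."""
    def decorator(func):
        PROFILES[name] = func
        return func
    return decorator

@register_profile('gaussian')
def gaussian(x, center, sigma, amplitude):
    return amplitude * math.exp(-0.5 * ((x - center) / sigma) ** 2)

@register_profile('lorentzian')
def lorentzian(x, center, gamma, amplitude):
    return amplitude * gamma ** 2 / ((x - center) ** 2 + gamma ** 2)

# The infrastructure layer never hard-codes a profile; it looks one up:
profile = PROFILES['gaussian']
print(profile(6563.0, 6563.0, 2.0, 1.0))  # peak value: 1.0
```

A later, more sophisticated model (say, a Voigt profile parameterized by N, b, and z) would just call `register_profile` itself and immediately become available as an option.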

Something I have implemented in my half-cooked GUI so far:

- Load a FITS or ASCII file into a 2D-spectrum class (as a CLI 
function).
- Extract any contiguous group of rows into a properly weighted 1D 
spectrum (interactive; can easily get a CLI convenience function).
- Interactively model line profiles of up to 10 components each, and 
immediately fit them to data using lmfit as the backend (I have a 
half-finished Sherpa backend too).
- Save the model for all transitions, for all rows of potentially 
several spectra, in one big, flat, human-readable ASCII file, which at 
runtime is handled as a pandas DataFrame; that of course also means 
all pandas and numpy operations are available for it, and the 
graphical representation is updated accordingly.
- Assign one-letter labels (represented as colors in the plot) to each 
peak. Can be done both the CLI way and the GUI way.
- Set/edit wavelength range(s) to include in the fit and which ones to 
ignore, CLI and GUI.
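The flat, human-readable storage point above can be sketched as one row per (spectrum row, transition, component). In the GUI this table lives in a pandas DataFrame at runtime; the stdlib csv round-trip below stands in for that, and the column names are invented.

```python
# Sketch of flat, human-readable model storage: one text row per
# fitted component.  Column names are illustrative only.
import csv
import io

fits = [
    {'row': '12', 'transition': 'Ha', 'component': 'a',
     'center': '6563.1', 'sigma': '2.3', 'amplitude': '1.4e-16'},
    {'row': '12', 'transition': 'Ha', 'component': 'b',
     'center': '6570.0', 'sigma': '1.1', 'amplitude': '3.0e-17'},
]

# Write a whitespace-delimited table with a header line...
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fits[0].keys(), delimiter=' ')
writer.writeheader()
writer.writerows(fits)

# ...and read it straight back into dictionaries.
buf.seek(0)
restored = list(csv.DictReader(buf, delimiter=' '))
print(restored[1]['center'])  # round-trips losslessly
```

Because the on-disk format is just a headered text table, it stays grep-able and diff-able, and loading it into pandas for the runtime representation is a one-liner.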


Things on my current wish list:

- Proper continuum modeling (only an additive constant right now).
- Instrumental convolution of the model.
- More line profiles (Lorentzian, Voigt, etc.). The tricky part is 
the GUI.
- More convenience functions and CLI options.
- Seamlessly interchangeable velocity and wavelength representations 
on the dispersion axis.
- More sophisticated parameter handling (freeze/thaw, tie, etc.).
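The velocity/wavelength interchange item boils down to a pair of conversions. A sketch, assuming the simple non-relativistic Doppler relation v = c (lambda - lambda0) / lambda0 (a real implementation might want the relativistic form and proper units handling):

```python
# Wavelength <-> velocity conversion around a rest wavelength,
# non-relativistic Doppler approximation; function names are invented.

C_KMS = 299792.458  # speed of light in km/s

def wave_to_velocity(wave, rest_wave):
    """Velocity offset (km/s) of `wave` relative to `rest_wave`."""
    return C_KMS * (wave - rest_wave) / rest_wave

def velocity_to_wave(vel, rest_wave):
    """Inverse: wavelength at velocity offset `vel` (km/s)."""
    return rest_wave * (1.0 + vel / C_KMS)

ha = 6562.80                                 # H-alpha rest wavelength [AA]
v = wave_to_velocity(6572.80, ha)            # ~456.8 km/s for a 10 AA shift
print(round(v, 1), velocity_to_wave(v, ha))  # converts back to 6572.80
```

Since both directions are cheap, pure functions of the rest wavelength, the GUI can relabel the dispersion axis on the fly without touching the stored data.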

On the other hand, I think that things like stacking, flux 
calibration, and other clearly calibration-phase tasks are better left 
to other kinds of software.

Cheers,
Emil

> --jh--
