[AstroPy] Questions re: Markov Chain Monte Carlo

Mon Jun 3 22:13:40 EDT 2013

Hi all,

I've been using python to drive the computation of some disk models, and I'm working on shifting the computation from a full (but coarse) grid search of parameter space to instead using Markov Chain Monte Carlo (via the 'emcee' package).  In thinking about the best way to do this, I have a couple of questions.  I should note that these aren't all pure Python questions, but instead some are more algorithmic / implementation questions; I hope it's OK to ask such things here, but feel free to tell me if it isn't.

The main issue that I'm wrestling with is that each model is computationally very expensive - something like 6 minutes per set of model parameters.  So, if at all possible, I'd like to avoid recalculating the same model (or a very similar model) over and over, and instead I'd like to reuse previous calculations if possible. 

For background, the way each model calculation works is that a subdirectory is created, whose name is based on the specific parameters, e.g. 'Mdisk_2.3_Rdisk_100_T_150' etc.  This directory is then populated with input files (the density distribution, temperature distribution, etc.) and then a fortran code is called that calculates an image, ultimately producing a FITS file.

So I'm imagining something like this, but I wonder if others have suggestions or alternatives:

- Each time the Monte Carlo calculation makes a step in parameter space and chooses a set of parameters (i.e. a position vector in parameter space), do the following: 
	- Check to see if the directory corresponding to that set of parameters exists; if it doesn't, then create it and proceed with the calculation. 
	- If it does exist already, check to see if the final output image exists; if so, then use it.  If not, then presumably another process is working on creating the image; sleep for some amount of time and keep checking back until the image appears. (The 'emcee' MCMC implementation uses multiple subprocesses that explore parameter space simultaneously, so it would be possible for two processes to arrive at the same place in parameter space at the same time.)  One problem here is that if some process fails to finish a calculation, then other processes arriving at that space will get stuck there indefinitely while they wait; so maybe the wait needs to be finite, but if so, I'm not sure what gets returned after that finite wait; maybe just an error. 

So the first question is whether there's any straightforward way to avoid a race condition here, i.e.  two processes both checking for the existence of a directory, both finding it non-existent, and then both trying to create it.  Would something like this work?

if os.path.exists(dirname):
	(code to use existing result in that dir, or wait for calculation to complete)
else:  # Directory doesn't exist, try to create it
	try:
		os.mkdir(dirname)
	except OSError:
	# Someone else got there first and created the dir since we last checked
		(code here to wait for completion and then return)
	else: 	# No exception was raised
		(code here to calculate the model)

The second question involves rounding of parameter values, and its effect on a MCMC calculation.  So far I've only been choosing relatively round numbers in my parameter search, since it has been a grid (e.g. exponents like 0.6, 0.7, 0.8, disk radii like 100, 150, 200).  But once those values start being chosen by the MCMC routine, they won't be round numbers at all.  This presents a few potential problems: first, since I'm using parameter values to name the directories, they need to be rounded to some set precision to create a reproducible directory name that I can use later to identify that model.  Given how expensive each computation is, some rounding would be beneficial, since, say, a power-law exponent of 0.5999 isn't going to produce a model that's meaningfully different from an exponent of 0.6 - so if we can re-use previously-calculated results, we'll save time (especially near high-probability spots in parameter space, which should be visited more frequently). 

  It seems like this shouldn't be too much of a problem (but I'm not that experienced with MCMC, which is why I'm asking), since:

	- The algorithm for moving around in parameter space is generally somewhat probabilistic anyway, so it shouldn't matter if the actual spot chosen is slightly different than what the algorithm specifies (due to rounding); 

	- The above will only be true as long as we keep track of how we do the rounding, and apply the same rounding to the post-facto analysis of the chain as we do to the parameters for calculating chi-squared.  (That is, the MCMC algorithm will be keeping track of the *unrounded* values and storing those with the chain; so we need to apply the same rounding function to the vector of parameter-space values both before calculating the probability, and when plotting the posterior distributions at the end.  I can just encode any decisions about how to round each parameter in some function, and use that same function in both cases to transform a given position vector.) 

OK, sorry for such a long message - thanks in advance for any thoughts.

Eric