[SciPy-user] Maximum entropy distribution for Ising model - setup?
Sorry, I see my previous message was a disaster in formatting; I am trying again in a different way. Sorry for the inconvenience.

Hello, I am new to this forum. I am looking for a numerical solution to the inverse problem of an Ising model (or a model not unlike the Ising model, see below). I have seen an old but very interesting discussion about this subject on this forum (http://mail.scipy.org/pipermail/scipy-user/2006-October/009703.html). I would like to pose my problem (which is quite similar to the one discussed in that thread) and kindly ask your opinion on it.

My space is a set of discrete nodes s_i, i=1,...,N, each of which can take two values, {0,1}. Empirically I have the following information: <s_i>_emp and <s_i*s_j>_emp, for i,j=1,...,N with i!=j.

It is well known in the literature that the distribution

P(s_1, s_2, ..., s_N) = 1/Z * exp( sum_{i} h_i*s_i + 0.5 * sum_{i!=j} J_ij*s_i*s_j )

maximizes entropy under the constraints given above. (Strictly speaking this is not the Ising model, because the Ising model assumes only nearest-neighbour interactions, whereas here every node interacts with every other node; but I believe the maximum-entropy property of P(s_1,...,s_N) still holds.)

What I would like is to solve the inverse problem of finding the h_i and J_ij that realize this maximum-entropy distribution given my constraints. However, I would like to restrict the number of distinct h_i and J_ij, since having complete freedom could become an unwieldy problem. For example, I could impose h_i = H and J_ij = J for all i,j=1,...,N, i!=j; or I could partition my nodes, say nodes 1 to M having h_i = H1 and J_ij = J1 (i,j=1,...,M, i!=j), nodes M+1 to N having h_i = H2 and J_ij = J2 (i,j=M+1,...,N, i!=j), and J_ij = J3 for the cross terms i=1,...,M, j=M+1,...,N.

If I understand the discussion in the thread above correctly, a numerical solution for the inverse problem would be:

hi_{new} = hi_{old} + K * (<si>_{emp} - <si>)
Jij_{new} = Jij_{old} + K' * (<si*sj>_{emp} - <si*sj>)

where K and K' are positive "step size" constants, and on the right-hand side <si> and <si*sj> are model expectations computed with hi_{old} and Jij_{old}.

Have I understood all this correctly? In particular, for the case h_i = H and J_ij = J for all i,j, could I simplify the previous algorithm by restricting the calculations to, say, i=1 (since i=2,...,N should give the same result)? And for the partitioned case, could I simplify it by restricting the calculations to, say, i=1 and i=M+1?

Thank you for your help, and sorry if, being new here, I have committed some etiquette mistake.

Jordi
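For small N, the model expectations <si> and <si*sj> can be computed exactly by enumerating all 2^N states, which makes the proposed iteration easy to test. Below is a minimal sketch of that idea in Python/NumPy; the function names, step size K, and iteration count are illustrative choices of mine, not values from the thread.

```python
import itertools
import numpy as np

def model_moments(h, J):
    """Exact <s_i> and <s_i*s_j> under P(s) ~ exp(h.s + 0.5*s.J.s)."""
    N = len(h)
    # all 2**N binary states, one per row
    states = np.array(list(itertools.product([0, 1], repeat=N)), dtype=float)
    log_w = states @ h + 0.5 * np.einsum('ki,ij,kj->k', states, J, states)
    p = np.exp(log_w - np.logaddexp.reduce(log_w))    # normalised Boltzmann weights
    m1 = p @ states                                   # <s_i>
    m2 = np.einsum('k,ki,kj->ij', p, states, states)  # <s_i*s_j>
    return m1, m2

def fit_ising(m1_emp, m2_emp, K=0.5, n_iter=2000):
    """Iterate the moment-matching updates from the post above."""
    N = len(m1_emp)
    h, J = np.zeros(N), np.zeros((N, N))
    for _ in range(n_iter):
        m1, m2 = model_moments(h, J)
        h += K * (m1_emp - m1)            # gradient ascent on the likelihood
        dJ = K * (m2_emp - m2)
        np.fill_diagonal(dJ, 0.0)         # no self-couplings, i != j only
        J += dJ
    return h, J
```

Exact enumeration is only feasible up to roughly N = 20; beyond that the expectations have to be estimated by sampling, which is exactly what the first reply below discusses.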
On 7-Jan-10, at 4:19 AM, Jordi Molins Coronado wrote:
However, I would like to restrict the number of distinct h_i and J_ij, since having complete freedom could become an unwieldy problem. For example, I could impose h_i = H and J_ij = J for all i,j=1,...,N, i!=j; or I could partition my nodes, say nodes 1 to M having h_i = H1 and J_ij = J1 (i,j=1,...,M, i!=j), nodes M+1 to N having h_i = H2 and J_ij = J2 (i,j=M+1,...,N, i!=j), and J_ij = J3 for the cross terms i=1,...,M, j=M+1,...,N. If I understand the discussion in the thread above correctly, a numerical solution for the inverse problem would be:

hi_{new} = hi_{old} + K * (<si>_{emp} - <si>)
Jij_{new} = Jij_{old} + K' * (<si*sj>_{emp} - <si*sj>)
That's correct; the way you'd usually calculate <si*sj> is by starting from some state and running several iterations of Gibbs sampling to generate a new state, measuring s_i * s_j in that state, then running a whole bunch more steps and gathering s_i * s_j again, etc., until you have enough measurements for a decent Monte Carlo approximation. The Gibbs iterations form a Markov chain whose equilibrium distribution is P(s_1, s_2, ..., s_N), the distribution of interest; the problem is that there's no good way to know when you've run sufficiently many Gibbs steps for the sample you draw to come from the equilibrium distribution P. However, one can often get away with running just a small fixed number of steps. There is some analysis of the convergence properties of this trick here: http://www.cs.toronto.edu/~hinton/absps/cdmiguel.pdf (refer to the sections on "visible Boltzmann machines").

I've never really heard of a situation where you'd want to tie parameters together like you're suggesting, but it's possible and quite trivial to implement. Let's say you wanted to constrain hi and hj to be the same. Then you'd start them off at the same initial value and at every update use the following equation instead:

hi_{new} = hj_{new} = hi_{old} + K/2 * (<si>_{emp} - <si>) + K/2 * (<sj>_{emp} - <sj>)

If you wanted Jij = Jkl, set them to the same initial value and use the update

Jij_{new} = Jkl_{new} = Jij_{old} + K'/2 * (<si*sj>_{emp} - <si*sj>) + K'/2 * (<sk*sl>_{emp} - <sk*sl>)

Similarly, if you wanted to tie a whole set of these together, you'd just average the updates and apply the result to all of them at once.

David
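A rough sketch of the Gibbs sampler David describes, for estimating <s_i> and <s_i*s_j> under the current (h, J). It assumes 0/1 units and a symmetric J with zero diagonal; the burn-in and thinning values are arbitrary illustrations of mine since, as noted above, there is no general test for equilibrium.

```python
import numpy as np

def gibbs_moments(h, J, n_samples=1000, burn_in=500, thin=10, seed=None):
    """Monte Carlo estimates of <s_i> and <s_i*s_j> by Gibbs sampling."""
    rng = np.random.default_rng(seed)
    N = len(h)
    s = rng.integers(0, 2, size=N).astype(float)   # random initial state
    m1 = np.zeros(N)
    m2 = np.zeros((N, N))
    for t in range(burn_in + n_samples * thin):
        for i in range(N):
            # conditional P(s_i = 1 | rest) for the 0/1 model is a sigmoid
            # of the local field h_i + sum_j J_ij s_j (j != i)
            field = h[i] + J[i] @ s - J[i, i] * s[i]
            s[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-field)))
        if t >= burn_in and (t - burn_in) % thin == 0:
            m1 += s
            m2 += np.outer(s, s)
    return m1 / n_samples, m2 / n_samples
```

These estimates would then be plugged into the update equations above in place of the exact expectations.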
On Thu, Jan 7, 2010 at 9:19 AM, Jordi Molins Coronado <jordi_molins@hotmail.com> wrote:
[...]
Hi, I'm not so familiar with the statistical mechanics notation, but you might be interested in the maxent module of the pyentropy package I have produced as part of my PhD: http://code.google.com/p/pyentropy/

The main purpose of pyentropy is the calculation of bias-corrected entropy and information values from limited data sets, but it includes the maxent module, which computes maximum entropy distributions over finite alphabet spaces with marginal constraints of any order. (I work in computational neuroscience, so much of the notation will probably be a bit different.) A second-order solution from this framework over a binary space is, I think, the same as the Ising model. You can get the h_i's and J's directly from the results (they are called theta in the code), although they have a slightly different normalisation because the Ising model is usually defined over {-1,1} and this is over {0,1}: http://pyentropy.googlecode.com/svn/docs/api.html#module-pyentropy.maxent

With this, on a normal computer I can solve for about 18 binary variables in a reasonable amount of time (i.e. less than an hour for several runs), but it becomes exponentially harder with more variables. (I have a much more efficient but more hackish version of the same algorithm that I haven't added to pyentropy yet, but will in the next few weeks.)

In the case you describe, where the thetas are constrained to be equal at each order, the system can be solved much more efficiently (see the sketch after this message). I have code to do this which is not released (and a bit messy), but if you are interested I could send it to you. In neuroscience this situation is called the 'pooled model'. You can see a description of how to solve this reduced case here: http://rsta.royalsocietypublishing.org/content/367/1901/3297.short

The method uses information geometry from Amari: by transforming between the P space (probability vector), eta space (marginals) and theta space (the h's, J's, etc.) it is possible to find the maximum entropy solution as a projection.

Anyway, I'm not sure if that helps... The documentation on how to get the thetas, and the ordering of the vector, might not be so clear, so give me a shout if you start using it and have any questions. If you don't have the full probability distribution, you can pass the eta vector to the solve function (which would be a vector of <si>, <si*sj>) and set the optional argument eta_given=True (hopefully clear from the option handling in the code).

Cheers

Robin
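As a sketch of how the tied-parameter ("pooled") case collapses, here is what one step of the moment-matching update might look like when all h_i = H and all J_ij = J: the per-parameter gradients are simply averaged into one shared step, in the spirit of David's averaging trick above. This is my own illustration, not code from pyentropy; the model moments are assumed to come from elsewhere, e.g. the exact-enumeration or Gibbs sketches earlier in the thread.

```python
import numpy as np

def pooled_update(H, Jc, m1_emp, m2_emp, model_m1, model_m2, K=0.1, Kp=0.1):
    """One update of the shared field H and shared coupling Jc.

    m1_emp/model_m1: length-N vectors of empirical/model <s_i>;
    m2_emp/model_m2: (N, N) matrices of empirical/model <s_i*s_j>.
    """
    N = len(m1_emp)
    off_diag = ~np.eye(N, dtype=bool)                 # keep i != j pairs only
    H_new = H + K * np.mean(m1_emp - model_m1)        # averaged field gradient
    Jc_new = Jc + Kp * np.mean(m2_emp[off_diag] - model_m2[off_diag])
    return H_new, Jc_new
```

On the original question of restricting the calculation to i=1: under the fully tied model the nodes are exchangeable, so in exact arithmetic every i gives the same gradient; with Monte Carlo estimates, averaging over all nodes as above costs little and reduces sampling noise.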
Jordi,
On Jan 7, 2010, at 1:09 AM, Jordi Molins Coronado wrote:
Hello, I am new to this forum. I am looking for a numerical solution to the inverse problem of an Ising model (or a model not unlike the Ising model, see below). I have seen an old but very interesting discussion about this subject on this forum (http://mail.scipy.org/pipermail/scipy-user/2006-October/009703.html).
You might want to check out a recent method developed in our group, called "Minimum Probability Flow Learning", which allows very fast parameter estimation for basically any distribution, including the Ising model. A 100-unit Ising model can be fitted within about 1 minute (see Fig. 3). The paper is here: http://arxiv.org/abs/0906.4779

Kilian

--
Kilian Koepsell, PhD
Redwood Center for Theoretical Neuroscience
Helen Wills Neuroscience Institute, UC Berkeley
156 Stanley Hall, MC# 3220, Berkeley, CA 94720
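For concreteness, here is a rough sketch of my reading of the minimum probability flow objective for the 0/1 Ising model: the probability flow out of each data state into its single-bit-flip neighbours, exp((E(data) - E(neighbour))/2), summed and minimised with a generic SciPy optimiser. This is a sketch of my understanding of the paper, not code from the authors, and the parameter packing and optimiser choice are my own.

```python
import numpy as np
from scipy.optimize import minimize

def mpf_objective(params, S):
    """MPF objective; S is an (n_samples, N) array of 0/1 data states."""
    n, N = S.shape
    h = params[:N]
    J = params[N:].reshape(N, N)
    J = 0.5 * (J + J.T)           # symmetrise the couplings
    np.fill_diagonal(J, 0.0)      # no self-couplings
    # local field at each unit; with E(s) = -(h.s + 0.5*s.J.s) as in the
    # thread's distribution, flipping bit i changes the energy by
    # (1 - 2*s_i) * field_i
    field = h + S @ J
    delta_E = (1.0 - 2.0 * S) * field   # E(data) - E(bit-i-flipped neighbour)
    return np.exp(0.5 * delta_E).sum() / n

# usage sketch (S is a hypothetical 0/1 data array of shape (n_samples, N)):
# N = S.shape[1]
# res = minimize(mpf_objective, np.zeros(N + N * N), args=(S,), method='L-BFGS-B')
# h_fit = res.x[:N]
# J_fit = 0.5 * (res.x[N:].reshape(N, N) + res.x[N:].reshape(N, N).T)
```

The appeal of the method is that no partition function and no sampling are needed, which is what makes the large-N fits in the paper fast.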
Hello, I find all the ideas posted in reply to my message very interesting; thank you very much to everyone who answered my question. In particular, I would like to know more about Kilian's and Robin's suggestions, but I find it difficult to understand their ideas and translate them into my background. Of course, this is not Kilian's or Robin's fault, but entirely mine, due to my lack of knowledge of the subject.

To Robin: is there a paper covering your package explained in layman's terms, not requiring previous knowledge of the subject? Or maybe a simple but fully worked example (ideally closely related to the Ising model) that can be run with your package to see how everything works?

To Kilian: do you have a computer package that covers the computations in your paper? Or is the Ising code available for distribution? I would be very interested to know more about the Ising implementation in your paper.

Kind regards

Jordi
Hi, I apologise for the long delay in this reply. I originally wrote it while I was on holiday, but there was something funny with the coupling/theta values in my example that I couldn't get to the bottom of in time, and I'm afraid I forgot about it... Now that I've come to ask the scipy list another question of my own, I thought I should really answer this first! Hopefully it is still of some use.

On Wed, Jan 13, 2010 at 3:42 AM, Jordi Molins Coronado <jordi_molins@hotmail.com> wrote:
[...] To Robin: is there a paper covering your package explained in layman's terms, not requiring previous knowledge of the subject? Or maybe a simple but fully worked example (ideally closely related to the Ising model) that can be run with your package to see how everything works?
The original paper describing the toolbox and the maximum entropy algorithm is here: http://www.frontiersin.org/neuroscience/neuroinformatics/paper/10.3389/neuro... but I suppose it is pretty focussed on our field. I hope the maximum entropy section might make sense on its own, though. The algorithm is based on the information geometry of Amari, which is covered in this paper, which is a bit heavy going but not particularly field-specific: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=930911

For an example:

```python
import numpy as np
import pyentropy
from pyentropy.maxent import AmariSolve

# start with 4 binary variables - 1000 trials
X = np.random.random_integers(0, 1, (4, 1000))

# convert the binary 4-vector to a decimal value for each trial
X_d = pyentropy.decimalise(X, 4, 2)

# sample the full joint probability distribution (16 possible values);
# this p vector is ordered so that P(x=(1,0,1,0)) is found at the index
# given by the decimal value of the word, ie p[1] = P(x=(0,0,0,1)),
# p[10] = P(x=(1,0,1,0))
p = pyentropy.prob(X_d, 16)

# obtain the maximum entropy solution: solve for the maximum entropy
# distribution preserving up to second order marginals
# ( = Ising for binary variables)
s = AmariSolve(4, 2)
p_me = s.solve(p, k=2)

# theta vector (the h's and J's in different notation; this should be
# covered in the paper). It has length 15 (no zero entry):
# theta[:4] are the first order thetas (the h's)
# theta[4:10] are the second order thetas (the J's) (I would have to think
# a bit harder to get the exact ordering if you are interested in that)
# theta[10:] are the higher order terms, which should be numerically zero
# (ie 1e-16 or thereabouts)
theta = s.theta_from_p(p_me)

# marginals: these are the correspondingly ordered marginals, so from the
# constraints (s.eta_from_p(p) - s.eta_from_p(p_me))[:10] is close to zero
# (first and second order marginals are preserved)
eta = s.eta_from_p(p_me)
```

If you don't want to sample the full space, you can provide just the vector of marginal values to the solve function, ie:

```python
s.solve(s.eta_from_p(p), k=2, eta_given=True)
```

or even

```python
marg_constraints = s.eta_from_p(p)[:10]
s.solve(np.r_[marg_constraints, np.zeros(5)], k=2, eta_given=True)
```

since the higher order marginals don't make any difference. (Here marg_constraints would be a vector of <s1>...<s4>, then the <si*sj>, etc.)

Please let me know how you get on. Of course, the major limitation with this is that you can only handle small populations of variables (up to 18/20 should be comfortable), whereas the probability flow method looks very promising for much larger populations. I would be really interested to see whether the Ising model algorithm could be extended to the more general finite-alphabet, any-order maximum entropy solution. In my work to date this hasn't been a practical problem, though, since I am interested in entropy and information values, which are impossible to sample on much bigger spaces. For me the limitation has always been the available data rather than the computation time for the maximum entropy solutions.

Cheers

Robin
participants (4)

- David Warde-Farley
- Jordi Molins Coronado
- Kilian Koepsell
- Robin