Entropy from empirical high-dimensional data
Hi list, I am looking at estimating entropy and conditional entropy from data for which I only have access to observations, not the underlying probabilistic laws. With low-dimensional data, I would simply use an empirical estimate of the probabilities by converting each observation to its quantile, and then apply the standard formula for entropy (for instance using scipy.stats.entropy). However, I have high-dimensional data (~100 features and 30000 observations). Not only is it harder to convert observations to probabilities under the empirical law, but I am also worried about curse-of-dimensionality effects: density estimation in high dimensions is a difficult problem. Does anybody have advice, or Python code to point to, for this task? Cheers, Gaël
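The low-dimensional approach described above can be sketched as follows; the toy Gaussian data, the choice of 50 bins, and the use of a 2-D histogram for the joint law are illustrative assumptions, not anything prescribed in the thread. Conditional entropy then follows from the identity H(Y|X) = H(X, Y) - H(X):

```python
import numpy as np
from scipy.stats import entropy

# Toy data: y depends on x, so H(Y|X) should be well below H(Y).
rng = np.random.default_rng(0)
x = rng.normal(size=30000)
y = x + rng.normal(size=30000)

# Empirical joint law of (x, y) from a 2-D histogram.
counts_xy, _, _ = np.histogram2d(x, y, bins=50)
p_xy = counts_xy / counts_xy.sum()

h_xy = entropy(p_xy.ravel())       # joint entropy H(X, Y), in nats
h_x = entropy(p_xy.sum(axis=1))    # marginal entropy H(X)
h_y_given_x = h_xy - h_x           # conditional entropy H(Y|X)
```

Note that these are entropies of the discretized variables, so the values depend on the bin width; scipy.stats.entropy normalizes its input and treats zero-count bins as contributing nothing.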
Sorry for the noise; I sent this to the dev list, while it belongs on the user list.
Hi Gael, I recently played with a related problem which you might find of interest: (short paper) http://nilab.cimec.unitn.it/people/olivetti/work/prni2011/olivetti_bayes_err... (slides) http://nilab.cimec.unitn.it/people/olivetti/work/prni2011/olivetti_prni2011_... The proposed model can be used to estimate the posterior probability of information given observations, using classifiers. Note that these are just preliminary results. If this is of some help to you, just let me know :-) I recently talked to Stephen Strother about this topic and he pointed me to this paper: http://www.ncbi.nlm.nih.gov/pubmed/20533565 HTH, Emanuele On 05/25/2011 11:35 PM, Gael Varoquaux wrote:
Hi list,
I am looking at estimating entropy and conditional entropy from data for which I only have access to observations, not the underlying probabilistic laws.
With low-dimensional data, I would simply use an empirical estimate of the probabilities by converting each observation to its quantile, and then apply the standard formula for entropy (for instance using scipy.stats.entropy).
However, I have high-dimensional data (~100 features and 30000 observations). Not only is it harder to convert observations to probabilities under the empirical law, but I am also worried about curse-of-dimensionality effects: density estimation in high dimensions is a difficult problem.
Does anybody have advice, or Python code to point to, for this task?
Cheers,
Gaël
On Thu, May 26, 2011 at 10:17:16AM +0200, Emanuele Olivetti wrote:
I recently played with a related problem which you might find of interest:
(slides) http://nilab.cimec.unitn.it/people/olivetti/work/prni2011/olivetti_prni2011_...
Very interesting. It is quite unrelated to what I am doing right now, but it is very interesting in general.
I recently talked to Stephen Strother about this topic and he pointed me to this paper: http://www.ncbi.nlm.nih.gov/pubmed/20533565
I saw Stephen at NIPS and we did discuss these matters. All this is indeed promising. Thanks for the pointers, Gaël
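One standard technique that sidesteps explicit density estimation in high dimensions is the Kozachenko-Leonenko k-nearest-neighbour entropy estimator, which estimates differential entropy directly from nearest-neighbour distances. A minimal sketch (the function name and the default k=3 are illustrative assumptions, not anything discussed in the thread; it assumes continuous data with no duplicated points, since a zero distance would make log blow up):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(X, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential entropy, in nats.

    H_hat = psi(n) - psi(k) + log(V_d) + (d / n) * sum_i log(r_i),
    where r_i is the distance from sample i to its k-th nearest
    neighbour and V_d is the volume of the d-dimensional unit ball.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    tree = cKDTree(X)
    # k + 1 because the closest "neighbour" of each point is itself.
    dist, _ = tree.query(X, k=k + 1)
    r_k = dist[:, -1]
    log_v_d = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_v_d + d * np.mean(np.log(r_k))
```

Because only neighbour distances enter the formula, the cost of estimating the density on a grid never appears, which is why this family of estimators behaves better than histograms as the dimension grows (though it is not immune to the curse of dimensionality either). Conditional entropy can again be obtained as H(Y|X) = H(X, Y) - H(X) by applying the estimator to the stacked and marginal samples.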