How to use pcolor and scatter plot in one image?
Hi,

how can I use pcolor and a scatter plot in one image?

I have a scatter plot where a lot of data points are so close to each other that they are drawn as (almost) one point. So I'm trying to visualize which area of the scatter plot contains the most data points. Using pcolor I can draw a grid where each cell visualizes the relative number of data points by a different color.

If I try to draw the scatter plot and the grid in the same image, only the scatter plot is drawn, regardless of the invocation order of plot and pcolor.

    ... plot(...) pcolor(...) show()
    ... pcolor(...) plot(...) show()

Both return only the scatter plot.

I'm new to Scipy. What am I doing wrong?

Thanks in advance.

kind regards
robert
Hi Rob,

could you give a few more details about what you're doing? If your problem is specifically a plotting issue, you should rather write to the matplotlib mailing list (http://sourceforge.net/mail/?group_id=80706).

I tried to reproduce what you describe and, as far as I can tell, I don't have any problem plotting a scatter plot of the data and a grid of the coarsened density of points on the same figure. See below for the code I used. Is this what you want to do?

Cheers,
Emmanuelle

***
    import numpy as np
    import pylab as pl

    N = 1000
    n = 10
    np.random.seed(3)  # use always the same seed
    x, y = np.random.randn(2, N)/10 + 0.5
    X, Y = np.mgrid[0:1:n*1j, 0:1:n*1j]

    xfloor = X[:,0][np.floor(n*x).astype(int)]
    yfloor = Y[0][np.floor(n*y).astype(int)]
    z = xfloor + n*yfloor
    Z = X + n*Y
    histo = np.histogram(z.ravel(), bins=r_[Z.T.ravel(),2*n**2])

    pl.pcolor(X-1./(2*n), Y-1./(2*n), histo[0].reshape((n,n)))  # shifted to have centered bins
    pl.scatter(x, y)
    pl.show()

On Sun, May 31, 2009 at 12:59:23PM +0200, wierob wrote:
Hi,
how can I use pcolor and a scatter plot in one image?
kind regards
robert

_______________________________________________
SciPy-user mailing list
SciPy-user@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
Oops, sorry, there was a bug in my code: I interchanged x and y (this proves once again that tests should always be based on non-symmetric data!). Here is the corrected version.

Emmanuelle

    import numpy as np
    import pylab as pl

    N = 1000
    n = 10
    np.random.seed(3)
    x, y = np.random.randn(2, N)/10 + 0.5
    y -= 0.1
    X, Y = np.mgrid[0:1:n*1j, 0:1:n*1j]

    xfloor = X[:,0][np.floor(n*x).astype(int)]
    yfloor = Y[0][np.floor(n*y).astype(int)]
    z = yfloor + n*xfloor
    Z = Y + n*X
    histo = np.histogram(z.ravel(), bins=r_[Z.ravel(),2*n**2])

    pl.pcolor(X-1./(2*n), Y-1./(2*n), histo[0].reshape((n,n)))
    pl.scatter(x, y)
    show()

On Sun, May 31, 2009 at 02:10:19PM +0200, Emmanuelle Gouillart wrote:
***
    import numpy as np
    import pylab as pl
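The corrected script bins the points by folding the two cell coordinates into a single number and histogramming that number. A minimal sketch of the same idea in a simpler form, assuming integer cell indices and `np.bincount` rather than the `np.histogram` call used in the message; the names `ix`, `iy`, `flat` and the `inside` mask are illustrative and do not come from the thread:

```python
import numpy as np

n = 10                                   # cells per axis on the unit square
rng = np.random.RandomState(3)
x, y = rng.randn(2, 1000) / 10 + 0.5     # same kind of test data as in the thread
y -= 0.1                                 # break the x/y symmetry

# Keep only points that fall inside [0, 1) x [0, 1) so every index is valid.
inside = (x >= 0) & (x < 1) & (y >= 0) & (y < 1)

# Integer cell index along each axis, then one flat index per point.
ix = np.floor(n * x[inside]).astype(int)
iy = np.floor(n * y[inside]).astype(int)
flat = ix * n + iy

# Count how many points share each flat index and fold back to an n-by-n grid.
counts = np.bincount(flat, minlength=n * n).reshape(n, n)

# Cross-check against numpy's own 2-D histogram (first axis is x in both).
reference, _, _ = np.histogram2d(x[inside], y[inside],
                                 bins=n, range=[[0, 1], [0, 1]])
```

Here `counts[i, j]` should hold the number of points whose x falls in cell `i` and whose y falls in cell `j`, which is the same layout `np.histogram2d` returns.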
Hi,

thanks for your help. Unfortunately, your example does not work for me. The line

    histo = np.histogram(z.ravel(), bins=r_[Z.ravel(),2*n**2])

produces the following error message:

    Traceback (most recent call last):
      File "/mnt/VBoxShare/eg.py", line 15, in <module>
        histo = np.histogram(z.ravel(), bins=r_[Z.ravel(),2*n**2])
    NameError: name 'r_' is not defined

I'm very new to Scipy and have no idea what you intended to do there.

What I'm trying to do is the following:

    from scipy import polyval, zeros
    import pylab

    a, b = fetch_data(...)

    pylab.plot(a, b, "g.")  # scatter plot

    # regression line
    regression = regression_analysis(...)
    xr = polyval([regression[0], regression[1]], b)
    pylab.plot(b, xr, "r-")

    pylab.gca().set_xlim([0,max(b)])
    pylab.gca().set_ylim([0,max(a)])

    # calculate grid (10x10)
    xlim = pylab.gca().get_xlim()[1]
    ylim = pylab.gca().get_ylim()[1]
    block_x = int(xlim / 10.0 + 1)
    block_y = int(ylim / 10.0 + 1)
    grid_x = [ block_x * i for i in range(11) ]
    grid_y = [ block_y * i for i in range(11) ]

    density_map = zeros((10, 10))  # matrix for points per cell

    inc = 1.0 / number_of_data_points

    for i in range(10):
        for j in range(10):
            cell = [ grid_x[i], grid_x[i+1], grid_y[j], grid_y[j+1] ]
            density_map[j][i] += points_in(cell) * inc

    # plot the 'density map'
    pylab.pcolor(density_map, cmap=pylab.get_cmap("hot"))

    pylab.show()

This only creates the scatter plot and the regression line.

kind regards
robert
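The cell-counting loop in this message can be tried in isolation. The following is only a sketch under stated assumptions: `fetch_data(...)` and `points_in(...)` are not shown in the thread, so synthetic arrays `xs`, `ys` and a hypothetical `points_in` helper (half-open cells) stand in for them.

```python
import numpy as np

# Synthetic stand-in for the data that fetch_data(...) would return.
rng = np.random.RandomState(0)
xs = rng.uniform(0, 95, size=500)
ys = rng.uniform(0, 45, size=500)


def points_in(cell, xs, ys):
    """Hypothetical helper: count points with x0 <= x < x1 and y0 <= y < y1."""
    x0, x1, y0, y1 = cell
    return int(np.sum((xs >= x0) & (xs < x1) & (ys >= y0) & (ys < y1)))


# Cell size chosen as in the message, so that 10 cells cover the data range.
block_x = int(xs.max() / 10.0 + 1)
block_y = int(ys.max() / 10.0 + 1)
grid_x = [block_x * i for i in range(11)]   # 11 edges -> 10 cells
grid_y = [block_y * i for i in range(11)]

density_map = np.zeros((10, 10))            # fraction of points per cell
inc = 1.0 / len(xs)
for i in range(10):
    for j in range(10):
        cell = [grid_x[i], grid_x[i + 1], grid_y[j], grid_y[j + 1]]
        density_map[j][i] += points_in(cell, xs, ys) * inc
```

Note that `grid_x` and `grid_y` are in data units (here they run up to `grid_x[-1]` and `grid_y[-1]`), while `density_map` by itself only carries row and column indices from 0 to 9.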
On Mon, Jun 01, 2009 at 11:26:38AM +0200, wierob wrote:
thanks for your help. Unfortunately, your example does not work for me. The line
histo = np.histogram(z.ravel(), bins=r_[Z.ravel(),2*n**2])
produces the following error message:

    Traceback (most recent call last):
      File "/mnt/VBoxShare/eg.py", line 15, in <module>
        histo = np.histogram(z.ravel(), bins=r_[Z.ravel(),2*n**2])
    NameError: name 'r_' is not defined
That was a typo: replace 'r_' by 'np.r_'.

Gaël

PS: IPython devs: is there a pylab mode without the dreaded 'from pylab import *'? We need to advertise such a workflow, rather than 'ipython -pylab', which pollutes the namespace with almost 900 entries.
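A small illustration of what `np.r_` does when the name is reached through an explicit `import numpy as np` instead of a star import. The edge values and sample data below are made up for the example:

```python
import numpy as np

# Three bin edges, to which one extra upper edge is appended.
edges = np.array([0.0, 0.5, 1.0])

# np.r_ concatenates its arguments along the first axis, so a scalar
# can be tacked onto the end of a 1-D array.
with_upper = np.r_[edges, 2.0]

# The extended array can be passed to np.histogram as explicit bin edges.
counts, _ = np.histogram([0.1, 0.6, 1.5], bins=with_upper)
```

With only `import numpy as np` in scope, a bare `r_` raises the `NameError` quoted above, whereas `np.r_` resolves without adding anything else to the namespace.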
Hi Robert,
histo = np.histogram(z.ravel(), bins=r_[Z.ravel(),2*n**2])
produces the following error message:

    Traceback (most recent call last):
      File "/mnt/VBoxShare/eg.py", line 15, in <module>
        histo = np.histogram(z.ravel(), bins=r_[Z.ravel(),2*n**2])
    NameError: name 'r_' is not defined
As Gaël said, it should be np.r_ instead of r_. I ran my code in ipython -pylab, which enables the interactive use of matplotlib but also loads a lot of pylab and numpy features into the namespace. Sorry about the typo! You should try to execute the code again.

I used np.r_ to concatenate the array Z.ravel() with 2*n**2 in order to add the upper edge of the last bin for the histogram (note: if you don't use a recent version of numpy, histogram may return an error).

I quickly read through your code; here are a few comments:

* I guess one of the problems might be that you're using two different scales for the data and for your grid. pcolor(density_map) plots the color levels corresponding to density_map on a y-scale (x-scale) between 0 and max_index_along_first_direction - 1, which does not correspond to the values of the data. That may explain why your data and your density_map do not overlap. You should define X and Y coordinates for the grid (e.g. using np.mgrid as in my example) and plot pcolor(X, Y, density_map).

* Rather than grid_x = [ block_x * i for i in range(11) ], use np.linspace(0, block_x*11, 11, endpoint=False) or np.arange(0, block_x*11, block_x).

* You can easily avoid your for loop by using numpy.histogram2d, which does just what you want for putting your points inside the bins of the grid. Try np.histogram2d(a, b, bins=11, range=[[0, xlim], [0, ylim]]) (check the documentation first).

Hope this helps,
Emmanuelle

PS: actually this discussion should rather be on the numpy-discussion list. I would advise you to subscribe to that list and, if you have further questions, post your reply there instead.
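The three comments above can be combined into one sketch. This is an illustration under assumptions, not the code from the thread: it uses `np.histogram2d` with `bins=n` (an integer `bins` is the number of cells, so a 10x10 grid has 11 edges per axis), `pcolormesh` in place of `pcolor`, and helper names (`density_grid`, `plot_overlay`) invented for the example.

```python
import numpy as np


def density_grid(x, y, n=10, lo=0.0, hi=1.0):
    """Count points per cell on an n-by-n grid covering [lo, hi] x [lo, hi]."""
    counts, xedges, yedges = np.histogram2d(
        x, y, bins=n, range=[[lo, hi], [lo, hi]])
    return counts, xedges, yedges


def plot_overlay(x, y, counts, xedges, yedges):
    """Draw the density grid in data coordinates with the scatter on top."""
    import matplotlib.pyplot as plt  # imported here so the counting works without it
    fig, ax = plt.subplots()
    # histogram2d puts x along the first axis; pcolormesh expects rows to
    # run along y, hence the transpose. Passing the edges places the grid
    # on the same scale as the data instead of on index coordinates.
    mesh = ax.pcolormesh(xedges, yedges, counts.T, cmap="hot")
    ax.scatter(x, y, s=4, color="g")
    fig.colorbar(mesh, ax=ax, label="points per cell")
    return fig


# Same kind of test data as earlier in the thread.
rng = np.random.RandomState(3)
x, y = rng.randn(2, 1000) / 10 + 0.5
y -= 0.1
counts, xedges, yedges = density_grid(x, y)

# The three ways of building evenly spaced edges mentioned above agree
# (block_x = 7 is an arbitrary illustrative cell width).
block_x = 7
edges_list = [block_x * i for i in range(11)]
edges_lin = np.linspace(0, block_x * 11, 11, endpoint=False)
edges_ara = np.arange(0, block_x * 11, block_x)
```

Calling `plot_overlay(x, y, counts, xedges, yedges)` and then `matplotlib.pyplot.show()` should display the colored grid with the scatter points drawn over it, because both are expressed in the same data coordinates.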
participants (3)
- Emmanuelle Gouillart
- Gael Varoquaux
- wierob