
Hi all,

Yi-Hao and I have been arguing in a pull request since this afternoon, and we're having a tough time reaching an agreement on our own, so I thought I'd bring the discussion to the mailing list. For reference, this came up in PR 1710: https://github.com/yt-project/yt/pull/1710. I'm going to try to summarize the issue, my opinion, and Yi-Hao's opinion. Yi-Hao, please let me know if you feel I'm mischaracterizing your position.

Our disagreement boils down to this test script:

import yt
import numpy as np

arr = np.arange(8).reshape(4, 2, 1)
data = dict(density=arr)
ds = yt.load_uniform_grid(data, arr.shape)
slc = ds.slice('z', 0.5)
slc_frb = slc.to_frb(1, (4, 2))
dens_image = slc_frb['density']
print(dens_image.shape)
print(dens_image)

This script currently prints:

(2, 4)
[[ 0.  2.  4.  6.]
 [ 1.  3.  5.  7.]] g/cm**3

I think that passing (4, 2) as the resolution argument of the to_frb call means I want an image with shape (4, 2), but right now yt gives me an image with shape (2, 4). My pull request changes this so you get an image back with shape (4, 2).

Yi-Hao correctly points out that the current behavior of yt gives a pixelization that happens to exactly match the discretization of the data loaded into the yt dataset, and he wants to keep that property. Unfortunately, with my pull request the same script would print:

(4, 2)
[[ 1.  5.]
 [ 1.  5.]
 [ 2.  6.]
 [ 2.  6.]] g/cm**3

So now the image's shape is correct, but the pixelization is no longer "natural", because this corresponds to 2 pixels along the x direction and 4 along y. I *can* get a "natural" pixelization if I tell yt to flip what it calls the "x" and "y" axes:

ds.coordinates.x_axis[2] = 1
ds.coordinates.y_axis[2] = 0

If I add those two lines to the script before calling to_frb, I get the following output:

(4, 2)
[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]
 [ 6.  7.]] g/cm**3

This is again a "natural" pixelization because we have 4 pixels along y and 2 along x. However, I don't think that's particularly useful, since most people will want to make z-projections and slices with x plotted horizontally and y vertically.

Unfortunately, there's just a basic issue here with how to interpret the shape of an image geometrically on a plot. I have a feeling Yi-Hao and I are a bit too close to this to resolve it either way. I'm hoping at least one person can weigh in with an opinion so we can find a way forward here.

-Nathan
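[To make the two competing readings of that (2, 4) buffer concrete, here is a minimal pure-NumPy sketch. This is an editorial illustration only, not part of yt's API; the array below is just the buffer the script above prints.]

import numpy as np

# The current FRB output for resolution=(4, 2): shape (2, 4),
# i.e. rows and columns, indexed img[iy, ix].
img = np.array([[0., 2., 4., 6.],
                [1., 3., 5., 7.]])

print(img.shape)    # (2, 4): 2 rows (y) by 4 columns (x)
print(img[0, 3])    # 6.0: iy=0, ix=3, so there are 4 pixels along x
print(img.T.shape)  # (4, 2): the shape the PR would return, indexed [ix, iy]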

Hi all,

Thanks, Nathan, for sending this email. What you wrote correctly sums up my argument. I would like to add something here.

I think the fundamental inconsistency comes from the way an array is plotted in matplotlib (maybe it is just imshow?). In yt, I always index an array as data[ix, iy], so the first index corresponds to the x-axis and the second index corresponds to the y-axis. Therefore, if I specify a 2-d array with shape (4, 2), I expect the array to have 4 pixels along the x-axis and 2 pixels along the y-axis. I think this is the behavior most people would expect. However, if I pass the (4, 2) array into imshow with origin="lower", the first axis is actually plotted vertically, i.e., 4 pixels vertically and 2 pixels horizontally. To match the common expectation that the x-axis (the first axis) should be plotted horizontally, yt transposes the ImageArray, which comes from a FixedResolutionBuffer that is also transposed. So now we are trapped with two different conventions:

raw_data[ix, iy]
FRB[iy, ix] and ImageArray[iy, ix]

When a user wants to access deeper data like the FRB, the different conventions collide and will likely cause confusion. For example, if I specify

size = (nx, ny) - the argument of PlotWindow.set_buff_size
resolution = (nx, ny) - the argument of YTSelectionContainer2D.to_frb

the resulting FRB and ImageArray will have the shape (ny, nx). This is what prompted issue 1709 and Nathan's PR 1710, and I agree that this is not the behavior people would expect.

I feel that what we should really do is unify all the conventions, letting the FRB and the ImageArray both be indexed as [ix, iy] and have the shape (nx, ny). Then, at the very end, when the ImageArray is passed to imshow, we transpose it there (in visualization/base_plot_types.py:ImagePlotMPL). A rough sketch of where that transpose would live follows below. This would, of course, require a lot of work, but in my opinion the unification would give us much more consistency throughout the yt codebase.

Best,
Yi-Hao
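[A rough sketch of Yi-Hao's proposal, with the transpose confined to the matplotlib boundary. The draw_buffer helper below is hypothetical; it merely stands in for the actual code in visualization/base_plot_types.py:ImagePlotMPL.]

import numpy as np
import matplotlib.pyplot as plt

def draw_buffer(ax, buff, extent):
    # Hypothetical stand-in for ImagePlotMPL under the proposed
    # convention: the buffer arrives indexed buff[ix, iy] with shape
    # (nx, ny), and the transpose happens only here, at the matplotlib
    # boundary. buff.T has shape (ny, nx), i.e. (rows, columns), which
    # is what imshow expects; origin="lower" puts iy = 0 at the bottom.
    return ax.imshow(buff.T, origin="lower", extent=extent)

# An (nx, ny) = (4, 2) buffer indexed [ix, iy], as in Nathan's script.
buff = np.arange(8).reshape(4, 2)
fig, ax = plt.subplots()
draw_buffer(ax, buff, extent=(0.0, 1.0, 0.0, 1.0))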

Hi Yi-Hao and Nathan,

I've thought about this a fair amount, and I think we should be careful about how we're thinking of the data.

For instance, it's common practice in things like games and bitmaps to order things along a row, then the next row, then the row after that. If you look at (for instance) how sprites are stored in game engines, it's often done like this (e.g., http://www.gamers.org/dhs/helpdocs/dmsp1666.html); libpng even mandates having things stored in rows that get allocated via a series of pointers. So from the perspective of an *image*, the natural convention is indeed [row, column], which is roughly the same as [y, x]. But, as pointed out here, that's not natural for how we think about the data itself. We are accustomed to thinking of x, then y, and when something is returned as a YTArray, we'd expect it to follow that convention as well. Matplotlib, in imshow, pcolor, and pcolormesh (see the "Grid Orientation" section of their docs), assumes that the first index is the row and the second is the column. It seems this may be because of its relationship to the MATLAB convention.

My personal preference would be to have the FRBs return what we expect -- x and y, in that order -- and then, when we make an image (i.e., at the point we send the data to matplotlib), transpose it. This accomplishes two things:

* Folks who access the data get what they expect from a *data* perspective.
* If they are generating data from other sources, it will follow the *same convention* for putting that data into matplotlib. As in, if I do an np.mgrid and feed that in without any changes, it's going to look different from how I expect; it's then on me to read the matplotlib docs and figure it out. But, at the same time, that is the same way the data fed in from yt would be processed (see the mgrid sketch below).

How ImageArray factors in I don't yet have a feel for. If we were trying to be completely self-consistent, then I would see ImageArray as being an *image* -- exactly what you might get out if you read something in using PIL/Pillow: rows and columns, but also starting at the top rather than the bottom. I'm not sure that is the right approach, though, unless ImageArray is *also* 3- or 4-channeled, as we would otherwise be breaking those assumptions.

-Matt
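[To make Matt's np.mgrid point concrete, here is a minimal sketch assuming only NumPy and matplotlib, nothing yt-specific: the raw orientation versus the transposed-at-the-boundary orientation.]

import numpy as np
import matplotlib.pyplot as plt

# x varies along the first index, y along the second: x[ix, iy] == ix.
x, y = np.mgrid[0:4, 0:2]

fig, (ax0, ax1) = plt.subplots(1, 2)
# Fed in raw, the x gradient runs vertically, which is the surprise
# Matt describes.
ax0.imshow(x, origin="lower")
# Transposed at the matplotlib boundary, the x gradient runs
# horizontally, matching the [ix, iy] mental model of the data.
ax1.imshow(x.T, origin="lower")
fig.savefig("mgrid_orientation.png")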
