how to efficiently build an array of x, y, z points
I'm reading a file which contains a grid definition. Each cell in the grid, apart from having an i,j,k index also has 8 x,y,z coordinates. I'm reading each set of coordinates into a numpy array. I then want to add/append those coordinates to what will be my large "points" array. Due to the orientation/order of the 8 corners of each hexahedral cell I may have to reorder them before adding them to my large points array (not sure about that yet).
Should I create a numpy array with nothing in it and then .append to it? But this is probably expensive isn't it as it creates a new copy of the array each time?
Or should I create a zero or empty array of sufficient size and then put each set of 8 coordinates into the correct position in that big array?
I don't know exactly how big the array will be (some cells are inactive and therefore don't have a geometry defined) but I do know what its maximum size is (ni*nj*nk,3).
Thanks
Brennan
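A minimal sketch of the preallocation option raised in the question: allocate for the maximum number of cells, fill rows as active cells arrive, then trim the unused tail. The function name `fill_preallocated` and the convention that an inactive cell is represented by `None` are assumptions made for this illustration, not part of the grid file format.

```python
import numpy as np

def fill_preallocated(cells, max_cells):
    """Fill a preallocated (max_cells, 8, 3) array and trim the unused tail.

    `cells` is any iterable yielding an (8, 3) array-like per active cell,
    or None for an inactive cell (a hypothetical convention).
    """
    points = np.empty((max_cells, 8, 3), dtype=float)
    n_active = 0
    for corners in cells:
        if corners is None:
            continue  # inactive cell: no geometry to store
        points[n_active] = corners
        n_active += 1
    # Keep only the rows that were written; copy so the big buffer can be freed.
    return points[:n_active].copy()

# Usage: three cells, one of them inactive.
cell_a = np.zeros((8, 3))
cell_b = np.ones((8, 3))
active = fill_preallocated([cell_a, None, cell_b], max_cells=3)
print(active.shape)  # (2, 8, 3)
```

Memory use is bounded by the maximum grid size, which may matter for grids where most cells are inactive.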
On Tue, Mar 2, 2010 at 6:29 PM, Brennan Williams <brennan.williams@visualreservoir.com> wrote:
> Should I create a numpy array with nothing in it and then .append to it? But this is probably expensive isn't it as it creates a new copy of the array each time?
> Or should I create a zero or empty array of sufficient size and then put each set of 8 coordinates into the correct position in that big array?
> I don't know exactly how big the array will be (some cells are inactive and therefore don't have a geometry defined) but I do know what its maximum size is (ni*nj*nk,3).
Someone will correct me if I'm wrong, but this problem - the "best" way to build a large array whose size is not known beforehand - came up in one of the tutorials at SciPyCon '09 and IIRC the answer was, perhaps surprisingly, build the thing as a Python list (which is optimized for this kind of indeterminate sequence building) and convert to a numpy array when you're done. Isn't that what was recommended, folks? DG
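To illustrate the suggestion, here is a small sketch comparing the two approaches. `np.append` returns a new array on every call, so the loop copies all existing rows each time, whereas a Python list grows in place and is converted once at the end. The helper names are made up for this example.

```python
import numpy as np

def build_with_np_append(rows):
    """Grow an array row by row; every np.append call allocates a new array."""
    out = np.empty((0, 3), dtype=float)
    for row in rows:
        out = np.append(out, [row], axis=0)
    return out

def build_with_list(rows):
    """Accumulate rows in a Python list and convert once at the end."""
    acc = []
    for row in rows:
        acc.append(row)
    return np.array(acc, dtype=float)

rows = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
a = build_with_np_append(rows)
b = build_with_list(rows)
print(a.shape, np.array_equal(a, b))  # (3, 3) True
```

Both functions produce the same array; they differ only in how much copying happens along the way.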
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
David Goldsmith wrote:
> Someone will correct me if I'm wrong, but this problem - the "best" way to build a large array whose size is not known beforehand - came up in one of the tutorials at SciPyCon '09 and IIRC the answer was, perhaps surprisingly, build the thing as a Python list (which is optimized for this kind of indeterminate sequence building) and convert to a numpy array when you're done. Isn't that what was recommended, folks?
Build a list of floating point values, then convert to an array and shape accordingly? Or build a list of small arrays and then somehow convert that into a big numpy array? I've got 24 floating point values which I've got in an array of shape (8,3) but I could easily have those in a list rather than an array and then just keep appending each small list of values to a big list and then do the final conversion to the array - I'll try that and see how it goes. Brennan
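The two variants mentioned here can be sketched as follows, assuming each cell arrives as an (8, 3) array-like. The function names are hypothetical; `np.stack` needs at least one cell in the list.

```python
import numpy as np

def from_flat_floats(cells):
    """Variant 1: one flat list of floats, reshaped at the end."""
    flat = []
    for corners in cells:
        flat.extend(np.asarray(corners, dtype=float).ravel().tolist())
    return np.array(flat, dtype=float).reshape(-1, 8, 3)

def from_small_arrays(cells):
    """Variant 2: a list of small (8, 3) arrays, stacked at the end."""
    chunks = [np.asarray(corners, dtype=float) for corners in cells]
    return np.stack(chunks)  # shape (ncells, 8, 3)

cells = [np.full((8, 3), float(i)) for i in range(4)]
flat_result = from_flat_floats(cells)
stacked_result = from_small_arrays(cells)
print(flat_result.shape, stacked_result.shape)  # (4, 8, 3) (4, 8, 3)

# Either result can be viewed as a plain list of points if that is what is needed.
points = stacked_result.reshape(-1, 3)
print(points.shape)  # (32, 3)
```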
On Tue, Mar 2, 2010 at 6:59 PM, Brennan Williams <brennan.williams@visualreservoir.com> wrote:
> Build a list of floating point values, then convert to an array and shape accordingly? Or build a list of small arrays and then somehow convert that into a big numpy array?
My guess is that either way will be better than iteratively "appending" to an existing array.
> I've got 24 floating point values which I've got in an array of shape (8,3) but I could easily have those in a list rather than an array and then just keep appending each small list of values to a big list and then do the final conversion to the array - I'll try that and see how it goes.
Great! Be sure to report back. :-) Dg
On 03/02/2010 09:47 PM, David Goldsmith wrote:
> My guess is that either way will be better than iteratively "appending" to an existing array.
Hi,
Christopher Barker provided some code last year on appending ndarrays, e.g.: http://mail.scipy.org/pipermail/numpy-discussion/2009-November/046634.html
A lot depends on your final usage of the array; otherwise there are no suitable suggestions. That is, do you need just to index the array using i, j, k indices (this gives you an i by j by k array that contains the x, y, z coordinates) or do you also need to index the x, y, z coordinates as well (giving you an i by j by k by x by y by z array)? If it is just plain storage then perhaps just a Python list, dict or sqlite object may be sufficient.
There are also time and memory constraints, as you can spend a lot of effort just getting the input into a suitable format with acceptable memory usage. If you use secondary storage like a Python list then you need memory to store the list, the ndarray and all intermediate components and overheads.
If you use scipy then you should look at using sparse arrays, where space is only added as you need it.
Bruce
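If lookup by i, j, k is needed while only active cells carry geometry, one possible layout (not something proposed in the thread, just a sketch under that assumption) is a compact array with one row per active cell plus an integer index map over the full grid, with -1 marking inactive cells. All names below are hypothetical.

```python
import numpy as np

def build_index_map(shape, active_ijk):
    """Return an (ni, nj, nk) int array: row number in the compact array, or -1."""
    index_map = np.full(shape, -1, dtype=np.intp)
    for row, (i, j, k) in enumerate(active_ijk):
        index_map[i, j, k] = row
    return index_map

ni, nj, nk = 2, 2, 1
active_ijk = [(0, 0, 0), (1, 1, 0)]          # cells that have geometry
corners = np.zeros((len(active_ijk), 8, 3))  # compact storage, one row per active cell
corners[1] += 5.0

index_map = build_index_map((ni, nj, nk), active_ijk)
row = index_map[1, 1, 0]
print(row)                  # 1
print(index_map[0, 1, 0])   # -1, i.e. inactive
```

The index map costs one integer per grid cell, while the coordinate storage scales with the number of active cells only.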
Bruce Southey wrote:
> A lot depends on your final usage of the array otherwise there are no suitable suggestions. That is do you need just to index the array using i, j, k indices (this gives you either an i by j by k array that contains the x, y, z coordinates) or do you also need to index the x, y, z coordinates as well (giving you an i by j by k by x by y by z array). If it is just plain storage then perhaps just a Python list, dict or sqlite object may be sufficient.
Ultimately I'm trying to build a tvtk unstructured grid to view in a Traits/tvtk/Mayavi app. The grid is ni*nj*nk cells with 8 xyz's per cell (hexahedral cell with 6 faces). However some cells are inactive and therefore don't have geometry. Cells also have "connectivity" to other cells, usually to adjacent cells (e.g. cell i,j,k connected to cell i-1,j,k) but not always. I'll post more comments/questions as I go. Brennan
Bruce Southey wrote:
> Christopher Barker provided some code last year on appending ndarrays, e.g.: http://mail.scipy.org/pipermail/numpy-discussion/2009-November/046634.html
Yup, I'd love someone else to pick that up and test/improve it.
Anyway, that code only handles 1-d arrays, though those can be structured arrays. I'd like to extend it to handle n-d arrays, though you could only grow them in the first dimension, which may work for your case.
As for performance:
My numpy code is a bit slower than using python lists, if you add elements one at a time, and the elements are a standard python data type. It should use less memory though, if that matters.
If you add the data in big enough chunks, my method gets better performance.
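The general idea of an array that grows only along its first dimension can be sketched as below. This is not the code from the linked post; it is an independent illustration that over-allocates a buffer and doubles it when full, with `GrowableArray` and its methods being names invented here.

```python
import numpy as np

class GrowableArray:
    """Append-only buffer that grows along the first axis (illustrative only)."""

    def __init__(self, row_shape, dtype=float, capacity=16):
        self._data = np.empty((capacity,) + tuple(row_shape), dtype=dtype)
        self._n = 0  # number of rows actually in use

    def extend(self, rows):
        """Append a chunk of rows with shape (m,) + row_shape."""
        rows = np.asarray(rows, dtype=self._data.dtype)
        needed = self._n + len(rows)
        if needed > len(self._data):
            # Grow geometrically so repeated extends stay cheap on average.
            new_capacity = max(needed, 2 * len(self._data))
            grown = np.empty((new_capacity,) + self._data.shape[1:],
                             dtype=self._data.dtype)
            grown[:self._n] = self._data[:self._n]
            self._data = grown
        self._data[self._n:needed] = rows
        self._n = needed

    def to_array(self):
        """Return a copy holding only the rows that were added."""
        return self._data[:self._n].copy()

# Usage: add five cells of 8 corner points each, one chunk per cell.
buf = GrowableArray(row_shape=(3,), capacity=4)
for i in range(5):
    buf.extend(np.full((8, 3), float(i)))
result = buf.to_array()
print(result.shape)  # (40, 3)
```

Adding one (8, 3) chunk per cell rather than one point at a time is in the spirit of the remark about chunk size above.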
> Ultimately I'm trying to build a tvtk unstructured grid to view in a Traits/tvtk/Mayavi app.
I'd love to see that working, once you've got it!
> The grid is ni*nj*nk cells with 8 xyz's per cell (hexahedral cell with 6 faces). However some cells are inactive and therefore don't have geometry. Cells also have "connectivity" to other cells, usually to adjacent cells (e.g. cell i,j,k connected to cell i-1,j,k) but not always.
I'm confused now -- what does the array need to look like in the end? Maybe:
ni*nj*nk X 8 X 3 ?
How is inactive indicated?
Is the connectivity somehow in the same array, or is that stored separately?
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov
Christopher Barker wrote:
> My numpy code is a bit slower than using python lists, if you add elements one at a time, and the elements are a standard python data type. It should use less memory though, if that matters.
> If you add the data in big enough chunks, my method gets better performance.
Currently I'm adding all my corner point xyz's into a list and then converting to an array of shape (npoints,3). I'm also creating a celllist with the point indices for each cell and then converting that into an array of shape (nactivecells,8). Then I'm creating an unstructured grid.
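A rough sketch of that bookkeeping, assuming each active cell arrives as an (8, 3) array-like and that points are not shared between cells. The function and variable names are illustrative, and handing the arrays to tvtk is deliberately not shown.

```python
import numpy as np

def build_points_and_cells(cells):
    """Return (points, cell_ids) with shapes (npoints, 3) and (nactivecells, 8)."""
    point_list = []   # one [x, y, z] entry per corner
    cell_list = []    # eight point indices per active cell
    for corners in cells:
        start = len(point_list)
        point_list.extend(np.asarray(corners, dtype=float).tolist())
        cell_list.append(list(range(start, start + 8)))
    points = np.array(point_list, dtype=float).reshape(-1, 3)
    cell_ids = np.array(cell_list, dtype=np.intp).reshape(-1, 8)
    return points, cell_ids

two_cells = [np.zeros((8, 3)), np.ones((8, 3))]
points, cell_ids = build_points_and_cells(two_cells)
print(points.shape, cell_ids.shape)  # (16, 3) (2, 8)
```

Indexing `points[cell_ids]` recovers the per-cell layout of shape (nactivecells, 8, 3).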
> I'd love to see that working, once you've got it!
So will I.
> I'm confused now -- what does the array need to look like in the end? Maybe:
> ni*nj*nk X 8 X 3 ?
> How is inactive indicated?
I made a typo in my first posting. Each cell has 8 corners, each corner an x,y,z, so yes, if all the cells in the grid are active then ni*nj*nk*8*3. But usually not all cells are active, and it is optional whether to have inactive cell geometry written out to the grid file, so it is actually nactivecells*8*3.
> Is the connectivity somehow in the same array, or is that stored separately?
Bit of both - there is separate connectivity info and also implicit connectivity info. Often a cell will be fully connected to its adjacent cell(s) as they share a common face. But also, often there is no connectivity (e.g. a fault) and the +I face of a cell does not match up against the -I face of the adjacent cell.
At the moment, I'm not removing duplicate points (of which there are a lot, probably 25-50% depending on the degree of faulting).
One other thing I need to do is to re-order my xyz coordinates - in the attached image taken from the VTK file format pdf you can see the 0,1,2,3 and 4,5,6,7 node ordering. In my grid it is 0,1,3,2 and 4,5,7,6 so you can see that I need to swap round some of the coordinates. I need to do this for each cell and there may be 10,000 of them but there may be 2,000,000 of them. So I think it is probably best not to do it on a cell by cell basis but wait until I've built my full pointlist, then convert it to an array, probably of shape (nactivecells,8,3) and then somehow rearrange/reorder the 8 "columns".
Sound the right way to go?
Brennan
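Reordering all cells at once can be done with a single fancy-indexing step on the (nactivecells, 8, 3) array. The sketch below assumes the only difference between the two orderings is that corners 2/3 and 6/7 are swapped, as described above. It also shows one possible way to merge exactly-equal duplicate points with `np.unique`; points that differ by rounding error would not be merged. The function names are made up for this example.

```python
import numpy as np

# Index list that swaps corners 2<->3 and 6<->7 (assumed mapping, see above).
CORNER_ORDER = [0, 1, 3, 2, 4, 5, 7, 6]

def reorder_corners(cells):
    """Reorder the 8 corners of every cell in one vectorised step."""
    cells = np.asarray(cells)
    return cells[:, CORNER_ORDER, :]

def merge_duplicate_points(cells):
    """Collapse exactly-equal corner points and return (points, connectivity)."""
    flat = np.asarray(cells, dtype=float).reshape(-1, 3)
    points, inverse = np.unique(flat, axis=0, return_inverse=True)
    return points, inverse.reshape(-1, 8)

# Usage: two unit cubes side by side along x, sharing one face (4 corners).
base = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
                 [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float)
cells = np.stack([base, base + [1.0, 0.0, 0.0]])   # shape (2, 8, 3)

vtk_cells = reorder_corners(cells)
points, connectivity = merge_duplicate_points(vtk_cells)
print(points.shape, connectivity.shape)  # (12, 3) (2, 8)
```

Because the index list only swaps pairs, applying `reorder_corners` twice returns the original ordering.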
participants (4)
- Brennan Williams
- Bruce Southey
- Christopher Barker
- David Goldsmith