numpy.append & numpy.where vs list.append and brute iterative for loop
I am processing two CSV files against another. My first implementation used a Python list of lists and list.append to generate a new list, looping over all the data, including the non-relevant data (I can't determine the location of a specific data element in a list of lists). So I re-implemented the exact same code using numpy arrays (2-D arrays), with numpy.where to avoid looping over an entire dataset needlessly, but the numpy.array-based code is about 7.6 times slower?

Relevant list-of-lists code:
    starttime = time.clock()
    #NI_data_list room_eqp_list
    NI_data_list_new = []
    for NI_row in NI_data_list:
        treelevel = NI_row[0]
        elevation = NI_row[1]
        locater = NI_row[2]
        area = NI_row[3]
        NIroom = NI_row[4]
        #Write appropriate equipment models and drawing into new list
        if NIroom != '':
            #Write appropriate equipment models and drawing into new list
            for row in room_eqp_list:
                eqp_room = row[0]
                if len(eqp_room) == 5:
                    eqp_drawing = row[1]
                    if NIroom == eqp_room:
                        newrow = [int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]
                        NI_data_list_new.append(newrow)
            #Write appropriate piping info into the new list
            for prow in unique_piping_list:
                pipe_room = prow[0]
                if len(pipe_room) == 5:
                    pipe_drawing = prow[1]
                    if pipe_room == NIroom:
                        piperow = [int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]
                        NI_data_list_new.append(piperow)
        #Write appropriate equipment models and drawing into new list
        if (locater != '' and NIroom == ''):
            #Write appropriate equipment models and drawing into new list
            for row in room_eqp_list:
                eqp_locater = row[0]
                if len(eqp_locater) == 4:
                    eqp_drawing = row[1]
                    if locater == eqp_locater:
                        newrow = [int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]
                        NI_data_list_new.append(newrow)
            #Write appropriate piping info into the new list
            for prow in unique_piping_list:
                pipe_locater = prow[0]
                if len(pipe_locater) == 4:
                    pipe_drawing = prow[1]
                    if pipe_locater == locater:
                        piperow = [int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]
                        NI_data_list_new.append(piperow)
        #Rewrite NI_data to new list
        if NIroom == '':
            NI_data_list_new.append(NI_row)
    print (time.clock()-starttime)
relevant numpy.array code:
    NI_data_write_url = reports_dir + 'NI_data_room2.csv'
    NI_data_list_file = open(NI_data_write_url, 'wb')
    NI_data_list_writer = csv.writer(NI_data_list_file, delimiter=',', quotechar='"')
    starttime = time.clock()
    #NI_data_list room_eqp_list
    NI_data_list_new = numpy.array([['TreeDepth', 'Elevation', 'BuildingLocater', 'Area', 'Room', 'Item']])
    for NI_row in NI_data_list:
        treelevel = NI_row[0]
        elevation = NI_row[1]
        locater = NI_row[2]
        area = NI_row[3]
        NIroom = NI_row[4]
        #Write appropriate equipment models and drawing into new array
        if NIroom != '':
            #Write appropriate equipment models and drawing into new array
            (rowtest, columntest) = numpy.where(room_eqp_list==NIroom)
            for row_iter in rowtest:
                eqp_room = room_eqp_list[row_iter,0]
                if len(eqp_room) == 5:
                    eqp_drawing = room_eqp_list[row_iter,1]
                    if NIroom == eqp_room:
                        newrow = numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]])
                        NI_data_list_new = numpy.append(NI_data_list_new, newrow, 0)
            #Write appropriate piping info into the new array
            (rowtest, columntest) = numpy.where(unique_room_piping_list==NIroom)
            for row_iter in rowtest:
                #unique_room_piping_list
                pipe_room = unique_room_piping_list[row_iter,0]
                if len(pipe_room) == 5:
                    pipe_drawing = unique_room_piping_list[row_iter,1]
                    if pipe_room == NIroom:
                        piperow = numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]])
                        NI_data_list_new = numpy.append(NI_data_list_new, piperow, 0)
        #Write appropriate equipment models and drawing into new array
        if (locater != '' and NIroom == ''):
            #Write appropriate equipment models and drawing into new array
            (rowtest, columntest) = numpy.where(room_eqp_list==locater)
            for row_iter in rowtest:
                eqp_locater = room_eqp_list[row_iter,0]
                if len(eqp_locater) == 4:
                    eqp_drawing = room_eqp_list[row_iter,1]
                    if locater == eqp_locater:
                        newrow = numpy.array([[int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]])
                        NI_data_list_new = numpy.append(NI_data_list_new, newrow, 0)
            #Write appropriate piping info into the new array
            #Search the same array that the row indices are applied to below
            (rowtest, columntest) = numpy.where(unique_room_piping_list==locater)
            for row_iter in rowtest:
                pipe_locater = unique_room_piping_list[row_iter,0]
                if len(pipe_locater) == 4:
                    pipe_drawing = unique_room_piping_list[row_iter,1]
                    if pipe_locater == locater:
                        piperow = numpy.array([[int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]])
                        NI_data_list_new = numpy.append(NI_data_list_new, piperow, 0)
        #Rewrite NI_data to new list
        if NIroom == '':
            NI_data_list_new = numpy.append(NI_data_list_new,[NI_row],0)
    print (time.clock()-starttime)
some relevant output
    >>> print NI_data_list_new
    [['TreeDepth' 'Elevation' 'BuildingLocater' 'Area' 'Room' 'Item']
     ['0' '' '1000' '' '' '']
     ['1' '' '1000' '' '' 'docname Rev 0']
     ...,
     ['5' '6' '1164' '4' '' 'eqp11 RB, R. surname, 24-NOV-08']
     ['4' '6' '1164' '4' '' 'anotherdoc Rev A']
     ['0' '' '' '' '' '']]

Is numpy.append so slow? Or is the culprit numpy.where?

Dewald Pieterse
"A democracy is nothing more than mob rule, where fifty-one percent of the people take away the rights of the other forty-nine." ~ Thomas Jefferson
On 1/27/11 1:03 PM, Dewald Pieterse wrote:
> I am processing two csv files against another, my first implementation used python list of lists and list.append to generate a new list [...] So I re-implemented the exact same code but using numpy.array's (2d arrays) using numpy.where to prevent looping over an entire dataset needlessly but the numpy.array based code is about 7.6 times slower?

Didn't look at your code in any detail, but:

numpy arrays are not designed to be re-sizable, so numpy.append actually creates a new array and copies the old data into it, along with the new stuff. It's a convenience function, but it means you are re-allocating and copying all your data with each call.

Python lists, on the other hand, are designed to be re-sizable, so they pre-allocate extra room so that appending can be fast.

In general, the recommended solution in this sort of situation is to build up your data in a Python list, then convert it to an array.

If I'm right about what you're doing, you could keep the "rows" as numpy arrays, but put them in a list while building it up.

Also, a numpy array of strings isn't necessarily a great data structure for this kind of data. You might want to look at structured arrays.

I wrote an appendable numpy array class a while back to address this. It has some advantages, though, as it is written, not as many as you'd think. It does have some benefits for structured arrays, though.

Code enclosed

-Chris
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
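The advice above (collect rows in a Python list, convert once at the end) can be sketched as follows. This is a minimal illustration rather than the code from the thread; the function names and the sample rows are assumptions made up for the example.

```python
import numpy as np

def build_with_np_append(rows):
    """Grow a 2-D array one row at a time; each call copies the whole array."""
    out = np.empty((0, len(rows[0])), dtype=object)
    for row in rows:
        out = np.append(out, [row], axis=0)
    return out

def build_with_list(rows):
    """Collect rows in a Python list and convert to an array once at the end."""
    collected = []
    for row in rows:
        collected.append(row)
    return np.array(collected, dtype=object)

# Hypothetical rows shaped like the six-column output in the original post.
rows = [[str(i), "", "1000", "", "", "doc %d" % i] for i in range(1000)]
a = build_with_np_append(rows)
b = build_with_list(rows)
print(a.shape, b.shape)
```

Both functions produce the same array; the difference is only in how often the accumulated data is copied along the way.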
On Thu, Jan 27, 2011 at 4:19 PM, Christopher Barker <Chris.Barker@noaa.gov> wrote:
> In general, the recommended solution in this sort of situation is to build up your data in a python list, then convert it to an array.
>
> If I'm right about what you're doing you could keep the "rows" as numpy arrays, but put them in a list while building it up.
Thanks Chris, I believe this is the problem then. I can continue to use the arrays as reference data but build a list instead; the only reason I used the arrays was to be able to use numpy.where. I will just use both data types, the best of both worlds. As I already have row arrays, I will build a list of arrays.
> Also, a numpy array of strings isn't necessarily a great data structure for this kind of data. You might want to look at structured arrays.
At the moment, I use:

    comit_eqp_reader = csv.reader(comit_eqp_file, delimiter=',', quotechar='"')
    comit_eqp_lt = numpy.array([[col for col in row] for row in comit_eqp_reader])

to set up the arrays. I will look at using structured arrays.
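For comparison, a structured array could be built from the same kind of csv.reader loop. The sketch below is only an illustration: the in-memory CSV text, the field names `key` and `drawing`, and the field widths are all assumptions, not details from the thread.

```python
import csv
import io
import numpy as np

# Hypothetical two-column data standing in for the equipment CSV file.
csv_text = '12345,"eqp11 RB, R. surname, 24-NOV-08"\n1164,anotherdoc Rev A\n'
reader = csv.reader(io.StringIO(csv_text), delimiter=',', quotechar='"')

# Field names and string widths are assumptions made for this example.
eqp_dtype = np.dtype([("key", "U5"), ("drawing", "U64")])

# A structured array is built from a list of tuples, one tuple per record.
eqp = np.array([tuple(row) for row in reader], dtype=eqp_dtype)

# Fields are addressed by name, so a lookup compares one column only.
matches = eqp[eqp["key"] == "1164"]
print(matches["drawing"])
```

Naming the fields makes it explicit which column a comparison applies to, instead of comparing every cell of a 2-D string array.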
On Thu, Jan 27, 2011 at 4:33 PM, Dewald Pieterse <dewald.pieterse@gmail.com> wrote:
> Thanks Chris, I believe this is the problem then. I can continue to use the arrays as reference data but build a list instead; the only reason I used the arrays was to be able to use numpy.where. I will just use both data types, the best of both worlds. As I already have row arrays, I will build a list of arrays.
Now my code is nearly 4 times faster than the list of lists implementation! Wonderful, thanks.
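The hybrid approach described here (numpy arrays as lookup tables, a plain list for the output) might look roughly like the sketch below. The reference data, the helper name `rows_for`, and the choice to search only column 0 are assumptions for illustration; the original code searched the whole 2-D array and then checked column 0 inside the loop.

```python
import numpy as np

# Hypothetical reference data: column 0 is a room or locater key, column 1 a drawing.
room_eqp = np.array([
    ["12345", "docname Rev 0"],
    ["1164", "anotherdoc Rev A"],
    ["12345", "eqp11"],
])

def rows_for(key, reference, base_row):
    """Return one output row per reference row whose first column equals key."""
    # np.where on a single column returns a 1-tuple of matching row indices.
    (hits,) = np.where(reference[:, 0] == key)
    return [base_row + [str(reference[i, 1])] for i in hits]

result = []  # plain Python list: cheap to append to
result.extend(rows_for("12345", room_eqp, [1, "6", "1164", "4", "12345"]))
result.extend(rows_for("1164", room_eqp, [1, "6", "1164", "4", ""]))
print(len(result))
```

The lookup is vectorised, while the growing output stays in a container designed for appends.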
On 1/27/11 1:53 PM, Sturla Molden wrote:
> But N appends are O(N) for lists and O(N*N) for arrays.

Hmm, that doesn't seem quite right: lists still have to re-allocate and copy, they just do it every n times (where n grows with the list), so I wouldn't expect exactly O(N).

But you never know till you profile. See the enclosed code and figures.

Interestingly, both appear to be pretty linear, though the constant is much larger for numpy arrays.

-Chris
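The enclosed code and figures are not part of this archive. A timing experiment of the kind described could be sketched as below; the function names and the sizes are assumptions, and no particular result is implied.

```python
import timeit
import numpy as np

def append_list(n):
    """Append n integers to a Python list."""
    out = []
    for i in range(n):
        out.append(i)
    return out

def append_array(n):
    """Append n integers to a numpy array with numpy.append."""
    out = np.empty(0, dtype=np.int64)
    for i in range(n):
        out = np.append(out, i)
    return out

def time_appends(sizes, repeat=3):
    """Return {n: (list_seconds, array_seconds)}, best of `repeat` runs each."""
    results = {}
    for n in sizes:
        t_list = min(timeit.repeat(lambda: append_list(n), number=1, repeat=repeat))
        t_arr = min(timeit.repeat(lambda: append_array(n), number=1, repeat=repeat))
        results[n] = (t_list, t_arr)
    return results

if __name__ == "__main__":
    for n, (t_list, t_arr) in time_appends([1000, 2000, 4000, 8000]).items():
        print(n, t_list, t_arr)
```

Doubling n at each step makes it easier to see whether the time roughly doubles (linear) or roughly quadruples (quadratic).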
On 28.01.2011 00:33, Christopher Barker wrote:
> hmmm - that doesn't seem quite right -- lists still have to re-allocate and copy, they just do it every n times (where n grows with the list), so I wouldn't expect exactly O(N).

Lists allocate empty slots at their back, proportional to their size. So as a list grows, re-allocations become rarer and rarer. On average, the complexity per append then becomes O(1), which is the "amortised" complexity. Appending N items to a list thus has the amortised complexity O(N). The advantage of this implementation over linked lists is that indexing will be O(1) as well.

NumPy arrays are designed to be fixed size, and are not designed to amortise the complexity of appends. So if you want to use arrays as efficient re-sizeable containers, you must code this logic yourself.

Sturla
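Coding "this logic yourself" could look something like the sketch below: a wrapper that over-allocates its buffer and doubles it when full. This is not the appendable array class mentioned earlier in the thread (that code is not in the archive); the class name, the doubling factor, and the 1-D restriction are assumptions chosen to keep the example short.

```python
import numpy as np

class GrowableArray:
    """1-D array that over-allocates so that appends are amortised O(1)."""

    def __init__(self, dtype=np.float64, initial_capacity=16):
        self._data = np.empty(max(1, initial_capacity), dtype=dtype)
        self._size = 0

    def append(self, value):
        if self._size == len(self._data):
            # Buffer is full: allocate twice the capacity and copy once.
            bigger = np.empty(2 * len(self._data), dtype=self._data.dtype)
            bigger[:self._size] = self._data
            self._data = bigger
        self._data[self._size] = value
        self._size += 1

    def __len__(self):
        return self._size

    def view(self):
        """Return the filled part of the buffer (a view, not a copy)."""
        return self._data[:self._size]

g = GrowableArray(initial_capacity=2)
for i in range(100):
    g.append(i)
print(len(g), g.view()[:5])
```

Because the capacity grows in proportion to the current size, the number of re-allocations for N appends grows only logarithmically with N.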
On 1/27/11 3:54 PM, Sturla Molden wrote:
> Lists allocate empty slots at their back, proportional to their size. So as lists grows, re-allocations become rarer and rarer. Then on average the complexity per append becomes O(1), which is the "amortised" complexity. Appending N items to a list thus has the amortized complexity O(N).
I think I get that now...
> NumPy arrays are designed to be fixed size, and not designed to amortize the complexity of appends. So if you want to use arrays as efficient re-sizeable containers, you must code this logic yourself.

And I do get that. And yet, experimentally, appending to numpy arrays (on that one simple example) appeared to be O(N). Granted, with a much larger constant than for lists, but it sure looks linear to me. Should it be O(N^2)? Maybe I need to run it for larger N, but I got impatient as it is.

-Chris
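One way to see where the O(N^2) expectation comes from is to count element copies rather than measure time. The sketch below is a simplified model, not a measurement: it assumes every numpy.append copies all existing elements, and that the resizable container doubles its capacity when full. For small arrays, fixed per-call overhead may outweigh the copying cost, which is one possible reason a short timing run can look linear.

```python
def copies_with_full_copy(n):
    """Elements copied when every append reallocates (the numpy.append pattern)."""
    # Before the k-th append (k = 0..n-1) the array holds k elements, all copied.
    return sum(range(n))  # equals n * (n - 1) // 2

def copies_with_doubling(n, initial_capacity=1):
    """Elements copied when capacity doubles each time the buffer is full."""
    copied, size, capacity = 0, 0, initial_capacity
    for _ in range(n):
        if size == capacity:
            copied += size  # move existing elements into the new buffer
            capacity *= 2
        size += 1
    return copied

for n in (1000, 10000, 100000):
    print(n, copies_with_full_copy(n), copies_with_doubling(n))
```

In this model the full-copy count grows quadratically with n, while the doubling count stays below 2n.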
participants (3)
- Christopher Barker
- Dewald Pieterse
- Sturla Molden