[Numpy-discussion] numpy.append & numpy.where vs list.append and brute iterative for loop

Dewald Pieterse dewald.pieterse at gmail.com
Thu Jan 27 16:47:43 EST 2011


On Thu, Jan 27, 2011 at 4:33 PM, Dewald Pieterse
<dewald.pieterse at gmail.com>wrote:

>
>
> On Thu, Jan 27, 2011 at 4:19 PM, Christopher Barker <Chris.Barker at noaa.gov
> > wrote:
>
>> On 1/27/11 1:03 PM, Dewald Pieterse wrote:
>>
>>> I am processing two csv files against another, my first implementation
>>> used python list of lists and list.append to generate a new list while
>>> looping all the data including the non-relevant data (can't determine
>>> location of specific data element in a list of list). So I re-implented
>>> the exact same code but using numpy.array's (2d arrays) using
>>> numpy.where to prevent looping over an entire dataset needlessly but the
>>> numpy.array based code is about 7.6 times slower?
>>>
>>
>> Didn't look at your code in any detail, but:
>>
>> numpy arrays are not designed to be re-sizable, so numpy.append actually
>> creates a new array, and copies the old to the new, along with the new
>> stuff. It's a convenience function, but it means you are re-allocating and
>> copying all your data with each call.
>>
>> python lists, on the other hand, are designed to be re-sizable, so they
>> pre-allocate extra room, so that appending can be fast.
>>
>> In general, the recommended solution in this sort of situation is to build
>> up your data in a python list, then convert it to an array.
>>
>> If I'm right about what you're doing you could keep the "rows" as numpy
>> arrays, but put them in a list while building it up.
>>
>
> Thanks Chris, I believe this is the problem then, I can continue to use the
> arrays as reference data but build list instead, the only reason I used the
> arrays was to be able to use numpy.where, I just use both data types, best
> of both worlds. As I already have row arrays I will do a build a list or
> arrays.
>

Now my code is nearly 4 times faster than the list of lists implementation!
Wonderful, thanks.

>
>> Also, a numpy array of strings isn't necessarily a great dats structure
>> for this kind of data. YOu might want to look at structured arrays.
>>
>
> Atm, I use :
> comit_eqp_reader = csv.reader(comit_eqp_file, delimiter=',', quotechar='"')
> comit_eqp_lt = numpy.array([[col for col in row] for row in
> comit_eqp_reader])
> to setup the arrays, I will look at using structured arrays
>
>>
>> I wrote an appendable numpy array class a while back, to address this. It
>> has some advantages, though, as it it written, not as much as you'd think.
>> It does have some benifits for structured arrays, though.
>>
>>
>> Code enclosed
>>
>> -Chris
>>
>>
>>
>>  relevant list of list code:
>>>
>>>    starttime = time.clock()
>>>    #NI_data_list room_eqp_list
>>>    NI_data_list_new = []
>>>    for NI_row in NI_data_list:
>>>         treelevel = NI_row[0]
>>>         elevation = NI_row[1]
>>>         locater = NI_row[2]
>>>         area = NI_row[3]
>>>         NIroom = NI_row[4]
>>>         #Write appropriate equipment models and drawing into new list
>>>         if NIroom != '':
>>>             #Write appropriate equipment models and drawing into new list
>>>             for row in room_eqp_list:
>>>                 eqp_room = row[0]
>>>                 if len(eqp_room) == 5:
>>>                     eqp_drawing = row[1]
>>>                     if NIroom == eqp_room:
>>>                         newrow =
>>>    [int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]
>>>                         NI_data_list_new.append(newrow)
>>>             #Write appropriate piping info into the new list
>>>             for prow in unique_piping_list:
>>>                 pipe_room = prow[0]
>>>                 if len(pipe_room) == 5:
>>>                     pipe_drawing = prow[1]
>>>                     if pipe_room == NIroom:
>>>                         piperow =
>>>    [int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]
>>>                         NI_data_list_new.append(piperow)
>>>         #Write appropriate equipment models and drawing into new list
>>>         if (locater != '' and NIroom == ''):
>>>             #Write appropriate equipment models and drawing into new list
>>>             for row in room_eqp_list:
>>>                 eqp_locater = row[0]
>>>                 if len(eqp_locater) == 4:
>>>                     eqp_drawing = row[1]
>>>                     if locater == eqp_locater:
>>>                         newrow =
>>>    [int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]
>>>                         NI_data_list_new.append(newrow)
>>>             #Write appropriate piping info into the new list
>>>             for prow in unique_piping_list:
>>>                 pipe_locater = prow[0]
>>>                 if len(pipe_locater) == 4:
>>>                     pipe_drawing = prow[1]
>>>                     if pipe_locater == locater:
>>>                         piperow =
>>>    [int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]
>>>                         NI_data_list_new.append(piperow)
>>>         #Rewrite NI_data to new list
>>>         if NIroom == '':
>>>             NI_data_list_new.append(NI_row)
>>>
>>>    print (time.clock()-starttime)
>>>
>>>
>>> relevant numpy.array code:
>>>
>>>    NI_data_write_url = reports_dir + 'NI_data_room2.csv'
>>>    NI_data_list_file = open(NI_data_write_url, 'wb')
>>>    NI_data_list_writer = csv.writer(NI_data_list_file, delimiter=',',
>>>    quotechar='"')
>>>    starttime = time.clock()
>>>    #NI_data_list room_eqp_list
>>>    NI_data_list_new = numpy.array([['TreeDepth', 'Elevation',
>>>    'BuildingLocater', 'Area', 'Room', 'Item']])
>>>    for NI_row in NI_data_list:
>>>         treelevel = NI_row[0]
>>>         elevation = NI_row[1]
>>>         locater = NI_row[2]
>>>         area = NI_row[3]
>>>         NIroom = NI_row[4]
>>>         #Write appropriate equipment models and drawing into new array
>>>         if NIroom != '':
>>>             #Write appropriate equipment models and drawing into new
>>> array
>>>             (rowtest, columntest) = numpy.where(room_eqp_list==NIroom)
>>>             for row_iter in rowtest:
>>>                 eqp_room = room_eqp_list[row_iter,0]
>>>                 if len(eqp_room) == 5:
>>>                     eqp_drawing = room_eqp_list[row_iter,1]
>>>                     if NIroom == eqp_room:
>>>                         newrow =
>>>
>>>  numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]])
>>>                         NI_data_list_new =
>>>    numpy.append(NI_data_list_new, newrow, 0)
>>>
>>>             #Write appropriate piping info into the new array
>>>             (rowtest, columntest) =
>>>    numpy.where(unique_room_piping_list==NIroom)
>>>             for row_iter in rowtest: #unique_room_piping_list
>>>                 pipe_room = unique_room_piping_list[row_iter,0]
>>>                 if len(pipe_room) == 5:
>>>                     pipe_drawing = unique_room_piping_list[row_iter,1]
>>>                     if pipe_room == NIroom:
>>>                         piperow =
>>>
>>>  numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]])
>>>                         NI_data_list_new =
>>>    numpy.append(NI_data_list_new, piperow, 0)
>>>         #Write appropriate equipment models and drawing into new array
>>>         if (locater != '' and NIroom == ''):
>>>             #Write appropriate equipment models and drawing into new
>>> array
>>>             (rowtest, columntest) = numpy.where(room_eqp_list==locater)
>>>             for row_iter in rowtest:
>>>                 eqp_locater = room_eqp_list[row_iter,0]
>>>                 if len(eqp_locater) == 4:
>>>                     eqp_drawing = room_eqp_list[row_iter,1]
>>>                     if locater == eqp_locater:
>>>                         newrow =
>>>
>>>  numpy.array([[int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]])
>>>                         NI_data_list_new =
>>>    numpy.append(NI_data_list_new, newrow, 0)
>>>             #Write appropriate piping info into the new array
>>>             (rowtest, columntest) =
>>>    numpy.where(unique_room_eqp_list==locater)
>>>             for row_iter in rowtest:
>>>                 pipe_locater = unique_room_piping_list[row_iter,0]
>>>                 if len(pipe_locater) == 4:
>>>                     pipe_drawing = unique_room_piping_list[row_iter,1]
>>>                     if pipe_locater == locater:
>>>                         piperow =
>>>
>>>  numpy.array([[int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]])
>>>                         NI_data_list_new =
>>>    numpy.append(NI_data_list_new, piperow, 0)
>>>         #Rewrite NI_data to new list
>>>         if NIroom == '':
>>>             NI_data_list_new = numpy.append(NI_data_list_new,[NI_row],0)
>>>
>>>    print (time.clock()-starttime)
>>>
>>>
>>> some relevant output
>>>
>>>     >>> print NI_data_list_new
>>>    [['TreeDepth' 'Elevation' 'BuildingLocater' 'Area' 'Room' 'Item']
>>>      ['0' '' '1000' '' '' '']
>>>      ['1' '' '1000' '' '' 'docname Rev 0']
>>>      ...,
>>>      ['5' '6' '1164' '4' '' 'eqp11 RB, R. surname, 24-NOV-08']
>>>      ['4' '6' '1164' '4' '' 'anotherdoc Rev A']
>>>      ['0' '' '' '' '' '']]
>>>
>>>
>>> Is numpy.append so slow? or is the culprit numpy.where?
>>>
>>> Dewald Pieterse
>>>
>>> "A democracy is nothing more than mob rule, where fifty-one percent of
>>> the people take away the rights of the other forty-nine." ~ Thomas
>>> Jefferson
>>>
>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>>
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> Chris.Barker at noaa.gov
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
>
> --
> Dewald Pieterse
>
> "A democracy is nothing more than mob rule, where fifty-one percent of the
> people take away the rights of the other forty-nine." ~ Thomas Jefferson
>



-- 
Dewald Pieterse

"A democracy is nothing more than mob rule, where fifty-one percent of the
people take away the rights of the other forty-nine." ~ Thomas Jefferson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20110127/ca5f7c99/attachment.html>


More information about the NumPy-Discussion mailing list