[Numpy-discussion] speeding up an array operation

Mag Gam magawake at gmail.com
Sat Jul 11 00:47:41 EDT 2009


Hey Frederic:

Thanks for the response. I really want to do it your way, but I am
a bad programmer. Do you have any sample code? Your method seems
correct.


2009/7/10 Frédéric Bastien <nouiz at nouiz.org>:
> Can you do it by chunk instead of by row? If the chunk is not too big, the
> sort could be faster than the multiple dictionary accesses. But
> don't forget, you replace an O(n) algorithm with an O(n log n) one that has
> a lower constant, so n should not be too big. Just try different values.
>
> Frédéric Bastien
>
> On Thu, Jul 9, 2009 at 7:14 AM, Mag Gam <magawake at gmail.com> wrote:
>>
>> The problem is the array is very large. We are talking about 200+ million
>> rows.
>>
>>
>> On Thu, Jul 9, 2009 at 4:41 AM, David Warde-Farley<dwf at cs.toronto.edu>
>> wrote:
>> > On 9-Jul-09, at 1:12 AM, Mag Gam wrote:
>> >
>> >> Here is what I have, which does it 1x1:
>> >>
>> >> import csv
>> >>
>> >> z={}  # dictionary: current row index for each p
>> >> r=csv.reader(file)
>> >> for i,row in enumerate(r):
>> >>  p="/MIT/"+row[1]
>> >>
>> >>  if p not in z:
>> >>    z[p]=0
>> >>  else:
>> >>    z[p]+=1
>> >>
>> >>  arr[p]['chem'][z[p]]=tuple(row) #this loads the array 1 x 1
>> >>
>> >>
>> >> I would like to avoid the 1x1 loading; instead I would like to bulk
>> >> load the array. Let's say, load up 5 million lines into memory and then
>> >> push them into the array. Any ideas on how to do that?
>> >
>> >
>> > Depending on how big your data is, this looks like a job for e.g.
>> > numpy.loadtxt(), to give you one big array.
>> >
>> > Then sort the array on the second column, so that all the rows with
>> > the same 'p' appear one after the other. Then you can assign slices of
>> > this big array to be arr[p]['chem'].
>> >
>> > David
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
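
For reference, the chunk-and-sort idea discussed above might look roughly like the sketch below. It is only an illustration under assumptions, not code from anyone in this thread: the function name load_in_chunks, the store dict, and the tiny in-memory sample are all hypothetical, and plain Python lists stand in for arr[p]['chem'] so the example runs with the standard library alone.

```python
import csv
import io
from itertools import groupby, islice
from operator import itemgetter


def load_in_chunks(reader, store, chunk_size=5_000_000, prefix="/MIT/"):
    """Read rows in chunks, sort each chunk by key, write once per key.

    `store` is a hypothetical stand-in for arr[p]['chem']: a dict that
    maps each key p to a growing list of row tuples.
    """
    key = itemgetter(1)  # the second column selects the destination
    while True:
        # Pull at most chunk_size rows from the reader into memory.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        # Sorting puts rows with the same key next to each other.
        # list.sort() is stable, so file order is kept within each key.
        chunk.sort(key=key)
        for k, rows in groupby(chunk, key=key):
            # One bulk write per key per chunk instead of one per row.
            store.setdefault(prefix + k, []).extend(tuple(r) for r in rows)
    return store


# Tiny in-memory example; a real run would pass csv.reader(open(...)).
sample = io.StringIO("a,x,1\nb,y,2\nc,x,3\nd,y,4\ne,x,5\n")
store = load_in_chunks(csv.reader(sample), {}, chunk_size=2)
```

With a preallocated array, the extend() call would presumably become a slice assignment starting at a per-key offset, which is the role z[p] plays in the original loop. The chunk size is worth tuning, since each chunk costs an O(n log n) sort.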


