[Numpy-discussion] Multi thread loading data

Chris Colbert sccolbert at gmail.com
Thu Jul 2 11:14:53 EDT 2009


Can you hold the entire file in memory as a single array, with room to
spare? If so, you could use multiprocessing to load a bunch of smaller
arrays in parallel, then join them all together, along the lines of the
sketch below.
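Here's a rough, untested sketch of what I mean. It assumes a purely
numeric CSV; the path "data.csv" and the chunk size are placeholders,
not anything from your setup:

# Minimal sketch: parse blocks of a numeric CSV in worker processes,
# then join the pieces in the parent. Assumes every field parses as a
# float and the final array fits in memory with room to spare.
import io
import multiprocessing as mp

import numpy as np


def parse_chunk(lines):
    # np.loadtxt accepts any file-like object, so we can feed it a
    # StringIO wrapped around a block of raw lines.
    return np.loadtxt(io.StringIO("".join(lines)), delimiter=",", ndmin=2)


def iter_chunks(path, chunk_size=100_000):
    # Yield the file chunk_size lines at a time.
    with open(path) as f:
        chunk = []
        for line in f:
            chunk.append(line)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk


if __name__ == "__main__":
    with mp.Pool() as pool:
        # imap keeps only a few blocks of raw text in flight at once;
        # the parsed arrays come back through pickling, which is the
        # serialization cost mentioned below.
        pieces = pool.imap(parse_chunk, iter_chunks("data.csv"))
        data = np.concatenate(list(pieces))
    print(data.shape)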

It won't be super fast, because serializing a numpy array between
processes is somewhat slow when using multiprocessing. That said, it's
still faster than disk transfers.
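If you want a feel for that overhead, you can time a pickle round-trip
of an array about the size of one chunk. This is just an illustrative
micro-benchmark; the array size is arbitrary and timings vary by machine:

import pickle
import timeit

import numpy as np

# Rough measure of the serialization cost multiprocessing pays when
# shipping an array between processes.
a = np.random.rand(1_000_000)  # ~8 MB of float64
per_call = timeit.timeit(lambda: pickle.loads(pickle.dumps(a)), number=10) / 10
print(f"pickle round-trip: {per_call * 1e3:.1f} ms for {a.nbytes / 1e6:.0f} MB")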

I'm sure some numpy expert will come along here, though, and give you a
much better idea.
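For the buffer-at-a-time processing your question describes, a
single-process version is simpler and avoids the serialization cost
entirely. A hedged sketch, where process_chunk is a hypothetical
stand-in for whatever per-block work you actually need:

# Single-process variant of the chunked pattern from the question:
# read a block of rows, process it, move on.
import io
import itertools

import numpy as np


def process_chunk(arr):
    # Placeholder for the real work, e.g. accumulate column sums.
    return arr.sum(axis=0)


def chunked_loadtxt(path, chunk_size=1000):
    # Yield the file as parsed blocks of chunk_size rows.
    with open(path) as f:
        while True:
            lines = list(itertools.islice(f, chunk_size))
            if not lines:
                break
            yield np.loadtxt(io.StringIO("".join(lines)),
                             delimiter=",", ndmin=2)


totals = sum(process_chunk(block) for block in chunked_loadtxt("data.csv"))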



On Wed, Jul 1, 2009 at 7:57 AM, Mag Gam <magawake at gmail.com> wrote:
> Is it possible to use loadtxt in a multi-threaded way? Basically, I want
> to process a very large CSV file (100+ million records) by loading a
> thousand elements into a buffer, processing them, then loading another
> thousand elements, processing those, and so on...
>
> I was wondering if there is a technique where I can use multiple
> processors to do this faster.
>
> TIA


