[Numpy-discussion] Multi thread loading data

Chris Colbert sccolbert at gmail.com
Thu Jul 2 13:08:53 EDT 2009


I'm relatively certain it's possible, but then you have to deal with
locks, semaphores, synchronization, etc...
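
Roughly something along these lines (untested sketch -- the chunk size,
the comma delimiter, and the "data.csv" file name are just placeholders;
the Queue takes care of most of the locking for you):

import threading
import queue
from io import StringIO
import numpy as np

def reader(path, q, chunk_lines=1000):
    # producer: read the file in blocks of lines and hand them to workers
    with open(path) as f:
        block = []
        for line in f:
            block.append(line)
            if len(block) == chunk_lines:
                q.put("".join(block))
                block = []
        if block:
            q.put("".join(block))
    q.put(None)                  # sentinel: no more data

def worker(q, results, lock):
    # consumer: parse each text block into an array and collect it
    while True:
        block = q.get()
        if block is None:
            q.put(None)          # pass the sentinel on to the other workers
            break
        arr = np.loadtxt(StringIO(block), delimiter=",")
        with lock:               # protect the shared results list
            results.append(arr)

if __name__ == "__main__":
    q = queue.Queue(maxsize=8)   # bounded so the reader can't race too far ahead
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, results, lock))
               for _ in range(2)]
    for t in threads: t.start()
    reader("data.csv", q)        # hypothetical file name
    for t in threads: t.join()

Mind you, because of the GIL the parsing in the workers won't really run
in parallel -- see Sebastian's point further down.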


On Thu, Jul 2, 2009 at 12:04 PM, Sebastian Haase<seb.haase at gmail.com> wrote:
> On Thu, Jul 2, 2009 at 5:38 PM, Chris Colbert<sccolbert at gmail.com> wrote:
>> Who are you quoting, Sebastian?
>>
>> Multiprocessing is a Python package that spawns multiple Python
>> processes, effectively side-stepping the GIL, and provides easy
>> mechanisms for IPC. Hence the need for serialization...
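>>
>> A toy example of what I mean -- anything you push between processes has
>> to be pickled to cross the process boundary (untested; the arange is
>> just a stand-in for a chunk you'd load from disk):
>>
>> import multiprocessing as mp
>> import numpy as np
>>
>> def load_chunk(q):
>>     # runs in its own process, so no GIL contention with the parent
>>     a = np.arange(10**6, dtype=np.float64)   # stand-in for a loaded chunk
>>     q.put(a)                  # the array gets pickled onto the queue
>>
>> if __name__ == '__main__':
>>     q = mp.Queue()
>>     p = mp.Process(target=load_chunk, args=(q,))
>>     p.start()
>>     chunk = q.get()           # drain the queue *before* joining
>>     p.join()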
>>
> I was replying to the OP's email
>
> Regarding your comment: can't separate processes access the same
> memory space via shared memory!?
> I think there was a discussion about this not too long ago on this list.
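>
> Something along these lines, if I remember right (untested; relies on
> fork, so on Windows you would have to pass the shared array to the
> children explicitly):
>
> import multiprocessing as mp
> import numpy as np
>
> # one shared block of doubles, visible to parent and children alike
> shared = mp.Array('d', 1000, lock=False)
>
> def fill(start, stop):
>     # wrap the shared buffer as an ndarray -- no copy, no pickling
>     a = np.frombuffer(shared, dtype=np.float64)
>     a[start:stop] = np.arange(start, stop)
>
> if __name__ == '__main__':
>     jobs = [mp.Process(target=fill, args=(i * 250, (i + 1) * 250))
>             for i in range(4)]
>     for p in jobs: p.start()
>     for p in jobs: p.join()
>     result = np.frombuffer(shared, dtype=np.float64)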
>
> -S.
>
>
>
>> On Thu, Jul 2, 2009 at 11:30 AM, Sebastian Haase<seb.haase at gmail.com> wrote:
>>> On Thu, Jul 2, 2009 at 5:14 PM, Chris Colbert<sccolbert at gmail.com> wrote:
>>>> can you hold the entire file in memory as a single array with room to spare?
>>>> If so, you could use multiprocessing and load a bunch of smaller
>>>> arrays, then join them all together.
>>>>
>>>> It won't be super fast, because serializing a numpy array is somewhat
>>>> slow when using multiprocessing. That said, it's still faster than disk
>>>> transfers.
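>>>>
>>>> Something like this (untested; assumes you've already split the csv
>>>> into pieces, and the data_NN.csv names are made up):
>>>>
>>>> import multiprocessing as mp
>>>> import numpy as np
>>>>
>>>> def load_part(fname):
>>>>     # each worker parses one piece of the csv independently
>>>>     return np.loadtxt(fname, delimiter=",")
>>>>
>>>> if __name__ == '__main__':
>>>>     # hypothetical chunk files
>>>>     parts = ["data_00.csv", "data_01.csv", "data_02.csv", "data_03.csv"]
>>>>     pool = mp.Pool(processes=4)
>>>>     chunks = pool.map(load_part, parts)   # arrays are pickled back to the parent
>>>>     pool.close()
>>>>     pool.join()
>>>>     data = np.vstack(chunks)              # join them all together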
>>>>
>>>> I'm sure some numpy expert will come on here though and give you a
>>>> much better idea.
>>>>
>>>>
>>>>
>>>> On Wed, Jul 1, 2009 at 7:57 AM, Mag Gam<magawake at gmail.com> wrote:
>>>>> Is it possible to use loadtxt in a multi-threaded way? Basically, I want
>>>>> to process a very large CSV file (100+ million records): instead of loading
>>>>> it all at once, load a thousand elements into a buffer, process them, then
>>>>> load another thousand elements, process those, and so on...
>>>>>
>>>>> I was wondering if there is a technique where I can use multiple
>>>>> processors to do this faster.
>>>>>
>>>>> TIA
>>>
>>> Do you know about the GIL (global interpreter lock) in Python?
>>> It means that Python isn't doing "real" multithreading...
>>> Only if one thread is e.g. doing some slow or blocking I/O can the
>>> other thread keep working, e.g. doing CPU-heavy numpy stuff.
>>> But you would not get 2-CPU numpy code - except for some C-implemented
>>> "long running" operations -- these should be programmed in a way that
>>> releases the GIL so that the other CPU can go on doing its Python
>>> code.
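>>>
>>> For example (untested -- whether you actually see a ~2x speedup depends
>>> on your numpy/BLAS build releasing the GIL inside dot):
>>>
>>> import threading, time
>>> import numpy as np
>>>
>>> a = np.random.rand(1500, 1500)
>>>
>>> def work():
>>>     np.dot(a, a)             # long-running C code; may release the GIL
>>>
>>> # twice in a row in one thread
>>> t0 = time.time()
>>> work(); work()
>>> print("sequential: %.2f s" % (time.time() - t0))
>>>
>>> # the same two calls in two threads
>>> t0 = time.time()
>>> threads = [threading.Thread(target=work) for _ in range(2)]
>>> for t in threads: t.start()
>>> for t in threads: t.join()
>>> print("threaded:   %.2f s" % (time.time() - t0))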
>>>
>>> HTH,
>>> Sebastian Haase


