problem with multiprocessing and defaultdict
wiso
gtu2003 at alice.it
Tue Jan 12 05:48:49 EST 2010
Robert Kern wrote:
> On 2010-01-11 17:50, wiso wrote:
>
>> The problem now is this:
>> start reading file r1_200909.log
>> start reading file r1_200910.log
>> read 488832 lines from file r1_200910.log
>> read 517247 lines from file r1_200909.log
>>
>> with huge files (the real case) the program freezes. Is there a solution
>> that avoids pickling/serialization, ... for example, something like this:
>>
>> if __name__ == "__main__":
>>     file_names = ["r1_200909.log", "r1_200910.log"]
>>     pool = multiprocessing.Pool(len(file_names))
>>     childrens = [Container(f) for f in file_names]
>>     pool.map(lambda c: c.read(), childrens)
>>
>> PicklingError: Can't pickle <type 'function'>: attribute lookup
>> __builtin__.function failed
>
> You can't pickle lambda functions.
>
> What information do you actually need back from the workers?
>
They send back the object filled with data. The problem is very simple: I
have a container, and the container has a method read(file_name) that reads
a huge file and fills the container with data. I have more than one file to
read, so I want to parallelize this process. The reading method is quite
slow because it involves regexes.
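For reference, here is a minimal sketch of the usual workaround: replace
the lambda with a module-level function, which pickle can look up by name.
The Container below is only an illustrative stand-in for the real class
(a trivial word counter replaces the regex parsing):

import multiprocessing
from collections import defaultdict

class Container(object):
    # Illustrative stand-in for the real Container class.
    def __init__(self, file_name):
        self.file_name = file_name
        self.data = defaultdict(int)  # picklable, since int is a built-in

    def read(self):
        # The real method parses each line with a regex; a trivial
        # placeholder stands in for it here.
        with open(self.file_name) as f:
            for line in f:
                words = line.split()
                if words:
                    self.data[words[0]] += 1
        return self

def read_file(file_name):
    # A module-level function, unlike a lambda, can be pickled by
    # name, so it is safe to pass to Pool.map().
    return Container(file_name).read()

if __name__ == "__main__":
    file_names = ["r1_200909.log", "r1_200910.log"]
    pool = multiprocessing.Pool(len(file_names))
    containers = pool.map(read_file, file_names)
    pool.close()
    pool.join()

Note that each filled container is pickled in the worker and unpickled in
the parent when map() returns; with huge files that transfer alone can look
like a freeze, so returning only the underlying dict (or a reduced summary
of it) keeps the payload smaller.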