problem with multiprocessing and defaultdict

Robert Kern robert.kern at gmail.com
Mon Jan 11 19:15:47 EST 2010


On 2010-01-11 17:50, wiso wrote:

> The problem now is this:
> start reading file r1_200909.log
> start reading file r1_200910.log
> readen 488832 lines from file r1_200910.log
> readen 517247 lines from file r1_200909.log
>
> with huge files (the real case) the program freezes. Is there a solution to
> avoid pickling/serialization, ... for example something like this:
>
> if __name__ == "__main__":
>      file_names = ["r1_200909.log", "r1_200910.log"]
>      pool = multiprocessing.Pool(len(file_names))
>      childrens = [Container(f) for f in file_names]
>      pool.map(lambda c: c.read(), childrens)
>
> PicklingError: Can't pickle<type 'function'>: attribute lookup
> __builtin__.function failed

You can't pickle lambda functions. multiprocessing has to pickle the
callable you pass to Pool.map() in order to ship it to the worker
processes, and pickle only handles functions defined at the top level of
a module (they are pickled by reference to their name).
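
The usual workaround is to pass a plain module-level function to
pool.map() instead. A rough sketch, reusing your Container class and
assuming its read() method works as in your code:

import multiprocessing

def read_file(file_name):
    # Runs in the worker process: the Container is built there, so the
    # object itself never has to cross the process boundary.
    c = Container(file_name)
    c.read()
    return file_name          # whatever you return must be picklable

if __name__ == "__main__":
    file_names = ["r1_200909.log", "r1_200910.log"]
    pool = multiprocessing.Pool(len(file_names))
    pool.map(read_file, file_names)

Note that each Container now lives and dies inside its worker; anything
you want back in the parent has to be the return value of read_file().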

What information do you actually need back from the workers?
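
If it is just, say, per-key counts, have the worker return a plain dict
and merge the results in the parent, so only the dict crosses the process
boundary. A sketch, assuming a hypothetical Container.counts defaultdict:

import multiprocessing
from collections import defaultdict

def count_file(file_name):
    c = Container(file_name)
    c.read()
    # .counts is hypothetical; a plain dict pickles cheaply and cleanly.
    return dict(c.counts)

if __name__ == "__main__":
    file_names = ["r1_200909.log", "r1_200910.log"]
    pool = multiprocessing.Pool(len(file_names))
    totals = defaultdict(int)
    for partial in pool.map(count_file, file_names):
        for key, n in partial.items():
            totals[key] += n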

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco



