[Tutor] Getting a Process.start() error pickle.PicklingError: Can't pickle <type 'module'>: it's not found as __builtin__.module with Python 2.7
Mats Wichmann
mats at wichmann.us
Tue Sep 3 09:54:51 EDT 2024
On 9/3/24 03:34, marc nicole wrote:
> Hello Alan,
>
> Thanks for the reply, Here's the code I tested for the debug:
>
> import time
> from multiprocessing import Process
>
> def do_Something():
>     print('hello world!')
>
> def start(fn):
>     p = Process(target=fn, args=())
>     p.start()
>
> def ghello():
>     print ("hello world g")
>
> def fhello():
>     print('hello world f')
>
> if __name__ == "__main__":
>     start(do_something)
>     print("executed")
>     exit(0)
>
> but neither "Hello World" or "Executed" are displayed in the console which
> finishes normally without returning any message.
>
> Module naming is OK and don't think it is a problem related to that.
>
> Now the question, when to use Process/Multiprocess and when to use
> Threading in Python?.Thread is there a distinctive use case that can
> showcase when to use either? are they interchangeable? to note that using
> Threading the console DID display the messages correctly!

Just generically, threading is good when you have discrete tasks that
don't require executing a lot of Python code (the canonical example
seems to be fetching things from webservers over the internet).
Everything happens in one process, but a thread can start a request and
the rest of the program can get on with its work and collect the results
when they're available. This class of work is called I/O bound.

Multiprocessing is better when you have discrete tasks but they're
compute intensive. Python (currently) uses a lock around the execution
of bytecode (the Global Interpreter Lock, or GIL), so it doesn't matter
how many threads are doing big computations, they can't run in parallel
(they *appear* to run concurrently, but they're actually taking turns).
Here you can instead start multiple processes, and each can run
independently. The downside is that if those processes need to share
information, it's quite a bit more complicated to do so.

The final alternative is that your problem isn't easily partitioned,
and then you don't use either threading or multiprocessing :-)

The threading and multiprocessing modules are written to use a very
similar API for convenience.
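
As a rough sketch of that split (the names fake_download and
count_primes are made up for illustration, and time.sleep() only stands
in for a network request), threads can handle the waiting-around kind of
task while a process pool handles the number-crunching kind:

```python
import multiprocessing
import threading
import time


def fake_download(name, results):
    # Stand-in for an I/O-bound task: while this thread sleeps (as it
    # would while waiting on a socket), other threads are free to run.
    time.sleep(0.2)
    results[name] = "done"


def count_primes(limit):
    # CPU-bound task: pure Python arithmetic, so threads running this
    # would just take turns holding the interpreter lock.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_downloads(names):
    # One thread per pretend download; all of them wait at the same time.
    results = {}
    threads = [threading.Thread(target=fake_download, args=(n, results))
               for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_prime_counts(limits):
    # One worker process per CPU-heavy job; each has its own interpreter.
    pool = multiprocessing.Pool(processes=2)
    try:
        return pool.map(count_primes, limits)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(run_downloads(["a", "b", "c"]))
    print(run_prime_counts([20000, 30000]))
```

The two runner functions look much alike, which is the API similarity
mentioned above. The `if __name__ == "__main__":` guard matters for the
multiprocessing half: on platforms that spawn rather than fork, the
child re-imports the main module and must not start the pool again.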

The error you were originally seeing (do you still get the pickle
error?) comes from an oddity. There are different ways to start a new
process, as multiprocessing has to do. On UNIX systems, there's a
fork() system call which makes an exact copy of the running process
(arranging for the code to be shared, and the data to be copy-on-write,
so it's fairly efficient). That has long been a little controversial
because usually what happens right after fork() is exec() to run a new
program, so the previous work is just thrown away - but it's ideal for
Python multiprocessing: a copy of the already set up process can just
start running (or *would* be ideal, if it weren't rather tricky to do
safely).

If that model is not available, you have to spawn a new process and
then it has to be set up - this is where Python serializes the
important objects (using pickle) and sends them over to the new process
so initial data setup can be shared. But not everything can be pickled
easily, so multiprocessing has some limitations. Windows doesn't have
fork() and uses the spawn method by default, which is why pickling gets
involved. There's more on that here:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
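
To make the pickling limitation a bit more concrete, here is a small
sketch that only uses the pickle module, without starting any processes.
The can_pickle helper and the Worker class are hypothetical, invented
for this illustration; they are a guess at the kind of object that runs
into trouble, not a reconstruction of your actual code:

```python
import pickle
import sys


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


class Worker(object):
    """Hypothetical object that keeps a reference to a module."""

    def __init__(self):
        self.name = "worker"
        self.helper_module = sys  # a module stored as instance data

    def run(self):
        print("running " + self.name)


if __name__ == "__main__":
    print(can_pickle({"numbers": [1, 2, 3]}))  # True: plain data is fine
    print(can_pickle(sys))                     # False: modules can't be pickled
    print(can_pickle(Worker()))                # False: it drags the module along
    print(can_pickle(Worker().run))            # False: so does its bound method
```

If something like Worker().run were handed to Process(target=...) on a
platform that spawns, the target would have to go through pickle, and a
module hiding in the instance data would be enough to produce an error
like the one in the subject line.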
Hope this at least helps understanding, even if it doesn't help your
specific problem.
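
One thing that does stand out in the quoted script: it defines
do_Something but calls do_something, and Python names are case
sensitive. A tidied sketch with the names matched up might look like
this; the join() call and the returned exit code are additions for
illustration, not something from your original:

```python
from multiprocessing import Process


def do_something():
    print('hello world!')


def start(fn):
    # Run fn in a child process and wait for it, so that its output
    # appears before the parent carries on.
    p = Process(target=fn, args=())
    p.start()
    p.join()
    return p.exitcode


if __name__ == "__main__":
    code = start(do_something)
    print("executed, child exit code: %s" % code)
```

Whether this alone explains the silent console is hard to say from
here, but it removes one variable.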