[Tutor] Getting a Process.start() error pickle.PicklingError: Can't pickle <type 'module'>: it's not found as __builtin__.module with Python 2.7

Mats Wichmann mats at wichmann.us
Tue Sep 3 09:54:51 EDT 2024


On 9/3/24 03:34, marc nicole wrote:
> Hello Alan,
> 
> Thanks for the reply, Here's the code I tested for the debug:
> 
> import time
> from multiprocessing import Process
> 
> def do_something():
>     print('hello world!')
> 
> def start(fn):
>     p = Process(target=fn, args=())
>     p.start()
> 
> def ghello():
>     print("hello world g")
> 
> def fhello():
>     print('hello world f')
> 
> if __name__ == "__main__":
>     start(do_something)
>     print("executed")
>     exit(0)
> 
> but neither "Hello World" nor "Executed" is displayed in the console, which
> finishes normally without returning any message.
> 
> Module naming is OK and I don't think the problem is related to that.
> 
> Now the question: when should I use multiprocessing.Process and when
> threading.Thread in Python? Is there a distinctive use case that
> shows when to use either? Are they interchangeable? Note that with
> threading the console DID display the messages correctly!


Just generically, threading is good when you have discrete tasks that 
don't require executing a lot of Python code (the canonical example 
seems to be fetching things from webservers over the internet). 
Everything happens in one process, but a thread can start a request and 
the rest of the program can get on with its work and collect the results 
when they're available. This class of work is called I/O bound.

Multiprocessing is better when you have discrete tasks but they're 
compute intensive (CPU bound).  Python (currently) uses a lock, the 
Global Interpreter Lock or GIL, around the execution of bytecode, so it 
doesn't matter how many threads are doing big computations, they can't 
run in parallel (they *appear* to run concurrently, but they're 
actually taking turns).  Here you can instead start multiple processes, 
and each can run independently.  The downside is that if those 
processes need to share information, it's quite a bit more complicated 
to do so.

The final alternative is that your problem isn't easily partitioned, 
and then you don't use either threading or multiprocessing :-)

threading and multiprocessing are written to use a very similar API for 
convenience.
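
To make the "very similar API" point concrete, here is a rough sketch 
(Python 3 assumed; fake_download and run_all are names I made up, and 
time.sleep() just stands in for waiting on a network request).  The same 
helper runs the task with either class, because both take target= and 
args= and both have start() and join():

```python
import threading
import multiprocessing
import time


def fake_download(delay):
    """Hypothetical stand-in for an I/O-bound task; sleeping releases the GIL."""
    time.sleep(delay)


def run_all(worker_cls, fn, delays):
    """Start one worker per delay, wait for all of them, return elapsed seconds.

    worker_cls may be threading.Thread or multiprocessing.Process; both
    accept target= and args= and both provide start() and join().
    """
    began = time.perf_counter()
    workers = [worker_cls(target=fn, args=(d,)) for d in delays]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # without join() the caller does not wait for the workers
    return time.perf_counter() - began


if __name__ == "__main__":
    delays = [0.2] * 4
    print("threads:   %.2fs" % run_all(threading.Thread, fake_download, delays))
    print("processes: %.2fs" % run_all(multiprocessing.Process, fake_download, delays))
```

Since the four waits overlap, the threaded run should take roughly as 
long as one wait rather than four.  For a task that burns CPU in pure 
Python instead of sleeping, you'd expect only the Process version to 
scale with the number of cores.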

The error you were originally seeing (do you still get the pickle 
error?) comes from an oddity.  There are different ways to start a new 
process, as multiprocessing has to do.  On UNIX systems, there's a 
fork() system call which makes an exact copy of the running process 
(arranging the code to be shared, and the data to be copy-on-write, so 
it's fairly efficient).  That has long been a little controversial 
because usually what happens right after fork() is exec() to run a new 
program, so the previous work is just thrown away - but that's ideal for 
Python multiprocessing, a copy of the already set up process can just 
start running (or *would* be ideal, if it wasn't rather tricky to do 
safely).  If that model is not available, you have to spawn a new 
process and then it has to be set up - this is where Python serializes 
the important objects (using pickle) and sends them over to the new 
process so initial data setup can be shared.  But not everything can be 
pickled easily, so multiprocessing has some limitations.  Windows 
doesn't have fork() and uses the spawn method by default, which is 
why pickling gets involved.  There's more on that here:

https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
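
If you want to poke at this yourself, something like the following 
sketch may help (again assuming Python 3; can_pickle is just a helper 
name I invented).  It reports which start method is in effect and 
whether a few kinds of objects survive pickling.  A module object does 
not, which is the same kind of failure as the "Can't pickle <type 
'module'>" message in the subject line:

```python
import multiprocessing
import pickle
import sys


def can_pickle(obj):
    """Hypothetical helper: True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


if __name__ == "__main__":
    # 'fork', 'spawn' or 'forkserver', depending on platform and Python version
    print("start method:", multiprocessing.get_start_method())
    print("built-in function:", can_pickle(len))   # pickled by name
    print("plain data:", can_pickle({"n": 1}))
    print("module object:", can_pickle(sys))       # modules are not picklable
    print("lambda:", can_pickle(lambda: None))     # no importable name
```

So under spawn, anything handed to Process (the target, its args, and 
whatever the Process object itself holds on to) needs to pass this kind 
of check.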

Hope this at least helps understanding, even if it doesn't help your 
specific problem.


