Multiple scripts versus single multi-threaded script

Chris Angelico rosuav at gmail.com
Thu Oct 3 18:22:36 EDT 2013


On Fri, Oct 4, 2013 at 5:53 AM, Roy Smith <roy at panix.com> wrote:
> So, I think my original statement:
>
>> if you're looking for a short answer, I'd say just keep doing what
>> you're doing using multiple processes and don't get into threading.
>
> is still good advice for somebody who isn't sure they need threads.
>
> On the other hand, for somebody who is interested in learning about
> threads, Python is a great platform to learn because you get to
> experiment with the basic high-level concepts without getting bogged
> down in pthreads minutiae.  And, as Chris pointed out, if you get it
> wrong, at least you've still got valid Python objects to puzzle over,
> not a smoking pile of bits on the floor.

Agree wholeheartedly to both halves. I was just explaining a similar
concept to my brother last night, with regard to network/database
request handling:

1) The simplest code starts, executes, and finishes, with no threads,
fork(), shared state, or other confusions. Execution can be completely
predicted by eyeballing the source code. You can pretend that you have
a dedicated CPU core that does nothing but run your program.
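A toy sketch of style #1, with made-up job names (nothing here is real
work, it's just to show the shape):

# Style 1: plain sequential handling.  Each job runs start to
# finish before the next one is even looked at.
requests = ["job-1", "job-2", "job-3"]   # made-up requests

def handle(request):
    # stand-in for some blocking work (disk, network, ...)
    return request.upper()

for request in requests:
    print(handle(request))   # strictly one after the other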

2) Threaded code adds a measure of complexity that you have to get
your head around. Now you need to concern yourself with preemption,
multiple threads doing things in different orders, locking, shared
state, etc, etc. But you can still pretend that the execution of one
job will happen as a single "thing", top down, with predictable
intermediate state, if you like. (Python's threading and multiprocessing
modules both follow this style; they just have different levels of
shared state.)
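Roughly the same toy jobs done in style #2; the names are still
invented, and the interesting bit is the lock around the shared counter:

import threading

# Style 2: each job in its own thread.  Within a thread the code still
# reads top-down; the new concern is the shared counter, so it gets a lock.
requests = ["job-1", "job-2", "job-3"]
completed = 0
lock = threading.Lock()

def handle(request):
    global completed
    result = request.upper()        # the "work", same as before
    with lock:                      # guard the shared state
        completed += 1
    print(result)

threads = [threading.Thread(target=handle, args=(r,)) for r in requests]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(completed, "jobs done")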

3) Asynchronous code adds significantly more "get your head around"
complexity, since you now have to retain state for multiple
jobs/requests in the same thread. You can't use local variables to
keep track of where you're up to. Most likely, your code will do some
tiny thing, update the state object for that request, fire off an
asynchronous request of your own (maybe to the hard disk, with a
callback when the data's read/written), and then return control back
to some main loop.
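And a very stripped-down sketch of style #3: a toy main loop with no
real I/O, just to show that each request's progress has to live in a
state object rather than in local variables:

# Style 3: callback style.  The handler does a tiny bit of work, updates
# the request's state object, reschedules itself, and returns to the loop.
requests = ["job-1", "job-2", "job-3"]
pending = []                         # the main loop's work queue

def start(request):
    state = {"request": request, "step": 0}
    pending.append((step_done, state))        # schedule the first callback

def step_done(state):
    state["step"] += 1
    if state["step"] < 2:
        pending.append((step_done, state))    # pretend we fired off more I/O
    else:
        print(state["request"].upper(), "finished")

for r in requests:
    start(r)
while pending:                       # the main loop: run one callback, repeat
    callback, state = pending.pop(0)
    callback(state)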

Now imagine you have a database written in style #1, and you have to
drag it, kicking and screaming, into the 21st century. Oh look, it's
easy! All you have to do is start multiple threads doing the same job!
And then you'll have some problems with simultaneous edits, so you put
some big fat locks all over the place to prevent two threads from
doing the same thing at the same time. Even if one of those threads
was handling something interactive and might hold its lock for some
number of minutes. Suboptimal design, maybe, but hey, it works, right?
That's what my brother has to deal with every day, as a user of said
database... :|
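For the curious, the "big fat lock" problem looks something like this
(the timings are made up, obviously):

import threading, time

# One global lock wrapped around every operation.  An interactive edit
# that holds it while the user thinks blocks even the quickest job.
big_lock = threading.Lock()

def edit_record(name, think_time):
    with big_lock:                   # held for the *whole* edit
        time.sleep(think_time)       # stand-in for a user pondering
        print(name, "done after", think_time, "s")

t1 = threading.Thread(target=edit_record, args=("interactive user", 2))
t2 = threading.Thread(target=edit_record, args=("quick batch job", 0))
t1.start()
time.sleep(0.1)                      # let the interactive edit grab the lock first
t2.start()
t1.join(); t2.join()                 # the quick job waits the full two seconds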

ChrisA


