clifford.wells at comcast.net
Wed Oct 13 15:43:59 CEST 2004
On Wed, 2004-10-13 at 08:52 -0400, Peter L Hansen wrote:
> Cliff Wells wrote:
> > On Wed, 2004-10-13 at 14:11 +0200, Diez B. Roggisch wrote:
> >>"brutally" serializes (hopefully) all accesses to python data-structures -
> > Nope. It doesn't do this. For access to items such as integers you are
> > probably fine, but for things like lists, dictionaries, class
> > attributes, etc, you're on your own. The GIL only ensures that two
> > threads won't be executing Python bytecode simultaneously. It locks the
> > Python *interpreter*, not your program or data structures.
> >>so e.g. running several threads, appending to the same list, won't result
> >>in messing up the internal list structure causing segfaults or the like.
> > True, you won't get segfaults. However, you may very well get a
> > traceback or mangled data.
> >>That makes programming pretty easy, at the cost of lots of waiting for the
> >>individual threads.
> > Threading in Python is pretty easy, but certainly not *that* easy.
> Cliff, do you have any references, or even personal experience to
> relate about anything on which you comment above?
I'm no expert on Python internals, but it seems clear that an operation
such as .append() spans multiple bytecode instructions. It seems to me
that if those instructions straddle the boundary set by
sys.getcheckinterval(), the operation won't complete within a single
thread's time slice (unless the interpreter has explicit code to keep
the entire operation within a single context).
I'm no expert at dis nor Python bytecode, but I'll give it a shot :)
>>> l = 
134 0 LOAD_GLOBAL 0 (findlabels)
3 LOAD_FAST 0 (code)
6 CALL_FUNCTION 1
9 STORE_FAST 5 (labels)
<snip dis spitting out over 500 lines of bytecode>
172 >> 503 PRINT_NEWLINE
504 JUMP_ABSOLUTE 33
>> 507 POP_TOP
>> 509 LOAD_CONST 0 (None)
It looks fairly non-atomic to me. The sequence is certainly shorter than
the default value of sys.getcheckinterval() (which is 100, iirc), but
that's hardly a guarantee that the operation won't cross the boundary
for a context switch (unless, as I mentioned above, the interpreter has
specific code to prevent the switch until the operation is complete
<shrug>).
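As an aside (my own illustration, using a modern Python 3 interpreter,
where the check interval has since been replaced by
sys.getswitchinterval()), you can get a more focused disassembly by
wrapping the append in a tiny function:

```python
import dis

def append_item(l, x):
    l.append(x)

# List the opcodes the interpreter executes for the one-line body.
# Even this trivial call compiles to several instructions (loads, an
# attribute lookup, the call itself); the C-level append only runs
# once the single CALL-style instruction is reached.
ops = [ins.opname for ins in dis.get_instructions(append_item)]
print(ops)
```

The exact opcode names vary by Python version, but the point stands:
one line of source is several bytecode instructions.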
I recall a similar discussion about three years ago on this list about
this very thing where people who know far more about it than I do flamed
it out a bit, but damned if I recall the outcome :P I do recall that it
didn't convince me to alter the approach I recommended to the OP.
> In my experience, and to my knowledge, Python threading *is*
> that easy (ignoring higher level issues such as race conditions
> and deadlocks and such), and the GIL *does* do exactly what Diez
> suggests, and you will *not* get tracebacks nor (again, ignoring
> higher level issues) mangled data.
Okay, to clarify, for the most part I *was* in fact referring to "higher
level issues". I doubt tracebacks or mangled data would occur simply
because the operation is non-atomic. However, if you have code that,
say, checks for an item's existence in a list and then appends it if it
isn't there, the program may fail if another thread adds that item
between the check and the append. This is what I was referring to by
the potential for mangled data and/or tracebacks.
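That check-then-append race is easy to sketch (a minimal illustration of
my own, in modern Python 3 syntax; the names are made up):

```python
import threading

items = []
lock = threading.Lock()

def add_unique(x):
    # Without the lock, two threads could both pass the "not in" test
    # before either appends, leaving a duplicate in the list.
    with lock:
        if x not in items:
            items.append(x)

threads = [threading.Thread(target=add_unique, args=(1,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(items)  # [1] -- no duplicate can sneak in while the lock is held
```

Drop the lock and the program *usually* still prints [1], which is
exactly what makes this class of bug so unpleasant to debug.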
> You've tentatively upset my entire picture of the CPython (note,
> CPython only) interpreter's structure and concept. Please tell
> me you were going a little overboard to protect a possible
> newbie from himself or something.
Certainly protecting the newbie, but not going overboard, IMHO. I've
written quite a number of threaded Python apps and I religiously
acquire/release whenever dealing with mutable data structures (lists,
etc). To date this approach has served me well. I code fairly
conservatively when it comes to threads as I am *absolutely* certain
that debugging a broken threaded application is very near the bottom of
my list of favorite things ;)
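For what it's worth, the acquire/release discipline I mean looks
something like this (my own sketch, not production code; the class name
is made up):

```python
import threading

class SharedCounter:
    """Guard every touch of a shared mutable structure with a lock."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def increment(self, key):
        self._lock.acquire()
        try:
            # Read-modify-write on a dict: several bytecodes, so I
            # don't rely on it being atomic.
            self._counts[key] = self._counts.get(key, 0) + 1
        finally:
            self._lock.release()

counter = SharedCounter()
threads = [threading.Thread(target=counter.increment, args=("hits",))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter._counts["hits"])  # 100
```

The try/finally ensures the lock is released even if the guarded code
raises; a "with self._lock:" block is the equivalent shorthand.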
Cliff Wells <clifford.wells at comcast.net>