[IronPython] Ironclad problem, with which someone here may be able to help

Dino Viehland dinov at microsoft.com
Thu Nov 6 06:30:15 CET 2008


I would suggest getting a snap shot of the call stacks when this is happening if that's possible.  I can't pin anything down but I wonder if you could have an STA object or something that otherwise requires message pumping.  That message pumping could happen while you're doing a Monitor.Enter call.  If that was happening maybe there is some subtle CLR bug or a surprise feature?  It is surprising that Monitor.Enter can go re-entrant but it can...

So it'd be interesting to get thread snapshots and see if

>                EnsureGIL (443) 2
>                EnsureGIL (443) 1      <- omg, wtf, bbq, etc.

Could be happening because Thread 1 experiences contention, blocks and pumps messages, causing the finalizer thread (2) to run, the lock is acquired and ... ?

Only other thing I could think of is does it repro on other machines?  Maybe it's a hardware bug as unlikely as that seems?

-----Original Message-----
From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of William Reade
Sent: Wednesday, November 05, 2008 10:01 AM
To: Discussion of IronPython
Subject: Re: [IronPython] Ironclad problem, with which someone here may be able to help

The log starts in the middle (after many lock/unlocks, some from each
thread); I'm running on x86; and I have no additional AppDomains.

I don't think it would be safe for me to entirely avoid locking during
finalization, but I could probably cut it down to a quick lock, on a
separate object, to enqueue pointers for cleanup and deallocation on the
main thread. However, I'm reluctant to do that until I'm sure that the
problem is specifically related to GC, otherwise it'll just come back as
soon as anyone tries any serious multithreading :).

Curt Hagenlocher wrote:
> Locking during finalization is often considered to be a bad idea.  In
> particular, locking without a timeout introduces the possibility that
> you will hang the finalization thread, preventing further objects from
> being finalized.  But clearly, that's not what's happening here.
>
> Other questions that probably don't matter but might be interesting to
> know:
>
> Can we assume that the finalization thread isn't the first place where
> this lock is required?  That your log starts somewhere in the middle?
>
> Is this under x86 or x64 or both?
>
> Are you creating any additional AppDomains in the process?
>
>
> On Wed, Nov 5, 2008 at 10:15 AM, William Reade
> <william at resolversystems.com <mailto:william at resolversystems.com>> wrote:
>
>     Hi Curt
>
>     I am indeed; that's how I know thread 2 is the GC thread. Is
>     locking during GC forbidden?
>
>     William
>
>     Curt Hagenlocher wrote:
>
>         ...or, for that matter, any __del__ methods from within Python
>         -- which ultimately are handled by finalization.
>
>         On Wed, Nov 5, 2008 at 9:37 AM, Curt Hagenlocher
>         <curt at hagenlocher.org <mailto:curt at hagenlocher.org>
>         <mailto:curt at hagenlocher.org <mailto:curt at hagenlocher.org>>>
>         wrote:
>
>            So, the obvious question for me is whether or not you're
>         using any
>            finalizers.
>
>
>            On Wed, Nov 5, 2008 at 5:57 AM, William Reade
>            <william at resolversystems.com
>         <mailto:william at resolversystems.com>
>         <mailto:william at resolversystems.com
>         <mailto:william at resolversystems.com>>>
>
>            wrote:
>
>                Hi all
>
>                While running the numpy tests, I've come across a situation
>                which, to the best of my knowledge, is simply
>         impossible. I'm
>                hoping that one of the local .NET gurus will be able to
>         tell
>                me what I'm missing, or point me somewhere I can get
>         more insight.
>
>                The 4 methods involved are as follows:
>                -----------------------
>                      public int GetThreadId()
>                      {
>                          return Thread.CurrentThread.ManagedThreadId;
>                      }
>
>                      public void WriteFlush(string info)
>                      {
>                          Console.WriteLine(info);
>                          Console.Out.Flush();
>                      }
>
>                      public void EnsureGIL()
>                      {
>                          Monitor.Enter(this.dispatcherLock);
>                          this.WriteFlush(String.Format(
>                              "EnsureGIL ({1}) {0}", this.GetThreadId(),
>                Builtin.id(this.dispatcherLock)));
>                      }
>
>                      public void ReleaseGIL()
>                      {
>                          this.WriteFlush(String.Format(
>                              "ReleaseGIL ({1}) {0}\n", this.GetThreadId(),
>                Builtin.id(this.dispatcherLock)));
>                          Monitor.Exit(this.dispatcherLock);
>                      }
>                -----------------------
>                ...and they can, and do, occasionally produce output as
>         follows:
>                -----------------------
>                EnsureGIL (443) 2
>                EnsureGIL (443) 1      <- omg, wtf, bbq, etc.
>                ReleaseGIL (443) 2
>
>                EnsureGIL (443) 2
>                ReleaseGIL (443) 1
>
>                ReleaseGIL (443) 2
>                -----------------------
>                When this happens, the process continues happily for a
>         short
>                time and then falls over in a later call to ReleaseGIL
>         (after
>                successfully calling it several times). The error is "
>         Object
>                synchronization method was called from an
>         unsynchronized block
>                of code", which I understand to mean "you can't release
>         this
>                lock because you don't hold it".
>
>                It doesn't happen very often, but I can usually
>         reproduce it
>                by running
>         test_multiarray.TestFromToFile.test_malformed a few
>                hundred times. It may be relevant to note that thread 2
>         is the
>                GC thread, and thread 1 is the main thread. I have
>         considered
>                the following possibilities:
>
>                (1) That I'm locking on the wrong object. I believe
>         that isn't
>                the case, because it's constructed only once, as a "new
>                Object()" (ie, a reference type), and is only subsequently
>                used for locking; and, because it keeps the same ipy id
>                throughout.
>
>                (2) That Monitor.Enter occasionally allows two different
>                threads to acquire the same lock. I consider this extremely
>                unlikely, because... well, how many multithreaded .NET apps
>                already exist? If Monitor really were broken, I think we'd
>                probably know about it by now.
>
>                (3) That calling Flush() on a SyncTextWriter (the type of
>                Console.Out) doesn't actually do anything, and the
>         output is
>                somehow wrongly ordered (although I can't imagine how this
>                could actually be: if the locking is really working,
>         then my
>                console writes are strictly sequential). I don't have
>         access
>                to the code, so I have no idea how it's implemented,
>         but even
>                if this is the case it doesn't help much with the
>         fundamental
>                problem (the synchronisation error which follows).
>
>                Apart from the above, I'm out of ideas. Can anyone suggest
>                what I've missed?
>
>                William
>                _______________________________________________
>                Users mailing list
>                Users at lists.ironpython.com
>         <mailto:Users at lists.ironpython.com>
>         <mailto:Users at lists.ironpython.com
>         <mailto:Users at lists.ironpython.com>>
>
>
>          http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>
>
>
>         ------------------------------------------------------------------------
>
>
>
>         _______________________________________________
>         Users mailing list
>         Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
>         http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>
>
>
>     _______________________________________________
>     Users mailing list
>     Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
>     http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>

_______________________________________________
Users mailing list
Users at lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com




More information about the Ironpython-users mailing list