[Python-bugs-list] [ python-Bugs-766669 ] Consistent GPF on exit

SourceForge.net noreply@sourceforge.net
Sun, 06 Jul 2003 21:21:14 -0700


Bugs item #766669, was opened at 2003-07-06 08:32
Message generated for change (Comment added) made by tim_one
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=766669&group_id=5470

Category: Windows
Group: Python 2.2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Kurt Grittner (grittkmg)
Assigned to: Nobody/Anonymous (nobody)
Summary: Consistent GPF on exit

Initial Comment:
Using the following script (clt.py):

import os, time, sys
from socket import *              
from Tkinter import *

class App:
    def __init__(self, master):
        self.win = Toplevel(master)
        self.button = Button(self.win, text="QUIT", fg="red", command=self.goaway)
        self.button.pack(side=LEFT)
        self.hi_there = Button(self.win, text="Hello", command=self.say_hi)
        self.hi_there.pack(side=LEFT)
        #self.serverHost = 'localhost'
        self.serverHost = '192.168.1.12'
        self.serverPort = 50007
        self.sockobj = socket(AF_INET, SOCK_STREAM)
        self.sockobj.connect((self.serverHost, self.serverPort))

    def say_hi(self):
        self.message = ['Hello network world']   
        for line in self.message:
            self.sockobj.send(line)  
        data = self.sockobj.recv(1024)
        print 'Client received:', `data`

    def goaway(self):
        self.sockobj.shutdown(0)
        self.sockobj.close()
        self.win.destroy()

def NewClient():
    App(root)

def PgmExit():
    root.quit()

root = Tk()
Button(root, text='New Client', command=NewClient).pack()
Button(root, text='Quit All Clients', command=PgmExit).pack()
root.mainloop()

If you press 'Quit' first, then there are no errors.

If you press 'New Client' even once (whether or not there 
is an echo server to talk to), the other Toplevel windows 
seem to run without problems, but when you then press 
'Quit' the program dies at the following place:

/* Vc98\Crt\Src\Crtexe.c */

#ifdef WPRFLAG
            __winitenv = envp;
            mainret = wmain(argc, argv, envp);
#else  /* WPRFLAG */
            __initenv = envp;
            mainret = main(argc, argv, envp);
#endif  /* WPRFLAG */

#endif  /* _WINMAIN_ */

/* ------------------------------------------------------------------------ */
/* Error executing following line */
            exit(mainret);
/* ------------------------------------------------------------------------ */

/* OS error dump
PYTHON_D caused an invalid page fault in
module KERNEL32.DLL at 017f:bff88396.
Registers:
EAX=c00309c4 CS=017f EIP=bff88396 EFLGS=00210216
EBX=0062ffec SS=0187 ESP=0052fecc EBP=00530044
ECX=00000000 DS=0187 ESI=00000000 FS=10e7
EDX=bff76855 ES=0187 EDI=bff79060 GS=0000
Bytes at CS:EIP:
53 56 57 8b 75 10 8b 38 33 db 85 f6 75 2d 8d b5 
Stack dump:

/* Vc98\Crt\Src\Crt0dat.c */
/* First debug run */
1020ACE0   jmp         doexit+0B6h (1020acf6)
384:          }
385:
386:
387:          _C_Exit_Done = TRUE;
1020ACE2   mov         dword ptr [__C_Exit_Done (10264728)],1
388:
389:          ExitProcess(code);
1020ACEC   mov         ecx,dword ptr [code]
1020ACEF   push        ecx

/* ------------------------------------------------------------------------ */
/* Error on the following call */
1020ACF0   call        dword ptr [__imp__ExitProcess@4 (10256020)]
/* ------------------------------------------------------------------------ */

/* Second debug run */
PYTHON_D caused an invalid page fault in
module KERNEL32.DLL at 017f:bff88396.
Registers:
EAX=c00309c4 CS=017f EIP=bff88396 EFLGS=00210216
EBX=0062ffec SS=0187 ESP=0052fecc EBP=00530044
ECX=00000000 DS=0187 ESI=00000000 FS=1b3f
EDX=bff76855 ES=0187 EDI=bff79060 GS=0000
Bytes at CS:EIP:
53 56 57 8b 75 10 8b 38 33 db 85 f6 75 2d 8d b5 
Stack dump:


390:
391:
392:  }
1020ACF6   mov         esp,ebp
1020ACF8   pop         ebp
1020ACF9   ret

/* Vc98\Crt\Src\Crtexe.c - Continued */

*/

        }
        __except ( _XcptFilter(GetExceptionCode(), GetExceptionInformation()) )
        {
            /*
             * Should never reach here
             */
            _exit( GetExceptionCode() );

        } /* end of try - except */

}

It seems weird that everything else in the program works 
fine right up to ExitProcess.  The error occurs on multiple 
Windows machines.  The error does NOT occur on Linux 
running the same code base.  Also, many other 
multi-threaded sample programs from other authors 
exhibit the same 'crash on exit' syndrome.

Thanks for your attention,
Kurt


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2003-07-07 00:21

Message:
Logged In: YES 
user_id=31435

You're trying too hard <wink>.  I'm not arguing about the 
design (for one thing, it's not my design, and I don't know 
anything about it), I'm wondering both why I can't reproduce 
a problem, and especially why no other report of this has ever 
been made.  You surely don't believe you're the only 
programmer to try sockets in Python on Windows, right?

I also tried it under an up-to-date Win98SE, also with MSVC 6 
and all current IE 6 patches.  Indeed, that's the machine I've 
built the PythonLabs Windows installer *on* for the last two 
years.  No problem.  Everyone who has ever run the Python 
test suite on Windows has exercised sockets on Windows, 
and so has anyone who has ever run Zope on Windows, etc.  
It may indeed be a bit of Rocket Science if you don't have a 
theory for why it fails only when you run it!

The reason I'm keen to know is that only someone who can 
see the problem can test a putative fix (and so confirm-- or 
refute --that the problem goes away).

----------------------------------------------------------------------

Comment By: Kurt Grittner (grittkmg)
Date: 2003-07-07 00:00

Message:
Logged In: YES 
user_id=816888

Umm, this is not rocket science.  WSACleanup() is being called 
in the wrong context.  Startup is called in a DLL context. That 
puts the info in Thread Local Storage (TLS).  The DLL is 
allowed to unload with an atexit() pointer still pending to it.  
This is bad design.   You should always call startup/cleanup in 
pairs and in the same thread context.  As I said before this 
happens on every win98SE machine that I have tried it on.  I 
have no firewall or virus scanner.  This is a fairly recent 
installation of win98 with all of the MS updates for I.E.  Apart 
from that, the most unusual thing is the presence of the 
VisualStudio 6.0 suite of programs.

I can do a new install of win98SE on an old machine, just out 
of curiosity, but you only have to read this web site to see 
that this design for startup/cleanup calls is a recipe for 
disaster:

http://support.microsoft.com/default.aspx?scid=http://support.microsoft.com:80/support/kb/articles/Q237/5/72.ASP&NoWebContent=1
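
The pairing rule stated above (every startup matched by one cleanup, made from the same thread) can be pictured with a small Python model. This is only a sketch under stated assumptions: `PairedInit` and `initialized` are hypothetical names invented for illustration, and nothing here calls Winsock or reproduces its real behavior.

```python
import threading
from contextlib import contextmanager


class PairedInit:
    """Toy stand-in for a library that needs matched startup/cleanup calls."""

    def __init__(self):
        self.depth = 0   # number of startups not yet cleaned up
        self.log = []    # (event, thread id) pairs, for inspection

    def startup(self):
        self.depth += 1
        self.log.append(("startup", threading.get_ident()))

    def cleanup(self):
        if self.depth == 0:
            raise RuntimeError("cleanup without a matching startup")
        self.depth -= 1
        self.log.append(("cleanup", threading.get_ident()))


@contextmanager
def initialized(lib):
    """Guarantee one cleanup per startup, issued from the calling thread."""
    lib.startup()
    try:
        yield lib
    finally:
        lib.cleanup()


lib = PairedInit()
with initialized(lib):
    pass  # work that needs the library goes here
```

Tying the cleanup to the same block that performed the startup is one way to keep the two calls paired and on one thread; it is a design illustration, not a description of how the socket module is written.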

Here is a list of some of the modules running on my machine:

USER32.DLL	4.10.2227	Microsoft Corporation	Win32 USER32 core component	C:\WINDOWS\SYSTEM\USER32.DLL	4/21/00 3:33:18 PM GMT	Microsoft(R) Windows(R) Operating System	bfc00000 - bfc11000
GDI32.DLL	4.10.1998	Microsoft Corporation	Win32 GDI core component	C:\WINDOWS\SYSTEM\GDI32.DLL	4/23/98 4:24:18 AM GMT	Microsoft(R) Windows(R) Operating System	bff20000 - bff46000
ADVAPI32.DLL	4.80.1675	Microsoft Corporation	Win32 ADVAPI32 core component	C:\WINDOWS\SYSTEM\ADVAPI32.DLL	4/23/99 4:37:33 PM GMT	Microsoft(R) Windows(R) Operating System	bfe80000 - bfe90000
KERNEL32.DLL	4.10.2222	Microsoft Corporation	Win32 Kernel core component	C:\WINDOWS\SYSTEM\KERNEL32.DLL	4/23/99 12:45:39 AM GMT	Microsoft(R) Windows(R) Operating System	bff70000 - bffe3000
MPR.DLL	4.10.1998	Microsoft Corporation	WIN32 Network Interface DLL	C:\WINDOWS\SYSTEM\MPR.DLL	4/23/99 4:37:36 PM GMT	Microsoft(R) Windows(R) Operating System	7fbf0000 - 7fbfe000
MSNP32.DLL	4.10.2222	Microsoft Corporation	Network provider for Microsoft networks	C:\WINDOWS\SYSTEM\MSNP32.DLL	4/23/99 4:37:37 PM GMT	Microsoft(R) Windows(R) Operating System	7fb20000 - 7fb34000
MSNET32.DLL	4.10.2224	Microsoft Corporation	Microsoft 32-bit Network API Library	C:\WINDOWS\SYSTEM\MSNET32.DLL	11/12/99 4:15:15 AM GMT	Microsoft(R) Windows(R) Operating System	7f300000 - 7f313000
MPREXE.EXE	4.10.1998	Microsoft Corporation	WIN32 Network Interface Service Process	C:\WINDOWS\SYSTEM\MPREXE.EXE	4/23/98 2:19:38 AM GMT	Microsoft(R) Windows(R) Operating System	00500000 - 00507000
MPRSERV.DLL	4.10.2222	Microsoft Corporation	Multinet Router	C:\WINDOWS\SYSTEM\MPRSERV.DLL	4/23/99 4:37:37 PM GMT	Microsoft(R) Windows(R) Operating System	7fad0000 - 7faf6000
NETAPI32.DLL	4.10.1998	Microsoft Corporation	32-bit network API DLL	C:\WINDOWS\SYSTEM\NETAPI32.DLL	4/23/99 4:37:37 PM GMT	Microsoft(R) Windows(R) Operating System	7f990000 - 7f995000
NETBIOS.DLL				C:\WINDOWS\SYSTEM\NETBIOS.DLL	4/23/99 4:37:39 PM GMT		7f840000 - 7f848000
RNR20.DLL	4.10.2222	Microsoft Corporation	Windows Socket2 NameSpace DLL	C:\WINDOWS\SYSTEM\RNR20.DLL	4/23/99 4:38:44 PM GMT	Microsoft(R) Windows(R) Operating System	783c0000 - 783cf000
MSAFD.DLL	4.10.1998	Microsoft Corporation	Microsoft Windows Sockets 2.0 Service Provider	C:\WINDOWS\SYSTEM\MSAFD.DLL	4/23/99 4:38:11 PM GMT	Microsoft(R) Windows(R) Operating System	7b410000 - 7b41b000
RPCLTSCM.DLL	4.71.2900	Microsoft Corporation	Remote Process Control LTSCM DLL	C:\WINDOWS\SYSTEM\RPCLTSCM.DLL	4/23/99 4:38:45 PM GMT	Microsoft® Windows® Operating System	78320000 - 7832a000
WSOCK32.DLL	4.10.1998	Microsoft Corporation	BSD Socket API for Windows	C:\WINDOWS\SYSTEM\WSOCK32.DLL	4/23/99 4:39:14 PM GMT	Microsoft(R) Windows(R) Operating System	75fa0000 - 75faa000
MSWSOCK.DLL	4.10.2222	Microsoft Corporation	Microsoft WinSock Extension APIs	C:\WINDOWS\SYSTEM\MSWSOCK.DLL	4/23/99 4:38:30 PM GMT	Microsoft(R) Windows(R) Operating System	794d0000 - 794e5000
WS2_32.DLL	4.10.2222	Microsoft Corporation	Windows Socket 2.0 32-Bit DLL	C:\WINDOWS\SYSTEM\WS2_32.DLL	4/23/99 4:39:13 PM GMT	Microsoft(R) Windows(R) Operating System	76000000 - 76012000
WS2HELP.DLL	4.10.1998	Microsoft Corporation	Windows Socket 2.0 Helper for Windows 98	C:\WINDOWS\SYSTEM\WS2HELP.DLL	4/23/99 4:39:13 PM GMT	Microsoft(R) Windows(R) Operating System	75fe0000 - 75fe6000
DIGEST.DLL	6.00.2800.1106	Microsoft Corporation	Digest SSPI Authentication Package	C:\WINDOWS\SYSTEM\DIGEST.DLL	8/29/02 11:21:08 AM GMT	Microsoft® Windows® Operating System	007a0000 - 007b0000
MSNSSPC.DLL	6.00.7753	Microsoft Corporation	MSN Client for 32 bit platforms	C:\WINDOWS\SYSTEM\MSNSSPC.DLL	4/23/99 4:38:21 PM GMT	Microsoft® Internet Services	7a210000 - 7a22f000
MSAPSSPC.DLL	5.00.7729	Microsoft Corporation	DPA Client for 32 bit platforms	C:\WINDOWS\SYSTEM\MSAPSSPC.DLL	4/23/99 4:38:11 PM GMT	Microsoft® Internet Services	7b3f0000 - 7b401000
MSVCRT40.DLL	4.22.0000	Microsoft Corporation	Microsoft® C Runtime Library	C:\WINDOWS\SYSTEM\MSVCRT40.DLL	4/23/99 4:38:29 PM GMT	Microsoft® Visual C++	79660000 - 796b5000
SECUR32.DLL	4.10.2222	Microsoft Corporation	Microsoft Win32 Security Services	C:\WINDOWS\SYSTEM\SECUR32.DLL	4/23/99 4:37:39 PM GMT	Microsoft(R) Windows(R) Operating System	7f870000 - 7f87a000
RPCSS.EXE	4.71.2900	Microsoft Corporation	Distributed COM Services	C:\WINDOWS\SYSTEM\RPCSS.EXE	9/1/98 6:18:39 PM GMT	Microsoft(R) Windows NT(TM) Operating System	01000000 - 01005000
MSVCRT20.DLL	2.11.000	Microsoft Corporation	Microsoft® C Runtime Library	C:\WINDOWS\SYSTEM\MSVCRT20.DLL	4/23/99 4:37:36 PM GMT	Microsoft® Visual C++	7fc30000 - 7fc75000


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-07-06 23:13

Message:
Logged In: YES 
user_id=31435

Kurt, which versions of Windows and winsock.dll are you 
using?  I continue to be unable to reproduce any problem 
here, and there have been no other reports of this despite 
thousands of Python programs using sockets on Windows for 
a decade.  I have to conclude there's something unique in 
your setup.  FYI, sometimes virus scanners and/or "personal 
firewalls" cause bizarre crashes during startup or shutdown.  
Are you running one (or more) of those?

----------------------------------------------------------------------

Comment By: Kurt Grittner (grittkmg)
Date: 2003-07-06 21:29

Message:
Logged In: YES 
user_id=816888

I have a working fix for this problem, but the code is a fairly 
ugly hack. (A terrible thing to do to such a well organized 
codebase).  

1. I exported a fini_socket() function from the PYD DLL.
2. I cloned the dynamic lookup function to check for the 
optional fini_socket() name as well as the init_socket().  If 
found, I remember it in a global.
3. In PyImport_Cleanup() I call the fini_socket() routine (if 
present) before these modules start getting freed.
4. In NTInit() I add to a counter instead of using atexit()
5. fini_socket() calls NTcleanup(), which simply calls 
WSACleanup 'counter' times to release the winsock DLL.

This works, so now I can go on with my Python study.  If 
anyone wants my hack code, just email me.

Kurt
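
The five steps listed in the comment above can be pictured with a small Python model. This is only a sketch of the bookkeeping, not the C patch itself (the patch is not included in this thread). `SocketModuleModel`, `nt_init` and `import_cleanup` are hypothetical names standing in for the extension module, NTInit() and PyImport_Cleanup(); the counters replace the real WSAStartup/WSACleanup calls.

```python
class SocketModuleModel:
    """Toy model of the patched extension module; all names are illustrative."""

    def __init__(self):
        self.startup_count = 0   # step 4: a counter instead of atexit()
        self.cleanup_calls = 0   # how many simulated WSACleanup calls ran
        self.unloaded = False

    def nt_init(self):
        # Models NTInit(): each call stands for one WSAStartup().
        self.startup_count += 1

    def fini_socket(self):
        # Models fini_socket() -> NTcleanup(): one cleanup per recorded startup.
        if self.unloaded:
            raise RuntimeError("too late: module already unloaded")
        while self.startup_count > 0:
            self.cleanup_calls += 1
            self.startup_count -= 1


def import_cleanup(module):
    # Models step 3: run the optional fini hook before the module is freed.
    fini = getattr(module, "fini_socket", None)
    if fini is not None:
        fini()
    module.unloaded = True


mod = SocketModuleModel()
mod.nt_init()
mod.nt_init()
import_cleanup(mod)
print(mod.cleanup_calls, mod.startup_count)  # prints "2 0"
```

The point of the model is the ordering: the cleanup hook runs while the module is still loaded, and it releases exactly as many times as startup was recorded.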

----------------------------------------------------------------------

Comment By: Kurt Grittner (grittkmg)
Date: 2003-07-06 16:57

Message:
Logged In: YES 
user_id=816888

The minimum required to cause this GPF is the following:
C:\ptest>python
Python 2.3b2 (#43, Jun 29 2003, 16:43:04) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import *
>>> s=socket(AF_INET, SOCK_STREAM)
>>><Ctrl-Z>

I have found the problem in the source, though I don't know 
how to fix it.  WSAStartup() is called as a DLL export, but the 
WSACleanup() is called as an atexit().  This is too late to call 
it.  It must be called before the DLL unloads, or the 
WSAStartup should be moved to the process level.  I don't 
know the program well enough to attempt a hack, but I did 
verify that it is the WSACleanup line running from the CRT 
code that is killing things (every time you even create a 
socket).

Notice, this is using the latest version too.

Kurt
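
The ordering problem described in the comment above (cleanup deferred until after the DLL has gone away) can be illustrated with a toy model. This is a hedged sketch, not a reproduction of the crash: `FakeWinsock` and `run_shutdown` are made-up names, and the "unload" step only flips a flag.

```python
class FakeWinsock:
    """Stand-in for library state; not a real Winsock API."""

    def __init__(self):
        self.loaded = True
        self.startups = 0

    def startup(self):
        self.startups += 1

    def cleanup(self):
        if not self.loaded:
            raise RuntimeError("cleanup called after the library was unloaded")
        self.startups -= 1


def run_shutdown(order):
    """Run the shutdown steps in the given order and report the outcome."""
    lib = FakeWinsock()
    lib.startup()
    steps = {
        "cleanup": lib.cleanup,
        "unload": lambda: setattr(lib, "loaded", False),
    }
    try:
        for name in order:
            steps[name]()
    except RuntimeError as exc:
        return "crash: %s" % exc
    return "clean exit"


print(run_shutdown(["cleanup", "unload"]))  # clean exit
print(run_shutdown(["unload", "cleanup"]))  # crash: cleanup called after the library was unloaded
```

The second call models the sequence the comment objects to: the exit-time handler fires only after the code it needs has been unloaded.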



----------------------------------------------------------------------

Comment By: Kurt Grittner (grittkmg)
Date: 2003-07-06 14:41

Message:
Logged In: YES 
user_id=816888

Hi Tim,

Here is some more info.  I stripped out all references to 
socket, and saved as clt2.clw to force it to load with 
pythonw.exe.  Things worked perfectly.  Here is the script:

import os, time, sys
#from socket import *              
from Tkinter import *

class App:
    def __init__(self, master):
        self.win = Toplevel(master)
        self.button = Button(self.win, text="QUIT", fg="red", command=self.goaway)
        self.button.pack(side=LEFT)
        self.hi_there = Button(self.win, text="Hello", command=self.say_hi)
        self.hi_there.pack(side=LEFT)
        #self.serverHost = 'localhost'      
        self.serverHost = '192.168.1.12'      
        self.serverPort = 50007            
        #self.sockobj = socket(AF_INET, SOCK_STREAM)      
        #self.sockobj.connect((self.serverHost, self.serverPort))  
 

    def say_hi(self):
        self.message = ['Hello network world']   
        #for line in self.message:
        #    self.sockobj.send(line)  
        #data = self.sockobj.recv(1024)
        print 'Client received:', `data`

    def goaway(self):
        #self.sockobj.shutdown(0)
        #self.sockobj.close()
        self.win.destroy()

def NewClient():
    App(root)

def PgmExit():
    root.quit()
    
root = Tk()
Button(root, text='New Client', command=NewClient).pack()
Button(root, text='Quit All Clients', command=PgmExit).pack()
root.mainloop()

So next, I did the reverse... I took out all references to Tkinter 
and saved it as clt3.clw (again to use pythonw.exe).  It blew 
sky high again.  Here's the script:

import os, time, sys
from socket import *              

class App:
    def __init__(self, master):
        self.serverHost = '192.168.1.12'      
        self.serverPort = 50007            
        self.sockobj = socket(AF_INET, SOCK_STREAM)      
        self.sockobj.connect((self.serverHost, self.serverPort))   

    def say_hi(self):
        self.message = ['Hello network world']   
        for line in self.message:
            self.sockobj.send(line)  
        data = self.sockobj.recv(1024)
        print 'Client received:', `data`

    def goaway(self):
        self.sockobj.shutdown(0)
        self.sockobj.close()

x = App(0)
x.say_hi()
time.sleep(1)
y = App(1)
y.say_hi()
time.sleep(1)
x.goaway()
y.goaway()
time.sleep(1)
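
For readers who want to try the same send/receive/shutdown/close sequence today, here is a minimal self-contained sketch. It makes several assumptions that are not in the original report: it is written for Python 3, it starts its own throwaway echo server on 127.0.0.1 instead of contacting 192.168.1.12:50007, and `serve_once` and `echo_roundtrip` are hypothetical helper names. It does not attempt to reproduce the crash.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection and echo bytes back until the peer closes."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)


def echo_roundtrip(message):
    """Send message to a local echo server and return what comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        host, port = listener.getsockname()
        server = threading.Thread(target=serve_once, args=(listener,))
        server.start()
        with socket.create_connection((host, port), timeout=5) as client:
            client.sendall(message)
            reply = b""
            while len(reply) < len(message):
                chunk = client.recv(1024)
                if not chunk:
                    break
                reply += chunk
            client.shutdown(socket.SHUT_RDWR)  # explicit shutdown before close
        server.join(timeout=5)
    return reply


print(echo_roundtrip(b"Hello network world"))
```

The `with` blocks close each socket explicitly, so nothing is left for interpreter shutdown to tidy up.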

Here's the output from the linux-based echo server:

ns1:/www/python # python select-server.py
select-server loop starting
Connect: ('192.168.1.20', 2382) 135755344
        got Hello network world on 135755344
Connect: ('192.168.1.20', 2383) 135744224
        got Hello network world on 135744224
        got  on 135755344
        got  on 135744224

Here's the (now familiar) GPF output:

PYTHONW caused an invalid page fault in
module KERNEL32.DLL at 017f:bff7b9a6.
Registers:
EAX=00000000 CS=017f EIP=bff7b9a6 EFLGS=00000246
EBX=007961a0 SS=0187 ESP=0095fba8 EBP=0095fbe8
ECX=0095fbf8 DS=0187 ESI=10013168 FS=19d7
EDX=0079d670 ES=0187 EDI=0079e7b0 GS=356e
Bytes at CS:EIP:
ff 76 04 e8 13 89 ff ff 5e c2 04 00 56 8b 74 24 
Stack dump:
007961a0 10007eb0 10013168 7800b317 0079eb50 0079d670 
00000000 007a2c98 00000000 81dafb90 00794641 760028e8 
0079d67c 0079d670 007961a0 0079d67c 

And here's the netstat output from right after the crash:

C:\WINDOWS>netstat /a

Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    amd2000:2339           AMD2000:0              LISTENING
  TCP    amd2000:135            AMD2000:0              LISTENING
  TCP    amd2000:1243           AMD2000:0              LISTENING
  TCP    amd2000:1025           AMD2000:0              LISTENING
  TCP    amd2000:1699           AMD2000:0              LISTENING
  TCP    amd2000:2339           www.mlgames.org:22     ESTABLISHED
  TCP    amd2000:2382           www.mlgames.org:50007  TIME_WAIT
  TCP    amd2000:2383           www.mlgames.org:50007  TIME_WAIT
  TCP    amd2000:137            AMD2000:0              LISTENING
  TCP    amd2000:138            AMD2000:0              LISTENING
  TCP    amd2000:nbsession      AMD2000:0              LISTENING
  TCP    amd2000:1243           www.mlgames.org:22     ESTABLISHED
  UDP    amd2000:1699           *:*
  UDP    amd2000:nbname         *:*
  UDP    amd2000:nbdatagram     *:*

C:\WINDOWS>


Notice the TIME_WAIT connections to port 50007.

It looks like this is a 'socket' library problem.  I also tried 
adding "del x", "del y", and a long delay to the end of the 
script to see if things would change... but no.  Same explosion.

So, to answer your question.  It happens on both python and 
pythonw, and it happens without loading Tkinter.  I suppose I 
can try the latest development version as you suggested.

Kurt

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-07-06 12:34

Message:
Logged In: YES 
user_id=31435

Unassigned.  Didn't see any problem on Win98SE, under 2.2.3 
or 2.3b2, using either of the self.serverHost assignments.

If you can, please try under 2.3b2.  On Windows that ships 
with the current version of Tcl/Tk (8.4.3), and some kinds of 
Tcl/Tk Windows shutdown races are said to be fixed in 8.4.3.

Question:  How do you start this program?  With python.exe 
or with pythonw.exe?  The only known workaround for some 
kinds of previous Tcl/Tk shutdown races was to start the 
program with pythonw.exe.  So if you haven't tried 
pythonw.exe, try it and see whether the problem goes away.  
If it does, it's almost certainly a bug in Tcl/Tk.

----------------------------------------------------------------------
