Summary of failing regression tests...

Here is a list of the regrtests that are currently failing for me on Linux32 and Win32. Is anybody looking at test_mmap, test_userstring, test_winreg, and test_winreg2?

Skip, you and Jeremy(?) were talking about test_posixpath on Linux. It looks like the same errors you were talking about occur on Win32. I didn't look too closely, can you?

Fredrik, are you aware of the test_sre failure on Win32? Am I doing something wrong to be getting this?

On Linux32:

- test_fork1
  This fails/hangs/crashes inconsistently because (some little bird told me) the fork stuff and threads are incompatible. Am I right? Now that we are turning threads on by default what is the proper thing to do here? Should test_fork1 be changed to skip if threads are enabled?

On Win32:

- test_mmap
  one liner:
    test test_mmap crashed -- exceptions.WindowsError : [Errno 6] The handle is invalid
  full output:
    Position of foo: 1.0 pages
    Length of file: 2.0 pages
    Contents of byte 0: '\000'
    Contents of first 3 bytes: '\000\000\000'
    Modifying file's content...
    Contents of byte 0: '3'
    Contents of first 3 bytes: '3\000\000'
    Contents of second page: foobar
    Regex match on mmap (page start, length of match): 1.0 6
    Seek to zeroth byte
    Seek to 42nd byte
    Seek to last byte
    Try to seek to negative position...
    Try to seek beyond end of mmap...
    Try to seek to negative position...
    Attempting resize()
    Traceback (most recent call last):
      File "..\Lib\test\test_mmap.py", line 114, in ?
        test_both()
      File "..\Lib\test\test_mmap.py", line 100, in test_both
        m.resize( 512 )
    WindowsError: [Errno 6] The handle is invalid

- test_posixpath
  one liner:
    test test_posixpath failed -- Writing: 'error!', expected: 'No err'
  full output:
    error!
    evaluated: posixpath.commonprefix(["/home/swenson/spam", "/home/swen/spam"])
    should be: /home
    returned: /home/swen
    error!
    evaluated: posixpath.commonprefix(["/home/swen/spam", "/home/swen/eggs"])
    should be: /home/swen
    returned: /home/swen/
    2 errors.

- test_re
  one liner:
    test test_re failed -- Writing: '=== Failed incorrectly', expected: "('abc', 'abc', 0, 'fou"
  full output:
    Running tests on re.search and re.match
    Running tests on re.sub
    Running tests on symbolic references
    Traceback (most recent call last):
      File "..\Lib\test\test_re.py", line 122, in ?
        raise TestFailed, "symbolic reference"
    test_support.TestFailed: symbolic reference
  discussion:
    This has been discussed a little by Mark Favas and Jeremy with no conclusion.
    http://www.python.org/pipermail/python-dev/2000-July/013963.html

- test_sre
  one liner:
    test test_sre failed -- Writing: '=== Failed incorrectly', expected: "===grouping error ('("
  full output:
    Running tests on sre.search and sre.match
    Running tests on sre.sub
    Running tests on symbolic references
    Traceback (most recent call last):
      File "..\Lib\test\test_sre.py", line 124, in ?
        raise TestFailed, "symbolic reference"
    test_support.TestFailed: symbolic reference
  discussion:
    This has been discussed a little by Mark Favas and Jeremy with no conclusion.
    http://www.python.org/pipermail/python-dev/2000-July/013963.html

- test_userstring
  one liner:
    test test_userstring failed -- Writing: "'a'", expected: ''
  full output:
    'a' <built-in method isalpha of string object at 007E1DF0> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    'A' <built-in method isalpha of string object at 00812CF0> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    '\012' <built-in method isalpha of string object at 00812CC0> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    'abc' <built-in method isalpha of string object at 008118F0> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    'aBc123' <built-in method isalpha of string object at 00812990> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    'abc\012' <built-in method isalpha of string object at 00812C60> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    'a' <built-in method isalnum of string object at 007E1DF0> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    'A' <built-in method isalnum of string object at 00812CF0> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    '\012' <built-in method isalnum of string object at 00812CC0> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    '123abc456' <built-in method isalnum of string object at 00812930> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    'a1b3c' <built-in method isalnum of string object at 00812900> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    'aBc000 ' <built-in method isalnum of string object at 008128D0> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>
    'abc\012' <built-in method isalnum of string object at 00812C60> <class exceptions.TypeError at 007C5AAC> <> <class exceptions.AttributeError at 007C611C>

- test_winreg
  one liner:
    test test_winreg failed -- Unread: '\012'
  full output:
    (this works when run standalone)

- test_winreg2 (crashes)
  full output:
    Fatal Python error: PyThreadState_Get: no current thread
    abnormal program termination

Thanks,
Trent

--
Trent Mick
TrentM@ActiveState.com
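
The two test_posixpath results reported above can be reproduced with a short sketch. It assumes a current Python 3 interpreter, where commonprefix compares strings character by character and a separate function, commonpath (not available when this thread was written), compares whole path components. The variable names are illustrative only.

```python
import posixpath

paths_a = ["/home/swenson/spam", "/home/swen/spam"]
paths_b = ["/home/swen/spam", "/home/swen/eggs"]

# commonprefix works on characters, so it can stop in the middle of a
# path component or keep a trailing slash.
char_prefix_a = posixpath.commonprefix(paths_a)
char_prefix_b = posixpath.commonprefix(paths_b)

# commonpath (Python 3.5 and later) works on whole path components.
component_prefix_a = posixpath.commonpath(paths_a)
component_prefix_b = posixpath.commonpath(paths_b)

print(repr(char_prefix_a), repr(component_prefix_a))
print(repr(char_prefix_b), repr(component_prefix_b))
```

The character-wise results match the "returned:" lines in the report, and the component-wise results match the "should be:" lines.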

trent wrote:
Fredrik, are you aware of the test_sre failure on Win32? Am I doing something wrong to be getting this?
hmm. *I* don't get that error (Win32, MSVC 5.0). If everything works as it should, test_re and test_sre should produce what's in output/test_s?re

does anyone else see this error on Linux?

</F>

as for your other errors (still on Win95, MSVC 5.0):
same here (but a different error code?)
nope.
nope. but are you sure you've updated the output directory lately? the "=== Failed incorrectly" part is expected, the abc stuff is not.
nope.
nope. but again, the expected stuff looks wrong.
nope.
nope -- but the output indicates that you're testing against an old copy of UserString.py
nope.
almost. I got other interesting errors when I had an old winreg.pyd in the path. if I remove that one, the test runs, prints HKEY_CLASSES_ROOT, and then starts consuming 99.99% CPU (not sure if it terminates; I need my cycles for more important stuff...) </F>

Trent Mick wrote:
That works fine for me (--with-threads).

Peter
--
Peter Schneider-Kamp          ++47-7388-7331
Herman Krags veg 51-11        mailto:peter@schneider-kamp.de
N-7050 Trondheim              http://schneider-kamp.de

On Wed, Jul 26, 2000 at 12:47:21AM +0000, Peter Schneider-Kamp wrote:
'test_fork1' works for me when run as:

    ./python Lib/test/regrtest.py

but hangs when run as:

    ./python Lib/test/test_fork1.py

or

    ./python Lib/test/regrtest.py test_fork1

I don't know why.

Trent

--
Trent Mick
TrentM@ActiveState.com

Trent Mick wrote:
On my system there is no problem:

    nowonder@mobility:~/python/python/dist/src > ./python Lib/test/test_fork1.py
    nowonder@mobility:~/python/python/dist/src > ./python Lib/test/regrtest.py test_fork1
    test_fork1
    1 test OK.

I am using a (somewhat modified) SuSE 6.4 distribution with an old 2.2.13 kernel. What does yours look like?

Peter
--
Peter Schneider-Kamp          ++47-7388-7331
Herman Krags veg 51-11        mailto:peter@schneider-kamp.de
N-7050 Trondheim              http://schneider-kamp.de

On Wed, Jul 26, 2000 at 08:10:40AM +0000, Peter Schneider-Kamp wrote:
    [trentm@molotok ~]$ cat /proc/version
    Linux version 2.2.12-20smp (root@porky.devel.redhat.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP Mon Sep 27 10:34:45 EDT 1999

RedHat dist

Trent

--
Trent Mick
TrentM@ActiveState.com

Trent Mick wrote:
Could SMP be involved? Do you get the same on a non-SMP system? Do others have the same problem with SMP systems?

Peter

    Linux version 2.2.13 (root@mobility) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 Fri Mar 3 00:43:36 CET 2000

--
Peter Schneider-Kamp          ++47-7388-7331
Herman Krags veg 51-11        mailto:peter@schneider-kamp.de
N-7050 Trondheim              http://schneider-kamp.de

On Wed, Jul 26, 2000 at 06:40:07PM +0000, Peter Schneider-Kamp wrote:
Yup, it looks like that is a possibility. I tried test_fork1 a few times on:

    [trentm@ladder src]$ cat /proc/version
    Linux version 2.2.14 (root@ladder.ActiveState.com) (gcc version 2.95.2 19991024 (release)) #1 Mon Mar 6 16:49:17 PST 2000

(a Mandrake box)... and it works fine:

    [trentm@ladder src]$ ./python Lib/test/test_fork1.py
    [trentm@ladder src]$ ./python Lib/test/regrtest.py test_fork1
    test_fork1
    1 test OK.
    [trentm@ladder src]$

So...... who knows more about SMP issues and why we have them? I am inexperienced and clueless here.

Trent

--
Trent Mick
TrentM@ActiveState.com

Trent Mick writes:
So...... who knows more about SMP issues and why we have them? I am inexperienced and clueless here.
Another possible issue is that at one point configure was using GNU Pth instead of LinuxThreads by default. Since Pth is fairly young, there may be issues there as well. Pth is no longer preferred over LinuxThreads. For non-Linux users: LinuxThreads is the default pthreads implementation on Linux, and GNU Pth is a new "portable" implementation that I understand is very young and unstable. -Fred -- Fred L. Drake, Jr. <fdrake at beopen.com> BeOpen PythonLabs Team Member

Peter Schneider-Kamp writes:
Could SMP be involved? Do you get the same on a non-SMP system? Do others have the same problem with SMP systems?
I mentioned the possibility at our PythonLabs meeting today that this may be related; when I observed problems with test_fork1, it was on an SMP linux box running Mandrake 7.0 with the stock SMP kernel. I have *not* seen the problem pop up on the uniprocessor I have now. I think Barry may have access to an SMP Sparc machine; if so, we'll be checking it out there. -Fred -- Fred L. Drake, Jr. <fdrake at beopen.com> BeOpen PythonLabs Team Member

On Wed, Jul 26, 2000 at 11:05:21PM -0400, Fred L. Drake, Jr. wrote:
I think Barry may have access to an SMP Sparc machine; if so, we'll be checking it out there.
I have also seen test_fork1 failures on BSDI, using a SMP machine, but I haven't tried it on non-SMP (we don't have too many of those). However, all our BSDI kernels are the same, having been built for SMP. Meetings permitting (which is doubtful, today :-() I'll see if I can pin that down. It would, however, be fairly peculiar if test_fork1 breaks on all SMP-supporting systems... I don't really recall a reason for thread semantics to change reliably based on kernel/cpu settings, and it would be silly for them to do so! But I'll admit threads are silly, period ;-) 6-AM-and-preparing-for-a-full-10-hour-day-of-meetings-:-S-ly y'rs, -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

[Thomas Wouters]
Silly? Without threads your clothes would fall off <wink>.

I wonder whether the "fork" part is a red herring here. It's extremely rare to see a thread bug that actually requires a multi-headed machine to trigger (in fact, I don't believe I've ever seen one), but the nature of races in flawed threaded code is often such that you're a million times more *likely* to hit a bad timing window on a multi-headed machine than on a time-sliced single-headed box. And this is particularly true under operating systems that are reluctant to switch threads on a single-headed box.

For example, 1.5.2's dreaded invalid_tstate bug had never been reported on a single-headed box until I spent several *hours* hand-crafting an extreme test case to provoke it on one (and that was after I was sure I had located the timing hole "by thought", so knew exactly what I needed to do to trigger it -- 'twas very hard to provoke it on a one-headed box even then!).
6-AM-and-preparing-for-a-full-10-hour-day-of-meetings-:-S-ly y'rs,
So long as you still put in your 18 hours on Python today, we're happy to let you engage in all the recreational activities you like <wink>. generously y'rs - tim

On Thu, Jul 27, 2000 at 12:32:02AM -0400, Tim Peters wrote:
Silly? Without threads your clothes would fall off <wink>.
Clothes ? What clothes ? I'm stuck in meetings all day, remember, and those require a suit and tie. And I have neither suit nor tie ;)
Actually, I got the impression the 'bug' wasn't only present on multi-headed Linux machines with an SMP kernel, but single-headed Linux machines with an SMP kernel as well. You see, in Linux, the extra logic required for SMP is optional, at compile time. It changes a lot in the scheduler, but I'm not sure if it should be visible from outside. I haven't actually tested it on a UP machine with SMP kernel, though.

And anyway, I thought the test_fork1 test tested for fork() behaviour in threads. It spawns a thread, fork()s one of those threads, and tests to see if the other threads still exist after the fork(), in the new child. The entire test is done in Python code, so how the scheduler and/or race conditions come into play isn't too obvious to me. Except of course the whole test is flawed.
6-AM-and-preparing-for-a-full-10-hour-day-of-meetings-:-S-ly y'rs,
So long as you still put in your 18 hours on Python today, we're happy to let you engage in all the recreational activities you like <wink>.
I'll clock the hours I spend thinking about Python in those meetings, so not to worry ;) Dreaming-Python-makes-it-24h-a-day-ly y'rs, -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

[Thomas Wouters]
Four threads, actually.
fork()s one of those threads,
It forks "the main" thread, which is in addition to the four it spawns.
The test uses loops and time.sleep instead of proper synchronization, so its behavior is at best probabilistic, depending entirely on timing accidents (*likely* accidents, but accidents all the same). But I don't have a machine here that supports fork at all, so I can't run it, and I don't know how it fails on the machines it fails on. If it's failing with a thread exiting with a non-zero exit code, then I'd say the timing uncertainties in the *Python* code are irrelevant, and it's the failing platform's fork implementation that's broken.
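
As a minimal sketch of the point about synchronization (this is not the code of test_fork1.py, and it assumes a POSIX system with a current Python 3), the same kind of check can be written with a Barrier and an Event instead of loops and time.sleep. The function name count_threads_in_child is made up for illustration. POSIX specifies that only the thread calling fork() is replicated in the child, so the child is expected to report a single thread. Recent Python versions may emit a DeprecationWarning when forking a multi-threaded process.

```python
import os
import threading


def count_threads_in_child(n_workers=4):
    """Fork while n_workers threads are blocked; return the child's thread count."""
    release = threading.Event()
    all_started = threading.Barrier(n_workers + 1)

    def worker():
        all_started.wait()   # rendezvous: proves this worker is running
        release.wait()       # then block until the parent releases us

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in workers:
        t.start()
    all_started.wait()       # no sleep(): every worker is known to be alive

    pid = os.fork()
    if pid == 0:
        # Child: report the number of threads Python sees via the exit code,
        # and leave immediately without running any cleanup handlers.
        os._exit(threading.active_count())

    # Parent: collect the child's answer, then let the workers finish.
    _, status = os.waitpid(pid, 0)
    release.set()
    for t in workers:
        t.join()
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("threads visible in the child:", count_threads_in_child())
```

Because the parent only forks after the barrier has been passed, the outcome does not depend on how long any thread happens to take to start.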

Thomas Wouters wrote:
Could you be more specific about which aspect of test_fork1.py fails ? After looking at the code it seems that it's not the os.fork() itself that is not working, but some particular combination of using os.fork() on a process with multiple threads running. If the latter is the case, then I'd propose to simply switch off threads for SMP machines (is there a compile time switch we could use for this ?) until we have figured out what causes the problem. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

"Fred" == Fred L Drake, Jr <fdrake@beopen.com> writes:
    Fred> Barry may have access to an SMP Sparc machine; if so, we'll
    Fred> be checking it out there.

I thought I did, but nope, it's a single processor. What about SF?

-Barry

[Barry]
I thought I did, but nope, it's a single processor. What about SF?
In parallel, could someone summarize the symptoms? Exactly how does it fail? Does it fail the same way across all platforms on which it does fail? Does it fail every time on all platforms on which it fails, or fail only some of the time on all platforms on which it fails, or fails some of the time on some of the platforms on which it fails but fails all of the time on the rest of the platforms on which it fails <wink>?

If there exists a platform on which it fails but it doesn't fail every time on that platform, that would be strong evidence of a timing hole. Those usually require <gasp!> thought to identify and repair. I'll volunteer to eyeball the code and do some thinking, but not unless the symptoms suggest that's worthwhile.

ignorantly y'rs - tim

On Thu, Jul 27, 2000 at 01:17:03AM -0400, Tim Peters wrote:
Here is some code that seems to hang on Linux UP machines. Sometimes it prints PIDs for a while and other times it stops after only a few. I don't know if that counts. I raised this issue over a month ago. Tim, you're getting forgetful. :)

Neil

    import threading
    import os, sys

    class MyThread(threading.Thread):

        def start(self):
            threading.Thread.start(self)

        def run(self):
            while 1:
                if os.fork() == 0:
                    print os.getpid()
                    os._exit(0)
                os.wait()

    MyThread().start()
    MyThread().start()
    MyThread().start()

[Neil Schemenauer]
Well, that sucks! Does "UP" mean uniprocessor in this context?
I don't know if that counts. I raised this issue over a month ago. Tim, you're getting forgetful. :)
Getting? I'm very old. I think. But even if I had a memory, I usually ignore Unix-specific posts (i.e., I can't run code with a .fork(), so can only sit back & admire the perversity of those who both can & do <wink>).

IIRC ActiveState contributed to Perl a version of fork that works on Win32. Has anyone looked at this? Could it be grabbed for Python? This would help heal one of the more difficult platform rifts. Emulating fork for Win32 looks quite difficult to me but if it's already done...

Neil

On Fri, Jul 28, 2000 at 09:21:01AM +1000, Neil Hodgson wrote:
I just asked Sarathy about this and he directed me to 'perldoc perlfork' (if you have ActivePerl 5.6 installed then you can look it up). I have attached it here. The emulation of fork() in Win32 is not a perfect solution (e.g. sockets are not dupped, etc.).

Trent

------------------- snip ---------------------------

NAME
    perlfork - Perl's fork() emulation

SYNOPSIS
    Perl provides a fork() keyword that corresponds to the Unix system call of the same name. On most Unix-like platforms where the fork() system call is available, Perl's fork() simply calls it.

    On some platforms such as Windows where the fork() system call is not available, Perl can be built to emulate fork() at the interpreter level. While the emulation is designed to be as compatible as possible with the real fork() at the level of the Perl program, there are certain important differences that stem from the fact that all the pseudo child "processes" created this way live in the same real process as far as the operating system is concerned.

    This document provides a general overview of the capabilities and limitations of the fork() emulation. Note that the issues discussed here are not applicable to platforms where a real fork() is available and Perl has been configured to use it.

DESCRIPTION
    The fork() emulation is implemented at the level of the Perl interpreter. What this means in general is that running fork() will actually clone the running interpreter and all its state, and run the cloned interpreter in a separate thread, beginning execution in the new thread just after the point where the fork() was called in the parent. We will refer to the thread that implements this child "process" as the pseudo-process.

    To the Perl program that called fork(), all this is designed to be transparent.
    The parent returns from the fork() with a pseudo-process ID that can be subsequently used in any process manipulation functions; the child returns from the fork() with a value of `0' to signify that it is the child pseudo-process.

  Behavior of other Perl features in forked pseudo-processes

    Most Perl features behave in a natural way within pseudo-processes.

    $$ or $PROCESS_ID
        This special variable is correctly set to the pseudo-process ID. It can be used to identify pseudo-processes within a particular session. Note that this value is subject to recycling if any pseudo-processes are launched after others have been wait()-ed on.

    %ENV
        Each pseudo-process maintains its own virtual environment. Modifications to %ENV affect the virtual environment, and are only visible within that pseudo-process, and in any processes (or pseudo-processes) launched from it.

    chdir() and all other builtins that accept filenames
        Each pseudo-process maintains its own virtual idea of the current directory. Modifications to the current directory using chdir() are only visible within that pseudo-process, and in any processes (or pseudo-processes) launched from it. All file and directory accesses from the pseudo-process will correctly map the virtual working directory to the real working directory appropriately.

    wait() and waitpid()
        wait() and waitpid() can be passed a pseudo-process ID returned by fork(). These calls will properly wait for the termination of the pseudo-process and return its status.

    kill()
        kill() can be used to terminate a pseudo-process by passing it the ID returned by fork(). This should not be used except under dire circumstances, because the operating system may not guarantee integrity of the process resources when a running thread is terminated. Note that using kill() on a pseudo-process may typically cause memory leaks, because the thread that implements the pseudo-process does not get a chance to clean up its resources.

    exec()
        Calling exec() within a pseudo-process actually spawns the requested executable in a separate process and waits for it to complete before exiting with the same exit status as that process. This means that the process ID reported within the running executable will be different from what the earlier Perl fork() might have returned. Similarly, any process manipulation functions applied to the ID returned by fork() will affect the waiting pseudo-process that called exec(), not the real process it is waiting for after the exec().

    exit()
        exit() always exits just the executing pseudo-process, after automatically wait()-ing for any outstanding child pseudo-processes. Note that this means that the process as a whole will not exit unless all running pseudo-processes have exited.

    Open handles to files, directories and network sockets
        All open handles are dup()-ed in pseudo-processes, so that closing any handles in one process does not affect the others. See below for some limitations.

  Resource limits

    In the eyes of the operating system, pseudo-processes created via the fork() emulation are simply threads in the same process. This means that any process-level limits imposed by the operating system apply to all pseudo-processes taken together. This includes any limits imposed by the operating system on the number of open file, directory and socket handles, limits on disk space usage, limits on memory size, limits on CPU utilization etc.

  Killing the parent process

    If the parent process is killed (either using Perl's kill() builtin, or using some external means) all the pseudo-processes are killed as well, and the whole process exits.

  Lifetime of the parent process and pseudo-processes

    During the normal course of events, the parent process and every pseudo-process started by it will wait for their respective pseudo-children to complete before they exit.
    This means that the parent and every pseudo-child created by it that is also a pseudo-parent will only exit after their pseudo-children have exited.

    A way to mark pseudo-processes as running detached from their parent (so that the parent would not have to wait() for them if it doesn't want to) will be provided in future.

CAVEATS AND LIMITATIONS
    BEGIN blocks
        The fork() emulation will not work entirely correctly when called from within a BEGIN block. The forked copy will run the contents of the BEGIN block, but will not continue parsing the source stream after the BEGIN block. For example, consider the following code:

            BEGIN {
                fork and exit;          # fork child and exit the parent
                print "inner\n";
            }
            print "outer\n";

        This will print:

            inner

        rather than the expected:

            inner
            outer

        This limitation arises from fundamental technical difficulties in cloning and restarting the stacks used by the Perl parser in the middle of a parse.

    Open filehandles
        Any filehandles open at the time of the fork() will be dup()-ed. Thus, the files can be closed independently in the parent and child, but beware that the dup()-ed handles will still share the same seek pointer. Changing the seek position in the parent will change it in the child and vice-versa. One can avoid this by opening files that need distinct seek pointers separately in the child.

    Forking pipe open() not yet implemented
        The `open(FOO, "|-")' and `open(BAR, "-|")' constructs are not yet implemented. This limitation can be easily worked around in new code by creating a pipe explicitly. The following example shows how to write to a forked child:

            # simulate open(FOO, "|-")
            sub pipe_to_fork ($) {
                my $parent = shift;
                pipe my $child, $parent or die;
                my $pid = fork();
                die "fork() failed: $!" unless defined $pid;
                if ($pid) {
                    close $child;
                }
                else {
                    close $parent;
                    open(STDIN, "<&=" . fileno($child)) or die;
                }
                $pid;
            }

            if (pipe_to_fork('FOO')) {
                # parent
                print FOO "pipe_to_fork\n";
                close FOO;
            }
            else {
                # child
                while (<STDIN>) { print; }
                close STDIN;
                exit(0);
            }

        And this one reads from the child:

            # simulate open(FOO, "-|")
            sub pipe_from_fork ($) {
                my $parent = shift;
                pipe $parent, my $child or die;
                my $pid = fork();
                die "fork() failed: $!" unless defined $pid;
                if ($pid) {
                    close $child;
                }
                else {
                    close $parent;
                    open(STDOUT, ">&=" . fileno($child)) or die;
                }
                $pid;
            }

            if (pipe_from_fork('BAR')) {
                # parent
                while (<BAR>) { print; }
                close BAR;
            }
            else {
                # child
                print "pipe_from_fork\n";
                close STDOUT;
                exit(0);
            }

        Forking pipe open() constructs will be supported in future.

    Global state maintained by XSUBs
        External subroutines (XSUBs) that maintain their own global state may not work correctly. Such XSUBs will either need to maintain locks to protect simultaneous access to global data from different pseudo-processes, or maintain all their state on the Perl symbol table, which is copied naturally when fork() is called. A callback mechanism that provides extensions an opportunity to clone their state will be provided in the near future.

    Interpreter embedded in larger application
        The fork() emulation may not behave as expected when it is executed in an application which embeds a Perl interpreter and calls Perl APIs that can evaluate bits of Perl code. This stems from the fact that the emulation only has knowledge about the Perl interpreter's own data structures and knows nothing about the containing application's state. For example, any state carried on the application's own call stack is out of reach.

    Thread-safety of extensions
        Since the fork() emulation runs code in multiple threads, extensions calling into non-thread-safe libraries may not work reliably when calling fork(). As Perl's threading support gradually becomes more widely adopted even on platforms with a native fork(), such extensions are expected to be fixed for thread-safety.

BUGS
    *   Having pseudo-process IDs be negative integers breaks down for the integer `-1' because the wait() and waitpid() functions treat this number as being special. The tacit assumption in the current implementation is that the system never allocates a thread ID of `1' for user threads. A better representation for pseudo-process IDs will be implemented in future.

    *   This document may be incomplete in some respects.

AUTHOR
    Support for concurrent interpreters and the fork() emulation was implemented by ActiveState, with funding from Microsoft Corporation.

    This document is authored and maintained by Gurusamy Sarathy <gsar@activestate.com>.

SEE ALSO
    the section on "fork" in the perlfunc manpage, the perlipc manpage

--
Trent Mick
TrentM@ActiveState.com
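
The "Open filehandles" caveat in the attached document (dup()-ed handles share one seek pointer) is ordinary dup() behaviour and can be observed from Python as well. The following is a small sketch under that assumption; the function name offset_seen_through_dup is hypothetical.

```python
import os
import tempfile


def offset_seen_through_dup():
    """Move the offset through one descriptor and read it back through its dup."""
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789")
        f.flush()                      # push the data to the OS before using raw fds
        fd = f.fileno()
        dup_fd = os.dup(fd)            # second descriptor for the same open file
        try:
            os.lseek(fd, 5, os.SEEK_SET)             # seek via the original
            return os.lseek(dup_fd, 0, os.SEEK_CUR)  # query via the duplicate
        finally:
            os.close(dup_fd)


if __name__ == "__main__":
    print("offset seen through the duplicate:", offset_seen_through_dup())
```

Opening the file a second time instead of duplicating the descriptor gives an independent offset, which is the workaround the document recommends.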

Neil Hodgson wrote:
This would indeed be a *very* useful addition and help porting os.fork() applications to Win32. (Ok, in the long run they would have to be converted to multi-threaded apps due to the process creation overhead on Win32, but for short term porting to Win32 this would be a Cool Thing, IMHO.) Can this code be grabbed from somewhere ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

I have only one word: yuck! Portable Python code should not rely on fork. --Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)

Guido van Rossum wrote:
I wasn't talking about "portable" code, but about "porting" code to Win32. I happen to use an application which manages a few processes and spawns these using .fork(). It would be nice to have a .fork() like API on Win32 to experiment with.

Anyway, I would already be happy if I could just look at the code from ActiveState... if it's Perl all the way I probably don't want to look any further into this ;-)

BTW, I'm not too familiar with IPC on Win32. What would be the best strategy to do this on the Windows platforms ? I remember that Skip once posted a comparison of Unix Domain sockets and TCP Sockets on Unix which showed that UD sockets are much faster than TCP sockets. On Win32 these don't exist and I suppose that TCP sockets are too slow for my server.

Thanks,
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

On Fri, Jul 28, 2000 at 12:21:08PM +0200, M.-A. Lemburg wrote:
It is all in ActivePerl. You can download the source code: http://www.activestate.com/Products/ActivePerl/Download.html

Besides, I would guess (and I *am* guessing, I don't know this) that the Perl fork stuff is fairly closely tied to Perl, i.e. it may not be easy at all to yank it out and plug it into Python.

However, I echo Bill's and Guido's sentiments. Having a hacked emulation of fork encourages people to use it. Once someone has their favorite UNIX app running on Win32 with the fake-fork they will have little incentive to port it properly using threads. There will then be calls to improve the Win32 fork implementation... and that is the wrong support path.

Trent

--
Trent Mick
TrentM@ActiveState.com

Trent Mick wrote:
Thanks.
You're probably right... :-/

BTW, (pardon my ignorance) what is the most portable way to do the equivalent of an os.system("cmd &") as a native OS API call ? [On Unix, "cmd &" starts a new process which runs in the background and detached from the calling process.] I've looked at .execve and .spawnve, but they both replace the current process.

Thanks,
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

mal wrote:
on windows, spawn(P_NOWAIT) does what you want. here's an example from the eff-bot guide:

    #
    # os-spawn-example-2.py

    import os
    import string

    def run(program, *args, **kw):
        # find executable
        mode = kw.get("mode", os.P_WAIT)
        for path in string.split(os.environ["PATH"], os.pathsep):
            file = os.path.join(path, program) + ".exe"
            try:
                return os.spawnv(mode, file, (file,) + args)
            except os.error:
                pass
        raise os.error, "cannot find executable"

    run("python", "hello.py", mode=os.P_NOWAIT)

</F>

Fredrik Lundh wrote:
Cool, so os.spawnve(os.P_NOWAIT, ...) looks like a portable alternative to os.fork() for the case where you do not rely on the parent process resources being available in the child process. Next, I'll have to find out how to kill a process given its process ID under Windows... wouldn't it be possible to write an emulation of os.kill() for Win32 platforms too ? (there's a SysInfo tool for Windows which says that OpenProcess(PROCESS_TERMINATE, FALSE, pid); will do the trick -- not sure if that works as expected though). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
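
For readers on a current Python 3, the os.spawnv(os.P_NOWAIT, ...) approach discussed above can be sketched as follows. This is an illustration only: the helper name spawn_background is made up, and it launches the running interpreter (sys.executable) so that no PATH search is needed. On Unix the return value is the child's process ID; on Windows it is a process handle. The later subprocess module is the usual choice for new code.

```python
import os
import sys


def spawn_background(args):
    """Start the current Python interpreter with args and return immediately."""
    argv = [sys.executable] + list(args)
    # P_NOWAIT: do not wait for the child; it runs alongside the caller.
    return os.spawnv(os.P_NOWAIT, sys.executable, argv)


if __name__ == "__main__":
    pid = spawn_background(["-c", "print('hello from the child')"])
    # The parent is free to do other work here before collecting the child.
    _, status = os.waitpid(pid, 0)
    print("child finished with status", status)
```

Unlike os.fork(), the child starts from scratch and inherits none of the parent's in-memory state, which matches the restriction noted above.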

Next, I'll have to find out how to kill a process given its process ID under Windows... wouldn't it be possible to
    hprocess = OpenProcess( accessFlags, bInherit, pid );
    TerminateProcess(hprocess, exitCode);

The hard bit on Win32 is finding the PID when you only have the process name (for example) but it sounds like you don't have that specific problem...

Mark.
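
In current Python 3 the same effect is available without calling the Win32 API directly. The sketch below assumes the subprocess module (which did not exist when this thread was written); its Popen.terminate() method is documented to send SIGTERM on POSIX and to call TerminateProcess() on Windows. The function name start_and_terminate is hypothetical.

```python
import subprocess
import sys


def start_and_terminate():
    """Start a long-running child, terminate it, and return its exit code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.terminate()              # SIGTERM on POSIX, TerminateProcess() on Windows
    return child.wait(timeout=30)  # reap the child so nothing is left behind


if __name__ == "__main__":
    print("exit code of the terminated child:", start_and_terminate())
```

The exit code is platform dependent (on POSIX it is the negated signal number), so callers that need portability should only test that it is non-zero.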

Mark Hammond wrote:
Would it make sense to add something like this to posixmodule.c as emulation of the Unix kill() on Win32? Here's the manpage for reference:

SYNOPSIS
       #include <sys/types.h>
       #include <signal.h>

       int kill(pid_t pid, int sig);

DESCRIPTION
       The kill system call can be used to send any signal to any process group or process.

       If pid is positive, then signal sig is sent to pid.

       If pid equals 0, then sig is sent to every process in the process group of the current process.

       If pid equals -1, then sig is sent to every process except for the first one, from higher numbers in the process table to lower.

       If pid is less than -1, then sig is sent to every process in the process group -pid.

       If sig is 0, then no signal is sent, but error checking is still performed.

RETURN VALUE
       On success, zero is returned. On error, -1 is returned, and errno is set appropriately.

Not really, since my server manages its own set of processes and knows the different PIDs. It keeps track of what the processes currently do and terminates the ones that act in unexpected ways, e.g. due to programming errors. On Unix this results in a high availability bug tolerant server application which allows running user written code. My future goal would be porting it to Win2k, provided the needed APIs are available (or can be emulated in some way).

--
Marc-Andre Lemburg

David Ascher wrote:
What about porting os.kill() to Windows (see my other post with changed subject line in this thread)? Wouldn't that make sense? (The os.spawn() APIs do return PIDs of spawned processes, so calling os.kill() to send signals to these seems like a feasible way to control them.)

--
Marc-Andre Lemburg

Signals are a bit of a problem on Windows. We can terminate the thread mid-execution, but a clean way of terminating a thread isn't obvious.

I admit I didn't really read the long manpage when you posted it, but is a terminate-without-prejudice option any good?

Mark.

eek - a bit quick off the mark here ;-]
Signals are a bit of a problem on Windows. We can terminate the thread mid-execution, but a clean way of terminating a thread isn't obvious.
thread = process - you get the idea!
terminate-without-prejudice option any good?
really should say
terminate-without-prejudice only version any good?
Mark.

Mark Hammond wrote:
Well for one you can use signals for many other things than just terminating a process (e.g. to have it reload its configuration files). That's why os.kill() allows you to specify a signal.

The usual way of terminating a process on Unix from the outside is to send it a SIGTERM (and if that doesn't work a SIGKILL). I use this strategy a lot to control runaway client processes and safely shut them down:

On Unix you can install a signal handler in the Python program which then translates the SIGTERM signal into a normal Python exception. Sending the signal then causes the same as e.g. hitting Ctrl-C in a program: an exception is raised asynchronously, but it can be handled properly by the Python exception clauses to enable safe shutdown of the process.

For background: the client processes in my application server can execute arbitrary Python scripts written by users, i.e. potentially buggy code which could effectively hose the server. To control this, I use client processes which do the actual exec code and watch them using a watchdog process. If the processes don't return anything useful within a certain timeout limit, the watchdog process sends them a SIGTERM and restarts a new client.

Threads would not support this type of strategy, so I'm looking for something similar on Windows, Win2k to be more specific.

Thanks,
--
Marc-Andre Lemburg

[Marc writes]
I understand this. This is why I was skeptical that a "terminate-without-prejudice" only version would be useful. I _think_ this fairly large email is agreeing that it isn't of much use. If so, then I am afraid you are on your own :-( Mark.

On Thu, Jul 27, 2000 at 01:17:03AM -0400, Tim Peters wrote:
Here is what I have found:

Machine:

    [trentm@molotok ~/main/contrib/python.build/dist/src]$ cat /proc/version
    Linux version 2.2.12-20smp (root@porky.devel.redhat.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP Mon Sep 27 10:34:45 EDT 1999

Note that this is an SMP machine.

General symptoms:

test_fork1 did *not* fail for me all the time. In fact it seemed, in this run of testing, to pass fine a number of times in a row and then some magical switch flipped and now it fails every time. I don't know what the 'switch' case is, nor do I know how to flip it on and off. This failing every time is modulo debugging print statements that I have put in test_fork1.py. This indicates that it is a timing issue.

Instrumented test_fork1.py:
--------------------------------------------------------------------------
import os, sys, time, thread

try:
    os.fork
except AttributeError:
    raise ImportError, "os.fork not defined -- skipping test_fork1"

LONGSLEEP = 2
SHORTSLEEP = 0.5
NUM_THREADS = 4

alive = {}
stop = 0

def f(id):
    while not stop:
        alive[id] = os.getpid()
        print 'thread %s: pid=%s' % (str(id), str(alive[id]))
        try:
            time.sleep(SHORTSLEEP)
        except IOError:
            pass

def main():
    print 'start main'
    for i in range(NUM_THREADS):
        thread.start_new(f, (i,))

    print 'before sleep'
    time.sleep(LONGSLEEP)
    print 'after sleep (threads should be started now)'

    a = alive.keys()
    a.sort()
    assert a == range(NUM_THREADS)

    prefork_lives = alive.copy()

    print 'before fork'
    cpid = os.fork()
    print 'after fork'

    if cpid == 0:
        print 'child: start'
        # Child
        time.sleep(LONGSLEEP)
        n = 0
        for key in alive.keys():
            if alive[key] != prefork_lives[key]:
                n = n+1
        print 'child: done, exit_value=%d' % n
        os._exit(n)
    else:
        print 'parent: start'
        # Parent
        spid, status = os.waitpid(cpid, 0)
        print 'parent: done waiting for child(pid=%d,status=%d)' %\
              (spid, status)
        assert spid == cpid
        assert status == 0, "cause = %d, exit = %d" % (status&0xff, status>>8)
        global stop
        # Tell threads to die
        print 'parent: tell threads to die'
        stop = 1
        time.sleep(2*SHORTSLEEP) # Wait for threads to die
        print 'parent: done (expect threads to be dead by now, hack)'

main()
--------------------------------------------------------------------------

A couple of test runs:

*** This test run passed:

[trentm@molotok ~/main/contrib/python.build/dist/src]$ ./python Lib/test/test_fork1.py
start main
before sleep
thread 0: pid=26416
thread 1: pid=26417
thread 2: pid=26418
thread 3: pid=26419
thread 0: pid=26416
thread 1: pid=26417
thread 2: pid=26418
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 3: pid=26419
thread 1: pid=26417
thread 0: pid=26416
after sleep (threads should be started now)
before fork
after fork
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
parent: start
after fork
child: start
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
thread 1: pid=26417
thread 3: pid=26419
thread 0: pid=26416
thread 2: pid=26418
child: done, exit_value=0
parent: done waiting for child(pid=26420,status=0)
parent: tell threads to die
parent: done (expect threads to be dead by now, hack)

*** This test run seg faulted but completed:

[trentm@molotok ~/main/contrib/python.build/dist/src]$ ./python Lib/test/test_fork1.py
start main
before sleep
thread 0: pid=26546
thread 1: pid=26547
thread 2: pid=26548
thread 3: pid=26549
thread 1: pid=26547
thread 3: pid=26549
thread 2: pid=26548
thread 0: pid=26546
thread 2: pid=26548
thread 0: pid=26546
thread 1: pid=26547
thread 3: pid=26549
thread 3: pid=26549
thread 1: pid=26547
thread 2: pid=26548
thread 0: pid=26546
after sleep (threads should be started now)
before fork
after fork
parent: start
after fork
child: start
Segmentation fault (core dumped)
[trentm@molotok ~/main/contrib/python.build/dist/src]$ child: done, exit_value=0

[trentm@molotok ~/main/contrib/python.build/dist/src]$

*** This test hung on the last statement:

[trentm@molotok ~/main/contrib/python.build/dist/src]$ ./python Lib/test/test_fork1.py
start main
before sleep
thread 0: pid=26753
thread 1: pid=26754
thread 2: pid=26755
thread 3: pid=26756
thread 2: pid=26755
thread 3: pid=26756
thread 0: pid=26753
thread 1: pid=26754
thread 0: pid=26753
thread 2: pid=26755
thread 3: pid=26756
thread 1: pid=26754
thread 0: pid=26753
thread 3: pid=26756
thread 2: pid=26755
thread 1: pid=26754
after sleep (threads should be started now)
before fork
thread 0: pid=26753
after fork
thread 2: pid=26755
parent: start
thread 3: pid=26756
thread 1: pid=26754
after fork
child: start
thread 0: pid=26753
thread 3: pid=26756
thread 1: pid=26754
thread 2: pid=26755
thread 0: pid=26753
thread 3: pid=26756
thread 1: pid=26754
thread 2: pid=26755
thread 0: pid=26753
thread 3: pid=26756
thread 1: pid=26754
thread 2: pid=26755
thread 0: pid=26753
child: done, exit_value=0
parent: done waiting for child(pid=26757,status=0)

Those are the only three run cases that I get.

Trent

--
Trent Mick
TrentM@ActiveState.com

[Trent Mick, rallies to the cry for a summary of symptoms, which I'll summarize as

    On Linux version 2.2.12-20smp (root@porky.devel.redhat.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP

    test_fork1 has one of three outcomes, varying across runs:

    1. no problem
    2. segfault
    3. hang
]

Thank you! In no case did the test fail on the particular thing it's *trying* to test: that after a fork, spawned threads don't show up in the child. So it's failing in unanticipated ways (when it does fail), and, to my eyes, ways that are very unlikely to be Python's fault. Everyone pursuing the other "fork bug" please note that test_fork1 doesn't import threading or use any mutexes -- it just spawns threads, forks once, and .sleeps() a lot.

As with the other bug, would be interesting to recode test_fork1 in C and see whether it still segfaults and/or hangs. Should be easier to do for this test than the other one, since the *only* thread gimmick test_fork1 uses is thread.start_new(). We'll either discover that it still fails, in which case it's clearly not something Python caused and we'll have something pretty simple to pass on to the platform folks; or that it doesn't, in which case it's *really* interesting <wink>.

Does anyone have test_fork1 failures on other flavors of SMP?

"TP" == Tim Peters <tim_one@email.msn.com> writes:
TP> unlikely to be Python's fault. Everyone pursuing the other
TP> "fork bug" please note that test_fork1 doesn't import threading
TP> or use any mutexes -- it just spawns threads, forks once, and
TP> .sleeps() a lot.

I don't think the "other" "fork bug" uses any more thread gimmicks than the bug you're considering. The test script that Neil posted did use the threading module, but it works just as well with the thread module. It only uses start_new_thread.

The big difference between the two bugs is that the one Neil posted results in deadlock rather than a segfault. So they may well be completely different.

For both bugs, though, a mutex and a condition variable are being used: The interpreter lock is being acquired and released in both cases.

My current theory is that Python isn't dealing with the interpreter lock correctly across a fork. If some thread other than the one calling fork holds the interpreter lock mutex, the lock will be held forever in the child thread. If the child thread isn't making progress, the parent thread won't make progress either.

Jeremy

Here's a simplified test case:

    import os
    import thread

    def f():
        while 1:
            if os.fork() == 0:
                print "%s %s" % (thread.get_ident(), os.getpid())
                os._exit(0)
            os.wait()

    thread.start_new_thread(f, ())
    thread.start_new_thread(f, ())
    f()

[Jeremy Hylton]
... For both bugs, though, a mutex and a condition variable are being used:
Oh ya -- now that you mention it, I wrote that code <wink> -- but more than 7 years ago! How could a failure have gone undetected for so long?
Let's flesh out the most likely bad case:

    the main thread gets into posix_fork
    one of the spawned threads (say, thread 1) tries to acquire the global lock
    thread 1 gets into PyThread_acquire_lock
    thread 1 grabs the pthread mutex guarding "the global lock"
    the main thread executes fork() while thread 1 holds the mutex
    in the original process, everything's still cool: thread 1 still exists there, and it releases the mutex it acquired (after seeing that the "is it locked?" flag is set), yadda yadda yadda.
    but in the forked process, things are not cool: the (cloned) mutex guarding the global lock is still held

What happens next in the child process is interesting <wink>: there is only one thread in the child process, and it's still in posix_fork. There it sets the main_thread and main_pid globals, and returns to the interpreter loop. That the forked pthread_mutex is still locked is irrelevant at this point: the child process won't care about that until enough bytecodes pass that its sole thread offers to yield. It doesn't bash into the already-locked cloned pthread mutex until it executes PyThread_release_lock as part of offering to yield. Then the child hangs.

Don't know about this specific implementation, but pthread mutex acquires were usually implemented as busy-loops in my day (which is one reason Python locks were *not* modeled directly as pthread mutexes). So, in this scenario, the child hangs in a busy loop after an accidental amount of time passes after the fork. Matches your symptoms? It doesn't match Trent's segfault, but one nightmare at a time ...
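The general hazard in the scenario above (a lock held by some other thread at the moment of fork() stays locked in the child, where its owner no longer exists) can be illustrated with an ordinary application-level lock. This is only an analogy in current Python 3 on Unix, not a reproduction of the interpreter-lock bug under discussion; the function name lock_still_held_in_child is made up for the sketch.

```python
import os
import threading

def lock_still_held_in_child():
    """Fork while a helper thread holds a lock; report the child's view."""
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def holder():
        with lock:
            holding.set()      # tell the main thread the lock is taken
            release.wait()     # keep holding it until told otherwise

    helper = threading.Thread(target=holder)
    helper.start()
    holding.wait()

    pid = os.fork()            # only the calling thread exists in the child
    if pid == 0:
        # The copied lock is still marked as held, but the thread that
        # would release it was not carried over, so this times out.
        got_it = lock.acquire(timeout=0.5)
        os._exit(0 if got_it else 1)

    _, status = os.waitpid(pid, 0)
    release.set()              # let the helper finish in the parent
    helper.join()
    return os.WEXITSTATUS(status) == 1

if __name__ == "__main__":
    print("lock still held in child:", lock_still_held_in_child())
```

Without the timeout the child would block forever, which is the shape of the hang described above. Recent Python versions may also print a warning about calling fork() in a multi-threaded process.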

Trent Mick wrote:
Does this mean that a stock Python 1.6/2.0 interpreter will not properly do fork() on Linux32 (even when using threads)?

--
Marc-Andre Lemburg

On Wed, Jul 26, 2000 at 10:11:40AM +0200, M.-A. Lemburg wrote:
As I said: *some little bird told me*. (I think it was David.) I don't know, Marc-Andre. I am just waving a flag. Ignorantly yours, Trent -- Trent Mick TrentM@ActiveState.com
participants (16)
- bwarsaw@beopen.com
- David Ascher
- Fred L. Drake, Jr.
- Fredrik Lundh
- Guido van Rossum
- Jeremy Hylton
- M.-A. Lemburg
- Mark Hammond
- Neil Hodgson
- Neil Schemenauer
- Peter Schneider-Kamp
- Peter Schneider-Kamp
- Thomas Wouters
- Tim Peters
- Trent Mick
- Trent Mick