----- Original Message -----
From: "Brian Warner"
glyph@divmod.com writes:
It seems like we can work around this more easily than that, considering that flush and seek are available from Twisted; the file object causing problems in the tests is returned by the open() method of a FilePath object, if I understand it correctly. FilePath could include the workaround long before Python does.
I'm pretty sure that the real problem we're trying to solve here is caused by a stuck process keeping a .pyd file open. Indeed, if you look at the buildslave's logs, you'll see the exception is as follows:
exceptions.OSError: [Errno 13] Permission denied: 'c:\\buildslave\\win32-win32er\\W32-full2.4-win32er\\Twisted\\twisted\\protocols\\_c_urlarg.pyd'
So changing the way Twisted or its unit tests open a file is just not going to help. What matters is the way Python (or... Pyrex?) opens the file.
(for context: the buildbot is currently configured to do SVN checkouts/updates into one directory, then copy the tree into a second directory, then run tests on that second directory. This mode='copy' approach uses 'svn update' to minimize network bandwidth, but at the expense of doubling the disk usage with the extra copy. At the beginning of each build, the buildslave deletes the second directory with a function named rmdirRecursive() that bear provided, which does a chmod() of any mis-permissioned files before deleting them. It was an os.remove() inside this rmdirRecursive() which raised the exception.)
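For readers who haven't seen rmdirRecursive(), here is a minimal Python sketch of the same idea: make each entry writable before deleting it, so a read-only file (common on Windows) or a read-only directory (on POSIX) doesn't stop the cleanup. The name rmdir_recursive is hypothetical and this is not buildbot's actual code. If the diagnosis above is right, it cannot fix the failure in the log: a .pyd held open by a live process stays undeletable on Windows whatever its permission bits are.

```python
import os
import stat


def rmdir_recursive(dirname):
    """Delete dirname and everything below it, fixing permissions first.

    Hypothetical sketch in the spirit of rmdirRecursive(); not buildbot code.
    """
    if not os.path.exists(dirname):
        return
    # Owner rwx on the directory so its entries can be listed and unlinked.
    # (On Windows this just clears the read-only attribute.)
    os.chmod(dirname, stat.S_IRWXU)
    for name in os.listdir(dirname):
        full = os.path.join(dirname, name)
        if os.path.isdir(full) and not os.path.islink(full):
            rmdir_recursive(full)
        else:
            if not os.path.islink(full):
                # Clear a read-only flag that would make os.remove() fail
                # on Windows; chmod() would follow a symlink, so skip those.
                os.chmod(full, stat.S_IREAD | stat.S_IWRITE)
            os.remove(full)
    os.rmdir(dirname)
```

Recursing by hand, rather than calling shutil.rmtree(), keeps the permission fix next to each delete and avoids the rmtree error-callback API, which has changed between Python versions.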
sysinternals.com should have a utility equivalent to lsof (Handle, or the handle search in Process Explorer). this is probably the best way to figure out who's doing this.
I've run into a similar problem in the past, under Solaris over NFS: a test case spawned a daemon process which didn't die when it was supposed to and somehow held on to a file (I think Solaris won't let you delete a file that is being used as the backing store for an executable), which prevented the unlink() from succeeding.
this has to do with how execution works in unices generally. it is *not* a lock (there are no compulsory locks), so while the effects are somewhat (not very, though) similar, the mechanism is completely different. posix semantics dictate that you cannot open a file being executed for writing and cannot execute a file that is open for writing; you can, however, unlink it, because the inode doesn't get reaped until its refcount drops to 0. this is the case on linux systems. svr4 prohibits the unlink as well; this is an svr4 extension to posix. as an interesting piece of trivia to chuckle about, the errno for these conditions is ETXTBSY, aka "Text file busy" (this is funny because executables are always binary in practice).
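The unlink-while-open behaviour described above is easy to see on Linux or any other POSIX system with an ordinary data file: the name disappears immediately, but the inode and its contents survive until the last descriptor is closed. A running executable's image is kept alive by the same reference counting. This is a POSIX-only sketch; on Windows the os.unlink() call itself would fail, because the file is still open.

```python
import errno
import os
import tempfile


def unlink_while_open_demo():
    """Unlink a file that is still open, then read it back via the fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)                   # succeeds even though fd is open
        assert not os.path.exists(path)   # the directory entry is gone...
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 100)           # ...but the data is still there
    finally:
        os.close(fd)                      # last reference: inode is reaped


print(unlink_while_open_demo())           # b'still here'
# The errno name really is spelled without the U:
print(errno.errorcode[errno.ETXTBSY], os.strerror(errno.ETXTBSY))
```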
In that environment, I just renamed the top-level directory to something unique, spawned off an 'rm -rf' into the background to delete the old directory if it was possible, then continued on with the next build. If the code had to try too hard to come up with a unique name, it would flag a warning that there might be a stuck process somewhere.
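The rename-and-background-delete trick can be sketched in a few lines of Python for a POSIX host (it shells out to rm -rf, as described). The helper name retire_build_dir, the ".old.N" suffix and the warn_after threshold are all hypothetical choices for illustration, not buildbot API.

```python
import itertools
import os
import subprocess
import warnings


def retire_build_dir(path, warn_after=3):
    """Rename path to a unique sibling and delete it in the background.

    Hypothetical helper; returns (new_name, background_rm_process).
    """
    parent, base = os.path.split(os.path.abspath(path))
    # Find the first unused "<base>.old.N" name next to the original.
    for attempt in itertools.count():
        candidate = os.path.join(parent, "%s.old.%d" % (base, attempt))
        if not os.path.exists(candidate):
            break
    if attempt >= warn_after:
        # Leftovers that earlier 'rm -rf' runs couldn't delete suggest a
        # stuck process is still holding files open somewhere.
        warnings.warn("%d stale copies of %s; stuck process?" % (attempt, base))
    # Same parent directory, hence same filesystem: a cheap rename, no copy.
    os.rename(path, candidate)
    # Fire and forget: if this fails, the next call sees one more leftover.
    proc = subprocess.Popen(["rm", "-rf", candidate])
    return candidate, proc
```

There is a small race between the os.path.exists() check and os.rename(); on a build slave that runs one build per directory at a time, that is probably acceptable for a sketch.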
this is a valid technique, except when you're dealing with windows ;) as i mentioned in another post, renames (regardless of how high up in the tree you go) are recursive copy + recursive delete. the delete will fail. furthermore, SHFileOperation recursive deletes bail on first error, afair.
Perhaps we could use something similar here?
no, see above.
Of course, the real fix would be to find a way to let the testing code kill off any stuck processes, but that'll probably be very windows-specific.
on windows, we probably want TerminateProcess() (e.g. via win32api) and on *nix os.kill(). however, it is probably more interesting to figure out why processes are getting stuck ;) -p
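As a rough sketch of the kill-off-stuck-processes idea, assuming the test harness knows the PID of the child it spawned: in Python 2.7+/3.2+, os.kill() on Windows calls TerminateProcess() for any signal other than CTRL_C_EVENT/CTRL_BREAK_EVENT, so one helper can cover both platforms. The function name kill_stuck_process is hypothetical. Note that os.abort() would not do this; it aborts the calling process.

```python
import os
import signal
import sys


def kill_stuck_process(pid):
    """Forcibly terminate another process by PID (hypothetical helper)."""
    if sys.platform == "win32":
        # Any value other than CTRL_C_EVENT/CTRL_BREAK_EVENT makes os.kill()
        # call TerminateProcess(); the exit code is set to that value.
        os.kill(pid, signal.SIGTERM)
    else:
        # SIGKILL cannot be caught or ignored, so a wedged child can't
        # swallow it the way it might swallow SIGTERM.
        os.kill(pid, signal.SIGKILL)
```

Killing the child only removes the symptom (the open .pyd); finding out why it hangs is still the real fix.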