I ended up monkey-patching doRollover to retry a number of times before giving up. (In our case the failures are due to our log browser happening to read the latest changes just when logging wants to roll over.) (Actually, I implemented a simple QueueHandler and do all file operations from a different logging thread.)
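For anyone wanting to try the same thing, here is a rough sketch of both ideas, not my exact code. It subclasses the standard RotatingFileHandler instead of monkey-patching, and it uses the standard library's QueueHandler/QueueListener as a stand-in for a hand-written queue handler. The names RetryingRotatingFileHandler, rollover_retries, rollover_delay and start_queued_logging are made up for this example, and the retry count and delay are arbitrary.

```python
import logging
import logging.handlers
import queue
import time


class RetryingRotatingFileHandler(logging.handlers.RotatingFileHandler):
    """RotatingFileHandler that retries a failed rollover before giving up."""

    # Hypothetical tuning knobs, not part of the standard library.
    rollover_retries = 5
    rollover_delay = 0.1  # seconds to wait between attempts

    def doRollover(self):
        for attempt in range(1, self.rollover_retries + 1):
            try:
                super().doRollover()
                return
            except OSError:
                # Typically a PermissionError on Windows while another
                # process (e.g. a log viewer) has the file open.
                # Caveat: a rollover that failed halfway may already have
                # shifted some backup files before this retry runs.
                if attempt == self.rollover_retries:
                    raise
                time.sleep(self.rollover_delay)


def start_queued_logging(path, max_bytes=1_000_000, backup_count=3):
    """Route records through a queue so only one thread touches the file.

    Returns the logger and the listener; call listener.stop() on shutdown.
    """
    log_queue = queue.Queue()
    file_handler = RetryingRotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count
    )
    # The listener runs its own thread and performs all file operations there.
    listener = logging.handlers.QueueListener(log_queue, file_handler)
    listener.start()

    logger = logging.getLogger("queued_example")  # assumed logger name
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    return logger, listener
```

With this in place, application threads only ever put records on the queue, so a slow or retried rollover delays the listener thread rather than the code that is logging. If every retry fails, the exception is raised as before and handled by the usual logging error handling.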