-> Do I need a thread for every copy/move operation (large
files via network)?
Nope. That's the beauty of twisted code. I've written this kind of handling for multiple packet trains many times. Write a protocol.
Hm, sounds like using FTP and not shutil.move via system's SMB? Or is there already something better in twisted?
If I'd write my own protocol to wrap shutil.move, the operation would be blocking because I don't work on file block level and the protocol doesn't make sense. I'm not feeling like writing file transfer code on file block level. Or do you think it would be worth the hassle?
It is so darned easy to write your own protocols that there's really not much point in reusing an existing one the way you are attempting to. Here's a simple example:
    from struct import unpack

    from twisted.internet import reactor
    from twisted.internet.protocol import Protocol, ClientFactory


    class FileXferProtocol(Protocol):
        """Handles receiving files. Fire off via the factory."""

        def connectionMade(self):
            self.header = b""
            self.filesize = None
            self.recvdlen = 0

        def dataReceived(self, data):
            if self.filesize is None:
                # Start of the stream: a 4-byte big-endian file size.
                # TCP may split it across calls, so buffer until complete.
                self.header += data
                if len(self.header) < 4:
                    return
                (self.filesize,) = unpack("!L", self.header[:4])
                data = self.header[4:]

            # Receiving blocks of the file
            self.factory.openfile.write(data)
            self.recvdlen += len(data)
            if self.recvdlen >= self.filesize:
                self.factory.openfile.close()
                self.transport.loseConnection()


    class FileXferFactory(ClientFactory):
        """Creates protocols to receive files"""
        protocol = FileXferProtocol

        def __init__(self, filename):
            self.filename = filename
            self.openfile = open(filename, "wb")


    if __name__ == '__main__':
        # First, create the factory and initialize it
        f = FileXferFactory("tempfile")
        # Second, connect to a server to get a file (127.0.0.1:9999 in this case)
        reactor.connectTCP("127.0.0.1", 9999, f)
        reactor.run()  # Call the reactor
You could get hung up on worrying about blocks and reassembly and whatnot (and I have done so in the past), but why bother? About the only thing necessary here is to install some sort of error checking. In the factory code it might be appropriate to do some error checking (make sure the entire file was received) and then to call the next piece of code (the file handler).
On the server side, simply read some bytes and shovel them onto the port. It would look something like this:

    from struct import pack

    from twisted.internet import reactor
    from twisted.internet.protocol import Protocol


    class fileXferSend(Protocol):
        """Sends files"""

        def connectionMade(self):
            # The factory already set this up with a file to send
            self.openfile = open(self.filename, "rb")
            # Somewhat braindead code because the standard object won't give length
            self.openfile.seek(0, 2)
            self.length = self.openfile.tell()
            # Start over
            self.openfile.seek(0, 0)
            block = self.openfile.read(500)
            # Prime the system
            self.transport.write(pack("!L", self.length) + block)
            reactor.callLater(0, self.sendBlocks)

        def sendBlocks(self):
            # Keep sending out blocks of the file until it is done
            block = self.openfile.read(1000)
            if len(block):
                # Keep repeating as long as there is data
                self.transport.write(block)
                reactor.callLater(0, self.sendBlocks)
            else:
                self.openfile.close()
This works very easily because TCP already handles packet assembly, ordering and checksums on all the blocks. So all you have to do is shovel the data across the network. At least theoretically, you don't even have to keep track of file length. Alternatively, you could simply prepend a byte to each block. The byte could be, say, "C" for a continuation or "E" for "end of file". The code would work almost the same. Or you could be very careful about monitoring the error codes that come out of connectionLost to be sure whether it was a "normal" session close from the host or a dropped session (error), and not bother implementing ANY sort of "find the end of the file" stuff.
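To make the length-prefix idea concrete without any networking, here is a minimal sketch in plain Python. The names LengthPrefixedReceiver and frame are made up for illustration; the only thing taken from the example above is the 4-byte "!L" header. It shows why the receiver should buffer the header: TCP is free to deliver those 4 bytes in more than one piece.

```python
import struct

HEADER = struct.Struct("!L")  # 4-byte unsigned big-endian length


class LengthPrefixedReceiver:
    """Accumulates one length-prefixed payload from arbitrarily sized chunks."""

    def __init__(self):
        self._header = b""
        self.expected = None  # payload size, unknown until the header is complete
        self.payload = bytearray()

    def feed(self, data):
        """Consume one chunk; return True once the whole payload has arrived."""
        if self.expected is None:
            needed = HEADER.size - len(self._header)
            self._header += data[:needed]
            data = data[needed:]
            if len(self._header) < HEADER.size:
                return False
            (self.expected,) = HEADER.unpack(self._header)
        self.payload += data
        return len(self.payload) >= self.expected


def frame(payload):
    """Prefix a payload with its length, as the sending side would."""
    return HEADER.pack(len(payload)) + payload
```

In a protocol, dataReceived would just call feed(data) and close the file and the connection when it returns True.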
So not using DOM but per-line XML handling. Not that convenient, because I use only small bits of all the XML data that are spread all over the files, but I guess it would become better twisted code.
Perhaps. I'm not all that familiar with minidom. The above code contains a perfect example of what I mean. If I had set the parameter in the file.read() command to -1 or omitted it, then the .read() function would have dumped the entire file into a string. Although this may have serious memory implications, the basic problem is that it will also block twisted while it is reading the entire file into memory. Instead, the above code reads a chunk of the file, sends a packet, and then returns to the reactor in a short loop (reactor.callLater(0, ...)), which allows the reactor to intersperse calls with other events.
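The chunking itself is independent of twisted. As a rough sketch (read_in_chunks is a hypothetical helper, not something twisted provides), a generator keeps each step small and bounded:

```python
import io


def read_in_chunks(fileobj, chunk_size=1000):
    """Yield successive blocks of at most chunk_size bytes until EOF."""
    while True:
        block = fileobj.read(chunk_size)
        if not block:
            return
        yield block


# Stand-in for a real file: 2500 bytes read 1000 at a time.
source = io.BytesIO(b"x" * 2500)
block_sizes = [len(block) for block in read_in_chunks(source, 1000)]
```

In twisted code you would pull one block per reactor.callLater(0, ...) call rather than looping over the generator in one go, so the reactor gets control back between blocks.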
That's perhaps a general problem with twisted: There are great solutions for everything, but you need to know them in detail to know which fits your problem. Or you must know how you should reshape your problem to fit in some twisted solution...
Did you ever notice the same problem with basically every other framework out there? I suggested that you read up on flows for a specific reason. Flows allow you to do what you are suggesting in a very twisted way...you can sequence your XML procedure so that you break it up into short bits of execution that by themselves are effectively non-blocking. That is the "twisted way". Flows let you manage this situation when you have a full-blown state machine, not just a linear sequence of steps. BUT, the documentation for Flows and Deferreds is really good at explaining how to break your code up into small non-blocking pieces. So I wasn't really pushing you to USE the Flows module, but to use the concepts that are in it (just read the introductory parts to get the idea).
I know twisted "does it all"(TM), but what is "it"? ;-)
That is the trouble with frameworks. wxWidgets is one of the best GUI frameworks available that works well with Python (via wxPython). But interestingly enough, wxWidgets includes its own sockets library! However, before you ask, it is not easy to get wxPython (and wxWidgets) to play well with twisted. It is also reactor based, but unlike twisted's, the reactor in wxWidgets is very unfriendly to all other reactor-based systems.
I'm just trying to write a simple directory watcher (I need this at every corner of my app), Patrick Lauber wrote an answer to my initial question on that, but that wasn't really what I needed.
It works so far that it calls a deferred callback if it gets a notify on a new/changed file, but only once; next time I get an "AlreadyCalledError" - looks like I don't yet understand deferreds.
Common error. A deferred is a promise to call back at some time in the future, but only once (no more, no less)! Quite often, reactor.callLater(0, xxx) is what you want to do. On the face of it, reactor.callLater() appears to be a timer mechanism. But what happens if you call it with the twisted idiom reactor.callLater(0, function, parameters)?
This is a very common twisted idiom. What it does is schedule another function to run immediately when the reactor is allowed to schedule (assuming that there aren't several more functions that are already ready to run... otherwise it waits in line). And (subject to buffer limits on pending calls), you can call a function via the reactor as many times as you want.
Otherwise, your code can call back again and receive another deferred, and eventually another callback. Also, you may be looking instead for "DeferredList". For instance, let's say that you are processing a list of files in parallel. In threading-based code, you'd fire off a thread for each file and then wait for each one to return (or perhaps never wait). With deferreds, you'd do something similar:
    dlist = list()
    for i in filelist:
        dlist.append(handleFile(i))  # handleFile returns a deferred
    return defer.DeferredList(dlist)
This routine will return a single Deferred, but the callback results will be a list of the results (and their errback/callback status) from ALL of the handleFile() calls.
At the moment it's inherited from pb.Root, because I'll need it to run remotely sometimes, but perhaps it would be better to use a service or something else -- it should run "all the time" if not stopped and call a callback for every file. I attached the file, perhaps someone can point out my biggest mistakes?
pb is useful if you intend to use the Perspective Broker in the future for twisted's own version of RPCs. If not, it is probably wise to stay away. PB makes it very easy to refactor your code into PB form later on if you so desire. My only problem with it is that you REALLY need to control both ends of the pipe and you have to live within the limitations of TCP/IP (a big limit in certain P2P code, which is better off with very lightweight RPCs).
first but once you get used to it, deferreds seem just, well...obvious.
I hope to get into that higher state of mind soon. ;-)
Everywhere that you anticipate your code blocking on a procedure call, the code itself needs to return a deferred early on (before it blocks). Then later on, it uses the deferred to pass a result. Frequently, you will have bits of code that read something like this:
    def callDeferredCode(xxx):
        d = defer.Deferred()
        reactor.callLater(0.00001, nextStep, d)
        return d

    def nextStep(d):
        # ...does something...
        d.callback(result)  # result is the real return value
Then the caller commonly does something like:

    # state is a utility class used to pass around function state
    state.x = ...
    d = callDeferredCode(xxx)
    d.addCallback(responseHandler, state)
    return

    def responseHandler(response, state):
        ...
This totally decouples the two routines. Essentially, the calling function and the called function coordinate an orderly shutdown of the calling function's code. Then the callee goes about its business of running some lengthy function (perhaps waiting on network transmissions) before finally returning with a value in hand. The caller then picks up via the second function and the saved state.
This pattern is a bit ugly but at least it is reasonably readable and it gets around so many ugly details. Once you've written a couple of these, you'll start to think about when and where and how to place the deferred and reactor.callLater calls appropriately. At first, it's just a bit of a challenge wrapping your head around the concept of continuations.
Oh...and the state thing...this gets mentioned once in a while and once you use it, it is highly intuitive but new users frequently miss the concept. First, create a "utility class":
    class Utility:
        pass
What good is an empty class? Plenty! Within a class, you can always use the self object for this. But outside of that, use the utility class as a temporary storage bin with named slots. By this I mean,
    state = Utility()
    state.filename = "the file I don't want to forget about"
    state.status = "The number I'll need later"
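Put together, and leaving twisted out of it entirely, the idea looks roughly like this. The field values and the body of responseHandler are purely illustrative. The standard library's types.SimpleNamespace does the same job if you'd rather not define the empty class yourself.

```python
from types import SimpleNamespace


class Utility:
    pass


def responseHandler(response, state):
    """Pick up where the caller left off, using the saved state."""
    return "%s: %s (%s)" % (state.filename, response, state.status)


state = Utility()
state.filename = "report.xml"  # illustrative values
state.status = "pending"
summary = responseHandler("42 bytes", state)

# SimpleNamespace serves the same purpose without a custom class.
state2 = SimpleNamespace(filename="report.xml", status="pending")
summary2 = responseHandler("42 bytes", state2)
```

With a deferred, d.addCallback(responseHandler, state) passes the state along as the extra argument in exactly the same way.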
Also one other thing...once you create a deferred (defer.Deferred), you can chain off of it as much as you want in both the caller and callee. For instance, the callee may not bother creating the deferred itself but may instead make a call to a deeper function and simply addCallback() before returning the SAME deferred to the caller. Then when the deferred actually fires, the callee's callback can pre-process the returned results before they reach the top-level caller.
Clear as mud, right? Well, this situation happens for instance if you have a function to clean up/process the raw results from a network I/O call before returning the answers to a higher level. For instance, if you are writing your own RPC handler to use UDP packets (which I've done), the lowest level is responsible for handling network I/O. The next level up is responsible for detecting and handling retransmissions. The next level up is responsible for splitting/concatenating data that is too big to fit inside a single packet. And the next level up is responsible for doing a version of pickle/unpickle. So the higher level routines communicate essentially with "rpcSend(method, param1, param2, param3...)" calls while the lower level routines completely obscure the details (and are in turn shielded from the lower level details of the protocol).
I enjoyed being able to switch the logging output, e.g. from file to database or email per config file without the need to go into the code. I don't feel like re-inventing the wheel, but as Glyph pointed out, the config syntax of standard logging is just ugly and messy; the config syntax of log4[j|perl|net] is much more logical. Perhaps someone should write a log4twisted module...
More likely, just a log4python, with sufficient room that log4twisted doesn't really require too much. For instance, log4python can use the .next() call to iterate over log entries (when reading from it). twisted code will just use this interface (instead of .dumpEntireLog) to sanely read the log in chunks.