On Saturday 26 April 2008, inhahe wrote:
> What is the advantage of using a single-threaded server?
If you use threads, your code can be interrupted anywhere except where you prevent it (by locking). If you use deferreds, your code can be interrupted only at exactly those places you have indicated. This makes it much easier to write correct code.
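To make the contrast concrete, here is a small sketch that uses Python's asyncio as a stand-in for Twisted's Deferreds. This is an assumption: the discussion is about Twisted, but the scheduling model is similar, because a task can only be suspended at an explicit `await`. The same read-modify-write is safe when no suspension point sits inside it, and it loses updates when one does.

```python
import asyncio

counter = 0

async def safe_increment():
    global counter
    # No await between the read and the write, so no other task can run here.
    value = counter
    counter = value + 1

async def unsafe_increment():
    global counter
    value = counter
    # Explicit suspension point: other tasks run before we write back.
    await asyncio.sleep(0)
    counter = value + 1

async def run(worker, n):
    global counter
    counter = 0
    await asyncio.gather(*(worker() for _ in range(n)))
    return counter

safe_total = asyncio.run(run(safe_increment, 100))
unsafe_total = asyncio.run(run(unsafe_increment, 100))
print(safe_total, unsafe_total)
```

The lost updates in the unsafe version happen exactly at the `await` you wrote, so the bug is visible when reading the code. With preemptive threads, the equivalent race could occur almost anywhere.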
> I figured it makes it more scalable, because there's too much overhead in having a thread for each user when you have many simultaneous users. But a friend I'm talking to now says that using I/O-blocking threads is perfectly scalable for a large number of simultaneous users.
It depends on a lot of things. For example, would you use thread pools or one thread per user? And how many users are you talking about: 100, 1000, 10000, ...? It also depends on how efficient your OS is at handling threads. I remember having to compile a differently configured Linux kernel because it would run out of processes when creating about 1000 threads, but this was over 5 years ago, before NPTL, so this may no longer be an issue.

If you have a multi-core or multi-CPU machine, running multiple threads could spread the workload over different cores. For Python this doesn't really help, though. The Python VM has the Global Interpreter Lock, which effectively means that only one thread can execute Python code at any time. Other threads make progress only while the lock is released, for example during blocking I/O or inside a C extension that releases it for a long-running operation. So if you want to use multiple cores effectively in Python, you have to design your application to consist of separate communicating processes.
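The difference between one thread per user and a thread pool can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`. The 1000 "users", the pool size of 20 and the `handle_request` function are hypothetical. `time.sleep` stands in for blocking I/O, which releases the GIL while it waits.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NUM_USERS = 1000   # hypothetical number of simultaneous users
POOL_SIZE = 20     # hypothetical pool size: far fewer threads than users

def handle_request(user_id):
    # Stand-in for blocking I/O such as a socket read or a database query.
    time.sleep(0.001)
    return user_id, threading.current_thread().name

with ThreadPoolExecutor(max_workers=POOL_SIZE, thread_name_prefix="worker") as pool:
    # map() queues every request; only POOL_SIZE of them run at once.
    results = list(pool.map(handle_request, range(NUM_USERS)))

threads_used = {name for _, name in results}
print(len(results), "requests served by", len(threads_used), "threads")
```

Requests beyond the pool size wait in the executor's queue instead of each getting its own thread. This bounds memory and scheduler overhead. The cost is that a slow request occupies a worker and delays the requests queued behind it.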
> If that's true, I can only see a disadvantage in using a single-threaded server: having to use deferreds and stuff to make things asynchronous.
The main advantage, in my opinion, is that it is much easier to write correct asynchronous code than correct threaded code. If you write threaded code and overlook one place where it can be interrupted, you have a bug. If you write asynchronous code and overlook one place where it should be interruptible, you get worse latency, but the code is still correct. Because the points at which different tasks are interleaved are much more predictable in asynchronous code, there is a reasonable chance that if your code passes your unit tests, it is actually correct. It's not uncommon for threaded code to pass its unit tests but start giving wrong results as soon as the server is put under high load.

It may sound strange that I'm saying asynchronous code is easier to write, since that is probably not your experience when you start doing it. But if you're writing a complex threaded application, you typically end up assigning each thread its own area of responsibility, with each thread getting its inputs from and sending its outputs to other threads through event queues. If you don't do this, threads will run through your application in unpredictable ways as it grows in complexity. Even assuming you have proper locking over all shared data, you can run into deadlocks if you don't always lock things in the same order (thread 1 locks A and then B, thread 2 locks B and then A -> possible deadlock). So you end up with a threaded design where each thread runs in an isolated pocket, getting data from an event queue, processing it and then inserting the result into another event queue. This is not all that different from the asynchronous situation, in which you get an event from a reactor callback, do some processing and then register another callback.

As an aside, I think one of the problems with threads is that to write a piece of code correctly, you have to take into account which threads exist in your application.
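The "isolated pocket" design described above can be sketched with the standard library's `queue.Queue`. Each thread owns one stage, reads events from its input queue and writes results to the next queue, so no data is shared except through the queues. The stage functions (`parse`, `process`) and the `None` shutdown sentinel are illustrative assumptions.

```python
import queue
import threading

STOP = None  # sentinel telling a stage to shut down (an assumption of this sketch)

def stage(func, inbox, outbox):
    # Each stage thread only touches its own inbox and outbox.
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))

def parse(line):
    return int(line)

def process(number):
    return number * number

raw, parsed, done = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(parse, raw, parsed)),
    threading.Thread(target=stage, args=(process, parsed, done)),
]
for t in threads:
    t.start()

for line in ["1", "2", "3"]:
    raw.put(line)
raw.put(STOP)

for t in threads:
    t.join()

results = []
while (item := done.get()) is not STOP:
    results.append(item)
print(results)  # [1, 4, 9]
```

Each stage behaves much like a reactor callback: it receives one event, does some processing and hands the result on.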
This means it is no longer possible to know whether, for example, a class is correct by looking at it in isolation. One of the advantages of object-oriented programming is that you only have to care about whether a class correctly implements its interface, not about how that class is used in an application. With threading, this is no longer the case. A class that is correct in single-threaded use can be incorrect in multithreaded use, and a class that is correct in multithreaded use in one application can cause a deadlock in another application.
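The lock-ordering problem can be illustrated with a hypothetical `Account` class that has its own lock. A naive `transfer(a, b)` that locks `a` and then `b` can deadlock when another thread calls `transfer(b, a)` at the same time. A common fix, sketched here, is to always acquire the locks in one global order. In this sketch that order comes from a unique `account_id`, which is an assumption.

```python
import threading

class Account:
    """Hypothetical account guarded by its own lock."""

    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Acquire both locks in a fixed global order (by account_id), so that
    # transfer(a, b) and transfer(b, a) can never wait on each other in a cycle.
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount

a = Account(1, 1000)
b = Account(2, 1000)

def shuffle(x, y, times):
    for _ in range(times):
        transfer(x, y, 1)

threads = [
    threading.Thread(target=shuffle, args=(a, b, 10_000)),
    threading.Thread(target=shuffle, args=(b, a, 10_000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a.balance, b.balance)  # 1000 1000
```

The ordering rule is an application-wide convention that cannot be verified by looking at `Account` alone. Any other code that locks accounts in a different order reintroduces the deadlock.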
> I also don't understand how you're supposed to use deferreds. The Twisted docs say deferreds won't *make* your code asynchronous. So let's say you have to do an SQL query that takes 10 seconds: deferreds would be useless for making that not block, unless you already have a way of making that SQL query non-blocking? How is that done? Do you run a separate thread of your own for each SQL query? One thread for all SQL queries?
If there is an asynchronous API for a particular type of I/O, use that. If there isn't, you have to use a thread as you describe, and use one of the thread-safe reactor calls (such as reactor.callFromThread) to pass the result back. My gut feeling tells me to use a thread pool, possibly of size 1, to access a database, for example. But I haven't written code like this, so I have no experience to back this up. Every kind of I/O I have wanted to do so far was already handled by Twisted. For databases, use Twisted's "adbapi" (twisted.enterprise.adbapi), which runs the blocking database calls in a thread pool for you.
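The thread-pool approach can be sketched outside Twisted with Python's asyncio as a stand-in. In Twisted itself, `twisted.enterprise.adbapi` and `twisted.internet.threads.deferToThread` cover this case. In the sketch, a pool of size 1 owns an in-memory SQLite connection, and blocking queries run on that single worker thread. `loop.run_in_executor` then delivers each result back to the event-loop thread. The `users` table and the `lookup_password` helper are hypothetical.

```python
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor

_db = None  # hypothetical connection, owned by the single worker thread

def _open_db():
    # Runs once inside the worker thread, so the connection never crosses threads.
    global _db
    _db = sqlite3.connect(":memory:")
    _db.execute("CREATE TABLE users (name TEXT PRIMARY KEY, password TEXT)")
    _db.execute("INSERT INTO users VALUES ('alice', 'secret')")

def _blocking_lookup(name):
    # Runs in the worker thread; it may block without stalling the event loop.
    row = _db.execute("SELECT password FROM users WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

async def lookup_password(pool, name):
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the pool; await its result on the loop thread.
    return await loop.run_in_executor(pool, _blocking_lookup, name)

async def main():
    # A pool of size 1 serializes all queries through one connection.
    with ThreadPoolExecutor(max_workers=1, initializer=_open_db) as pool:
        return await asyncio.gather(
            lookup_password(pool, "alice"),
            lookup_password(pool, "bob"),
        )

results = asyncio.run(main())
print(results)  # ['secret', None]
```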
> Also, I wonder: in a typical Twisted app, just how slow should an operation be before you use a deferred? What if a user enters a username and password and I have to look that up in the database? Do I use a deferred? Just how bad should the query be before using a deferred?
It depends on the kind of database. If you have an in-memory database, you don't need a deferred. If you have a simple text file on a local disk, you probably don't need a deferred. If you contact a DB server on the same machine, you might get away without a deferred, but it would be better to use one. If you contact a DB server on a different machine, definitely use a deferred.

One simple check is to imagine what would happen if the DB were not available. An in-memory DB will always be available. With a simple text file on a local disk, you will immediately get an error if opening it fails. With a DB server, you may get a timeout when connecting to it. Since server timeouts are typically on the order of seconds, you would not want to block your entire application on one, so use a deferred.

In any case, Twisted offers "cred" as an authentication framework, and cred always uses a deferred to give you the result of a credentials check. This is good, because you can easily switch from one type of credentials checker to another without changing the code that uses it.
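The benefit of an always-asynchronous checker interface can be sketched with asyncio as a stand-in. In Twisted's real cred, `ICredentialsChecker.requestAvatarId` returns a Deferred. `CredentialsChecker`, `InMemoryChecker`, `RemoteChecker` and `login` are hypothetical names, and the network delay is simulated. Treating a server timeout as a failed login is also an assumption of this sketch. The point is that `login` is identical whichever checker is plugged in, and a slow or unreachable server cannot block other work.

```python
import asyncio
from typing import Protocol

class CredentialsChecker(Protocol):
    """Hypothetical interface: every checker is asynchronous, even fast ones."""

    async def check(self, username: str, password: str) -> bool: ...

class InMemoryChecker:
    def __init__(self, users):
        self.users = users

    async def check(self, username, password):
        # Answers immediately, but still through the asynchronous interface.
        return self.users.get(username) == password

class RemoteChecker:
    """Stand-in for a DB server on another machine; the delay is simulated."""

    def __init__(self, users, delay, timeout):
        self.users = users
        self.delay = delay
        self.timeout = timeout

    async def _query(self, username):
        await asyncio.sleep(self.delay)  # simulated network round trip
        return self.users.get(username)

    async def check(self, username, password):
        try:
            stored = await asyncio.wait_for(self._query(username), self.timeout)
        except asyncio.TimeoutError:
            return False  # server unreachable: refuse the login
        return stored == password

async def login(checker: CredentialsChecker, username, password):
    # The calling code does not change when the checker is swapped.
    return "welcome" if await checker.check(username, password) else "denied"

async def main():
    users = {"alice": "secret"}
    return [
        await login(InMemoryChecker(users), "alice", "secret"),
        await login(RemoteChecker(users, delay=0.01, timeout=1.0), "alice", "secret"),
        await login(RemoteChecker(users, delay=1.0, timeout=0.05), "alice", "secret"),
    ]

outcomes = asyncio.run(main())
print(outcomes)  # ['welcome', 'welcome', 'denied']
```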
> (Reading the Twisted docs is like reading a brick wall for me; it would be nice if someone could just explain things to me in simple terms.)
I think one of the problems is that many people who get started with Twisted are learning both asynchronous programming and Twisted at the same time, so there are a lot of new concepts to learn.

Bye,
Maarten