[Chicago] Status of wsgi

Garrett Smith g at rre.tt
Fri Oct 12 01:08:40 CEST 2012


I've always loved the CherryPy WSGI server. It's one of those "good
reads" -- very small and well written. As a threaded WSGI server, it's
quite capable.
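
A minimal sketch of what using it looks like (the import path below is the
CherryPy 3.x one and may differ in other releases; the port and thread count
here are made up):

    # Sketch: a plain WSGI app served by CherryPy's threaded WSGI server.
    from cherrypy import wsgiserver

    def app(environ, start_response):
        # simplest possible WSGI application: one entity per request
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return ['Hello from a thread pool\n']

    server = wsgiserver.CherryPyWSGIServer(
        ('0.0.0.0', 8080),   # bind address
        app,                 # any WSGI callable
        numthreads=30)       # size of the worker thread pool

    try:
        server.start()       # blocks; requests are handled by the pool
    except KeyboardInterrupt:
        server.stop()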

Someone in the web2py community (possibly Massimo, I don't recall who)
wrote a threaded WSGI server that is also very lean and performant.

I think both those servers demonstrate that, even within a big boned**
runtime like CPython, you can get great web app performance using
threads and good code!

** I like this term, I'm going to start using it for tons of stuff

On Thu, Oct 11, 2012 at 4:28 PM, Tal Liron <tal.liron at threecrickets.com> wrote:
> If you use either NginX's or Apache's WSGI modules, you get the following
> crucial features: management of the CPython processes, support for threading
> (provides limited concurrency benefits but does reduce RAM use), and
> multiplexing of the port. If you want to do anything similar with Tornado,
> you're on your own: you have to run several Tornado processes yourself,
> manage them somehow (what do you do when they get stuck? how do you find
> out?), each on its own port, and introduce a load balancer in front of all
> ports. This doesn't seem very scalable or easy to maintain to me: just
> adding a "worker" process involves allocating a port, reconfiguring your
> load balancer (and deploying it), etc. In terms of programming, sure,
> Tornado is easy, but in terms of operations I think it's a nightmare. One
> man's bloat is another man's necessary feature.
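>
> (For concreteness, a rough sketch of the by-hand version -- the handler and
> port numbers are made up, and the supervision and load balancing are still
> entirely on you:)
>
>     # Sketch only: one single-threaded Tornado process per port, started by hand.
>     # Something else has to watch these processes and balance traffic across them.
>     import sys
>     import tornado.httpserver
>     import tornado.ioloop
>     import tornado.web
>
>     class MainHandler(tornado.web.RequestHandler):
>         def get(self):
>             self.write("hello\n")
>
>     app = tornado.web.Application([(r"/", MainHandler)])
>
>     if __name__ == "__main__":
>         port = int(sys.argv[1])  # run once per worker: 8001, 8002, 8003, ...
>         tornado.httpserver.HTTPServer(app).listen(port)
>         tornado.ioloop.IOLoop.instance().start()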
>
> I didn't mean to sound so negative about Tornado! I appreciate its approach
> a lot. I just think it's overused in the web world, often chosen for the
> wrong reasons. If I have any criticism of Tornado, it's that it has
> virtually no support for threading -- which is really why the codebase is so
> small, coherent and easy to debug. That ends up being a great fit for the
> Python world, which generally abhors threads (for the right reasons, in
> context).
>
> The real comparison, in my view, is not Tornado vs. Django (Django over what
> server?), but Tornado vs. other lightweight async servers, such as Node.js.
>
> If I were to write an async Internet application right now from scratch, and
> was not limited to the Python world, I would definitely look towards
> something like Erlang. You get the advantages of threading while still
> maintaining code coherence and debuggability.
>
>
>
> On 10/11/2012 04:08 PM, Japhy Bartlett wrote:
>
> As far as tornado being.. "quite bad ... for REST"...  I guess I'll just say
> that I've been paid to write REST services using both tornado and django,
> and the tornado systems were easier to write, maintain, and scale.
>
> It also happens to win a lot of benchmarks, and one man's "over-simplification" is
> another man's lack of bloat.  The underlying code is quite nice, and a human
> being can read it.
>
> No.. it is not meant to serve static files, but it's certainly capable
> ("absolutely miserably"?), and that is a *really* weird criticism to make of
> a python web server.
>
> "fast" in the context of tornado actually means.. fast.  Like requests,
> natively asynchronous *or* synchronous through WSGI, tend to get served in
> fewer milliseconds than with most other python frameworks.
>
>
> I think it's very underrated, and I hate to see people saying bad things
> about it.  Maybe it's worth a talk in the next month or two?
>
>
>
> On Thu, Oct 11, 2012 at 3:00 PM, Tal Liron <tal.liron at threecrickets.com>
> wrote:
>>
>> On 10/11/2012 01:59 PM, Jordan Bettis wrote:
>>
>> Of course you can have a dynamic worker pool. That's the way apache
>> works. Given that python has a fairly "big boned" runtime, there's a
>> substantial cost there, as well as doing other things like making DB
>> connections for the new workers. And anyway it still only partially
>> solves the problem. You're still going to run out of memory or file
>> descriptors or something eventually. Compare Apache's behavior in the
>> face of a slow DoS attack to that of an asynchronous server like
>> nginx.
>>
>> My life mission is to dispel this myth (especially because I used to
>> believe it myself).
>>
>> Let's get rid of one myth first: a long time ago, it was the case that
>> Linux's single-threaded epoll service was somehow more scalable than
>> simply using threads to read the socket, because thread switching was
>> painful. This stopped being true a long time ago: the "fastest" web servers
>> (lighttpd) do not use epoll. And, in any case, whether you have a single
>> thread accepting the connections or not, you'll want a pool of threads (or
>> "workers" of some kind) generating content for these connections. I'm saying
>> this to point out that there's some confusion as to what counts as "async":
>> so let's just get the idea that it has to do with accepting the connections
>> out of the way.
>>
>> In a true "async" server, the server calls you to tells you, look, there's
>> this new client connection here (the server maintains a pool of information
>> -- not threads -- about each client). You can then call the server at your
>> convenience when you have data to send to the client, or ask it to close the
>> connection. (Again, let's forget how the server actually implements this
>> internally; it has nothing to do with asynchronicity in the sense we are
>> talking about here.) The quality of an async server has a lot to do with
>> what kind of information it keeps.
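>>
>> (A rough sketch of that inversion using Tornado's old pre-coroutine API -- the
>> handler and the fake delay are made up for illustration, and the decorator has
>> since been removed from modern Tornado:)
>>
>>     # Sketch: the server hands the request to get() and then leaves us alone.
>>     # We call back in (finish) whenever we finally have something to send;
>>     # no thread is parked on this client while we wait.
>>     import datetime
>>     import tornado.ioloop
>>     import tornado.web
>>
>>     class PollHandler(tornado.web.RequestHandler):
>>         @tornado.web.asynchronous   # don't end the request when get() returns
>>         def get(self):
>>             # stand-in for waiting on some slow backend
>>             tornado.ioloop.IOLoop.instance().add_timeout(
>>                 datetime.timedelta(seconds=5), self.on_backend_done)
>>
>>         def on_backend_done(self):
>>             self.write("here is your entity, eventually\n")
>>             self.finish()           # now the server sends it and cleans up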
>>
>> Think of it not only in terms of the server but also in terms of your
>> application. At some point the server turns things over to your code. So,
>> what is your application doing?
>>
>> For a typical "web" application (REST), each client connection returns an
>> entity of some kind. So, you really need to process each client quickly in
>> turn. While it's true that NginX or Tornado or Node.js can accept a great
>> many connections (it's just a small record of information they keep for
>> each, not a thread), if there's no thread (or "worker") ready at your
>> application's end to generate an entity, then these connections will queue
>> up and your clients will consider your site "down." Async or sync server
>> makes no difference: your app is sync because it needs to handle one request
>> at a time.
>>
>> So, when does the asynchronous approach make a real difference? Say your
>> application is not typical REST, but instead you are streaming video.
>> There's no single entity that the clients are waiting for. So, what you can
>> do instead is have each of your threads divide their time between the open
>> connections. The more load you have, the less data you want to send per
>> client when their turn comes (or you can give paying clients more time per
>> turn...) A good async server will provide you with statistics about load to
>> help you do the right thing and degrade gracefully. The API approach Garrett
>> mentioned for WSGI is typical: your app can just return a null or otherwise
>> tell the server: "Don't return anything to the client right now; in fact,
>> don't you worry about it all, I'll handle the data my own way and close the
>> connection." Yes, such an approach enables async, but I wouldn't call it a
>> good approach. The architectural burden becomes yours. If you're working
>> with Tornado, for example, you're much better off working with its native
>> API than using WSGI. Your app won't be portable, but then async rarely is.
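>>
>> (Again just a sketch, this time of the native-API streaming style -- make_chunks()
>> is an invented stand-in for whatever produces your video or image data, and the
>> flush callback is the older Tornado API:)
>>
>>     # Sketch: push chunks to each client only as fast as its socket drains.
>>     # flush(callback=...) fires once the previous chunk has been written, which
>>     # is the per-client pacing hook described above.
>>     import tornado.web
>>
>>     def make_chunks():
>>         # hypothetical stand-in for a real video/image chunk source
>>         for i in range(100):
>>             yield "x" * 65536
>>
>>     class StreamHandler(tornado.web.RequestHandler):
>>         @tornado.web.asynchronous
>>         def get(self):
>>             self.chunks = make_chunks()
>>             self.send_next()
>>
>>         def send_next(self):
>>             try:
>>                 chunk = next(self.chunks)
>>             except StopIteration:
>>                 self.finish()                    # done: close out the request
>>                 return
>>             self.write(chunk)
>>             self.flush(callback=self.send_next)  # wait for this client, then repeat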
>>
>> There's also a kinda middle ground between these extremes: serving files.
>> Text files are usually too small to make a difference, but what if you are
>> serving a lot of images? They're big, and sending them to slow clients can
>> hold things up if you are using the typical "web" approach of sending them
>> everything they need immediately. So, instead you can kinda stream the file
>> to them, chunk by chunk, and if you do this well you can degrade gracefully.
>> It's async, but more predictable (because you know the size of the
>> files), so it's a use case that has been heavily optimized. For example,
>> individual chunks can be cached (mmap files ftw). But this has nothing to do
>> with whether the server presents your app with a sync or async interface. As
>> I stated, some of the best file servers are synchronous servers. They
>> provide only a traditional REST API for your apps, but internally they do
>> semi-streaming for files very well.
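>>
>> (A sketch of the app side of that, in plain WSGI -- the path and chunk size are
>> made up; wsgi.file_wrapper is the optional hook from PEP 333 that lets the server
>> apply its own mmap/sendfile tricks:)
>>
>>     # Sketch: hand the open file to the server instead of reading it all at once,
>>     # so no worker ever buffers a whole image for one slow client.
>>     import os
>>
>>     CHUNK = 64 * 1024
>>
>>     def serve_image(environ, start_response):
>>         f = open('/var/www/images/big.jpg', 'rb')       # made-up path
>>         size = os.fstat(f.fileno()).st_size
>>         start_response('200 OK', [('Content-Type', 'image/jpeg'),
>>                                   ('Content-Length', str(size))])
>>         wrapper = environ.get('wsgi.file_wrapper')
>>         if wrapper is not None:
>>             return wrapper(f, CHUNK)                    # let the server stream it
>>         return iter(lambda: f.read(CHUNK), '')          # chunk-by-chunk fallback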
>>
>> (And actually there's another myth here: that somehow file servers that
>> degrade more gracefully will help you scale. Well, do you ever really want
>> any individual server of yours to get to the point where it starts to
>> degrade performance in any way, let alone degrade gracefully? These days,
>> Google and other search indexes will penalize you for degradation. The trick
>> is to scale horizontally with cheap VMs, so you never hit that point in the
>> graph where things start heading south. You don't care if you're heading
>> south fast or slowly. So, at the high scale it makes almost no difference if
>> you choose Apache or NginX or lighttpd for your REST apps. It will matter
>> only if you're limited to one or two servers in your cluster.)
>>
>> As an opposite example, let's consider Tornado. Yes, it can serve files,
>> but it does so absolutely miserably. Its devs make it clear that it was
>> never their priority to compete with mature web file servers. Instead, the
>> goal was to create a good, straightforward and (to be honest) overly simple
>> async server. Tornado is great if you want to write a streaming server
>> without bells and whistles. But, partly due to its over-simplification, it's
>> quite bad for traditional REST. If you're picking Tornado for
>> your web application because it's "async" and "fast" you might not be
>> understanding what these terms mean in this context. Find a mature sync
>> server and make sure your app, on your end, never holds up a thread for too
>> long.
>>
>> Over and out.
>>
>> -Tl
>>
>> _______________________________________________
>> Chicago mailing list
>> Chicago at python.org
>> http://mail.python.org/mailman/listinfo/chicago
>>
>
>
>
> _______________________________________________
> Chicago mailing list
> Chicago at python.org
> http://mail.python.org/mailman/listinfo/chicago
>

