[Twisted-Python] Async file I/O with Linux & Twisted
Hi,

I've spent the last few days trying to write a wrapper for libaio.

Libaio is a simple Linux-only library which should, in theory, support async read/write on a file descriptor. In practice this depends on many strange requirements, and the documentation is basically nonexistent (if you don't count LWN articles and kernel mailing list archives), so the last few days of C coding looked like this (it's funny how every C coding session looks similar):

http://youtube.com/watch?v=-JbhMvzeX7s

Just to summarize...

* no AIO reads from sockets
* not every filesystem is supported
* the output buffer must be N pages long and aligned to a page start
* file access is unbuffered (yep, no cache!)
* ... and it may still - *silently* - block in some circumstances (like when you run out of "block layer requests", as if I knew what those are)

Anyways, libaio seems to be a cool idea - maybe someday the kernel guys will do some more work on it (like committing those patches which add buffered aio) and it will be possible to do AIO stuff on sockets. On the other hand, I found no other way to be informed about data availability than periodically reaping events (want libaio and epoll integration? use some more kernel patches...)

You can get the module code and read about current Linux aio problems here:

http://code.google.com/p/twisted-linux-aio/

It is not a general-purpose module - it's tightly tied to Twisted. Only reads are supported ATM and there's no Python queue, only the low-level one (see TODO.txt). If you're lucky, you may even see the proof-of-concept code running - w/o blocking. There are more things to come, especially that async shutil.copyfileobj replacement... :)

The sad thing is that Linux seems to lack much in this area, especially when compared to FreeBSD...

-- m

PS: I feel glad I didn't start with a POSIX aio_* implementation - on Linux it is said to launch a new thread for every fd request! http://www.atnf.csiro.au/people/rgooch/linux/docs/io-events.html - way to go, glibc programmers! I hope things are looking better now than they were a few years ago...
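To make the buffer and caching constraints from the summary concrete, here is a minimal Python sketch. It is not code from twisted-linux-aio: `aligned_buffer` and `read_unbuffered` are hypothetical helper names, and the read itself is an ordinary blocking one. It only shows one way to get a page-aligned buffer of a whole number of pages (an anonymous mmap) and to open a file without the page cache (O_DIRECT), assuming a Linux filesystem that accepts O_DIRECT.

```python
import mmap
import os


def aligned_buffer(nbytes):
    """Return an anonymous, page-aligned buffer of at least nbytes bytes.

    Anonymous mmap regions start on a page boundary, and the length is
    rounded up here to a whole number of pages.
    """
    page = mmap.PAGESIZE
    npages = max(1, -(-nbytes // page))  # ceiling division, at least one page
    return mmap.mmap(-1, npages * page)


def read_unbuffered(path, nbytes):
    """Blocking read of up to nbytes from path, bypassing the page cache.

    Hypothetical helper, Linux only. Raises OSError on filesystems
    that do not support O_DIRECT.
    """
    buf = aligned_buffer(nbytes)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        got = os.readv(fd, [buf])  # fills the aligned buffer directly
        return buf[:min(got, nbytes)]
    finally:
        os.close(fd)
        buf.close()
```

A call such as `read_unbuffered("/some/file/name", 5000)` would read into an 8192-byte buffer on a system with 4096-byte pages and return at most the first 5000 bytes.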
Hi there,

May I ask why you chose libaio over, for example, libevent?

Arnar

On Dec 20, 2007 2:42 AM, Michał Pasternak <michal.dtz@gmail.com> wrote:
> Hi,
>
> I've spent the last few days trying to write a wrapper for libaio.
>
> Libaio is a simple Linux-only library which should, in theory, support async read/write on a file descriptor. In practice this depends on many strange requirements, and the documentation is basically nonexistent (if you don't count LWN articles and kernel mailing list archives), so the last few days of C coding looked like this (it's funny how every C coding session looks similar):
>
> http://youtube.com/watch?v=-JbhMvzeX7s
>
> Just to summarize...
>
> * no AIO reads from sockets
> * not every filesystem is supported
> * the output buffer must be N pages long and aligned to a page start
> * file access is unbuffered (yep, no cache!)
> * ... and it may still - *silently* - block in some circumstances (like when you run out of "block layer requests", as if I knew what those are)
>
> Anyways, libaio seems to be a cool idea - maybe someday the kernel guys will do some more work on it (like committing those patches which add buffered aio) and it will be possible to do AIO stuff on sockets. On the other hand, I found no other way to be informed about data availability than periodically reaping events (want libaio and epoll integration? use some more kernel patches...)
>
> You can get the module code and read about current Linux aio problems here:
>
> http://code.google.com/p/twisted-linux-aio/
>
> It is not a general-purpose module - it's tightly tied to Twisted. Only reads are supported ATM and there's no Python queue, only the low-level one (see TODO.txt). If you're lucky, you may even see the proof-of-concept code running - w/o blocking. There are more things to come, especially that async shutil.copyfileobj replacement... :)
>
> The sad thing is that Linux seems to lack much in this area, especially when compared to FreeBSD...
>
> -- m
>
> PS: I feel glad I didn't start with a POSIX aio_* implementation - on Linux it is said to launch a new thread for every fd request! http://www.atnf.csiro.au/people/rgooch/linux/docs/io-events.html - way to go, glibc programmers! I hope things are looking better now than they were a few years ago...
On Thu, 20 Dec 2007 09:23:43 +0000 "Arnar Birgisson" <arnarbi@gmail.com> wrote:
> Hi there,
>
> May I ask why you chose libaio over, for example, libevent?
libevent is to C as Twisted is to Python. Both run on many OSes/platforms, but neither gives you asynchronous file read/write; read(2) on a file descriptor for a regular file (not a socket) will always block. At least this is what I understood after a quick glance at the libevent source - all that buffer.c:evbuffer_read does on non-win32 OSes is call read(2).

libaio, instead, gives you async filesystem read/write access. Reading a few hundred megabytes of data won't block the Twisted reactor, which is good.

Even better news is that since sending that e-mail yesterday I've found out how to integrate Linux async I/O with epoll notification, so the implementation I'm working on now will be even better.

Pity it's Linux-only, but well... to be portable, it should really be POSIX aio, but I can't give you any guarantees about POSIX aio performance on Linux. I can try working on other OSes later, if enough people are interested.

Of course it would be cool to have Python libevent bindings - and it would be even cooler to have a portable libevent reactor in C - but would that really remove some of the limitations that Twisted still has to deal with? You'd still have to hack a portable AIO implementation into libevent to have async file reads/writes then ... :)

Take care,
-- m
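The point about read(2) on regular files can be checked from Python. The sketch below is only an illustration (the function name is made up): it marks a temporary regular file as non-blocking and asks select() about it with a zero timeout. On POSIX systems a regular file is reported as readable straight away, so readiness notification says nothing about whether the following read will have to wait for the disk.

```python
import os
import select
import tempfile


def regular_file_is_always_ready():
    """Return True if select() reports a regular file readable at once."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 4096)
        f.flush()
        f.seek(0)
        # Setting O_NONBLOCK is accepted, but it does not make reads
        # from a regular file non-blocking.
        os.set_blocking(f.fileno(), False)
        # Timeout 0: poll once and return immediately.
        readable, _, _ = select.select([f], [], [], 0)
        return f in readable
```

This is why a readiness-based loop cannot help with disk I/O by itself: the file always looks ready, and the read that follows still runs synchronously.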
On Dec 20, 2007 6:16 PM, Michał Pasternak <michal.dtz@gmail.com> wrote:
> On Thu, 20 Dec 2007 09:23:43 +0000 "Arnar Birgisson" <arnarbi@gmail.com> wrote:
>> Hi there,
>>
>> May I ask why you chose libaio over, for example, libevent?
>
> libevent is to C as Twisted is to Python. Both run on many OS/platforms, but neither give you asynchronous file read/write; read(2) on a file descriptor (not socket) will be always blocking. At least this is what I understood after quick glance at libevent source - all it does in buffer.c:evbuffer_read on non-win32 OS is calling read(2).

I'm pretty sure you are mistaken. As I understood, libevent provides a portable API on top of whatever async I/O mechanisms there are on the underlying system, selecting the best available implementation. I believe it currently uses epoll on Linux, kqueue on FreeBSD, or select() or poll() on older systems.

> libaio, instead, gives you async filesystem read/write access. Reading a few hundred megabytes of data won't block Twisted reactor, which is good.

libevent provides file and socket I/O, along with timers. The latest version provides buffered I/O, an async DNS resolver and an HTTP server. Please forgive (and ignore) me if I'm completely missing something :)

> Of course it would be cool to have Python libevent bindings - and even cooler it would be to have a portable libevent reactor in C - but would that really remove some of the limitations that Twisted has still to deal with? You'd still have to hack portable AIO implementation into libevent to have async file read/writes then ... :)

Python libevent bindings have existed for a while in pyevent, a Pyrex binding module:

http://code.google.com/p/pyevent/

Not sure how well it is maintained, but I've seen many references to it, so it looks like people are using it.

As for a libevent-based reactor, there is an old ticket and branch for it here:

http://twistedmatrix.com/trac/ticket/1930
http://twistedmatrix.com/trac/browser/branches/libevent-1930-3

Googling for "twisted libevent reactor" gives many interesting results.

cheers,
Arnar
> I'm pretty sure you are mistaken. As I understood, libevent provides a portable API on top of whatever async I/O mechanisms there are on the underlying system, selecting the best available implementation.
>
> For Linux, I believe this it currently uses epoll, kqueue on freebsd, or select() or poll() on older systems.

None of those are asynchronous IO. They allow efficient querying of readability/writability, typically used with *non-blocking* IO.

In design pattern language they are "reactors", whereas async IO would be a "proactor." In API land you can tell the difference because with async APIs you have a callback: Twisted's Protocol/Transport APIs usually convert low-level change events (socket is readable) and non-blocking IO (reading the socket) to high-level async callbacks (dataReceived called with data).
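The conversion described above, from a readiness event plus a non-blocking read to a high-level callback, can be sketched with the standard library alone. This is not Twisted's implementation: `Collector`, `run_once` and `demo` are hypothetical names, and only the `dataReceived` method name is borrowed from Twisted's Protocol API.

```python
import selectors
import socket


class Collector:
    """Hypothetical stand-in for a Protocol: it only ever sees finished data."""

    def __init__(self):
        self.received = []

    def dataReceived(self, data):
        self.received.append(data)


def run_once(sel, timeout=1.0):
    """One loop iteration: wait for readiness, do the read, fire the callback."""
    for key, _events in sel.select(timeout):
        sock, protocol = key.fileobj, key.data
        data = sock.recv(4096)  # the loop itself performs the non-blocking read
        if data:
            protocol.dataReceived(data)  # the application only sees this callback


def demo():
    a, b = socket.socketpair()
    b.setblocking(False)
    sel = selectors.DefaultSelector()
    protocol = Collector()
    sel.register(b, selectors.EVENT_READ, protocol)
    a.sendall(b"hello")
    run_once(sel)
    sel.close()
    a.close()
    b.close()
    return protocol.received
```

Here the selector only reports that the socket is readable (the reactor part); the loop then reads the data and hands it to the callback. With true async IO the kernel would complete the read itself and report the finished operation.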
On Dec 20, 2007 8:36 PM, Itamar Shtull-Trauring <itamar@itamarst.org> wrote:
>> I'm pretty sure you are mistaken. As I understood, libevent provides a portable API on top of whatever async I/O mechanisms there are on the underlying system, selecting the best available implementation.
>>
>> For Linux, I believe this it currently uses epoll, kqueue on freebsd, or select() or poll() on older systems.
>
> None of those are asynchronous IO. They allow efficient querying of readability/writeability, typically used with *non-blocking* IO.

Ah, right, I understand. Sorry about the noise then <:| Seems I've been using the wrong word for a while.

So is performance the only reason for preferring asynchronous I/O to non-blocking I/O?

> In design pattern language they are "reactors", whereas async IO would be a "proactor." In API land you can tell the difference because with async APIs you have a callback: Twisted's Protocol/Transport APIs usually convert low-level change events (socket is readable) and non-blocking IO (reading the socket) to high-level async callbacks (dataReceived called with data).

Kernel-level AIO: is that about the kernel providing that abstraction, or is it using the hardware in some entirely different way?

Hope you don't mind naive questions.

cheers,
Arnar
On Thu, 20 Dec 2007 21:13:26 +0000 "Arnar Birgisson" <arnarbi@gmail.com> wrote:
> Kernel level AIO, is that about the kernel providing that abstraction or is it using the hardware in some entirely different way?
The Linux KAIO (Kernel Async I/O) layer allows you to read and write files in an async (non-blocking) way, just like you read or write non-blocking network sockets. It is similar to the async I/O defined by the Open Group:

http://www.opengroup.org/onlinepubs/009695399/functions/aio_read.html

It is quite a new topic, because until now you were able to perform async operations only on network sockets.

Async I/O in terms of network sockets is nothing new. FreeBSD's kqueue and Linux's epoll are just a means of notification that a given async socket has some data to read (or accepts data to be written), so you don't need to poll for those events (as with traditional Unix poll, which is slow - see http://www.kegel.com/c10k.html ).

[UPDATE] twisted-linux-aio now works with epoll.

I finally found out how to create fd notification for aio, so twisted-linux-aio now supports it. It integrates well with epollreactor. I also added some code to implement a Python queue in it, so the usage ATM looks like this:

    aio.Queue().readfile("/some/file/name").addCallbacks(...)

It is still a work in progress, though. It is too early to say anything about efficiency, but it seems that Twisted applications which do a lot of concurrent disk I/O will benefit.

Get the new code via http://twisted-linux-aio.googlecode.com/ .

-- Michal Pasternak
participants (3)
- Arnar Birgisson
- Itamar Shtull-Trauring
- Michał Pasternak