
2014-03-28 2:16 GMT+01:00 Josiah Carlson <josiah.carlson@gmail.com>:
def do_login(...):
    proc = subprocess.Popen(...)
    current = proc.recv(timeout=5)
    last_line = current.rstrip().rpartition('\n')[-1]
    if last_line.endswith('login:'):
        proc.send(username)
        if proc.readline(timeout=5).rstrip().endswith('password:'):
            proc.send(password)
            if 'welcome' in proc.recv(timeout=5).lower():
                return proc
    proc.kill()
I don't understand this example. How is it "asynchronous"? It looks like blocking calls. In my definition, asynchronous means that you can call this function twice on two processes, and they will run in parallel. Using greenlet/eventlet, you can write code which looks blocking but runs asynchronously. But I don't think that you are using greenlet or eventlet here.

I took a look at the implementation:
http://code.google.com/p/subprocdev/source/browse/subprocess.py

It doesn't look portable. On Windows, WriteFile() is used. This function is blocking, or I missed something huge :-) It's much better if a PEP is portable. Adding time.monotonic() only on Linux would have made PEP 418 much shorter (4 sentences instead of 10 pages? :-))!

The implementation doesn't look reliable either:

def get_conn_maxsize(self, which, maxsize):
    # Not 100% certain if I get how this works yet.
    if maxsize is None:
        maxsize = 1024
    ...

This constant 1024 looks arbitrary. On UNIX, a write into a pipe may block with fewer bytes (512 bytes).

asyncio has a completely different design. On Windows, it uses overlapped operations with an IOCP event loop. Such operations can be cancelled, and Windows takes care of the buffering. On UNIX, non-blocking mode is used with select() (or something faster like epoll), and asyncio retries writing more data when the pipe (or any file descriptor used for the process stdin/stdout/stderr) becomes ready (for reading/writing).

The asyncio design is more reliable and portable. I don't see how you can implement asynchronous communication with a subprocess without the complex machinery of an event loop.
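As a side note, the blocking behaviour of pipe writes is easy to demonstrate. Here is a minimal UNIX-only sketch (not from the original mail): put the write end of a pipe in non-blocking mode and write until the kernel buffer is full, which is exactly the point where a blocking WriteFile()-style call would stall.

```python
import fcntl
import os

# UNIX-only sketch: put the write end of a pipe in non-blocking mode,
# then fill the kernel pipe buffer to see where a blocking write
# would have stalled.
r, w = os.pipe()
flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

written = 0
try:
    while True:
        # os.write() returns the number of bytes actually accepted;
        # once the buffer is full it raises BlockingIOError
        # (errno EAGAIN/EWOULDBLOCK) instead of blocking the caller.
        written += os.write(w, b"x" * 4096)
except BlockingIOError:
    pass

print("pipe buffer full after %d bytes" % written)
os.close(r)
os.close(w)
```

The exact capacity varies by platform (64 KB is a common Linux default), which is why hard-coding a size like 1024 is fragile.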
The API above can be very awkward (as shown :P ), but that's okay. From those building blocks a (minimally) enterprising user would add functionality to suit their needs. The existing subprocess module only offers two methods for *any* amount of communication over pipes with the subprocess: check_output() and communicate(), only the latter of which supports sending data (once, limited by system-level pipe buffer lengths).
As I wrote, it's complex to handle non-blocking file descriptors. You have to catch EWOULDBLOCK and retry later, when the file descriptor becomes ready. The main thread has to watch for such events on the file descriptor, or you need a dedicated thread. By the way, subprocess.communicate() is currently implemented using threads on Windows.
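To make the retry dance concrete, here is a minimal UNIX-only sketch (the helper name is mine, not an existing API): catch BlockingIOError (EAGAIN/EWOULDBLOCK) and use select() to wait until the descriptor becomes writable again. This is the loop an event loop runs for you.

```python
import fcntl
import os
import select

def write_all_nonblocking(fd, data, timeout=5.0):
    """Write all of 'data' to fd in non-blocking mode, retrying on EWOULDBLOCK.

    Hypothetical helper, UNIX only, for illustration.
    """
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    view = memoryview(data)
    while view:
        try:
            n = os.write(fd, view)   # may be a partial write
            view = view[n:]
        except BlockingIOError:
            # The buffer is full: wait until the fd is writable again.
            # A real event loop would run other tasks here instead of
            # blocking in select().
            _, writable, _ = select.select([], [fd], [], timeout)
            if not writable:
                raise TimeoutError("fd still not writable after %.1f sec"
                                   % timeout)

# Demo: send a small message through a pipe.
r, w = os.pipe()
write_all_nonblocking(w, b"hello")
print(os.read(r, 5))   # b'hello'
os.close(r)
os.close(w)
```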
Neither allows for nontrivial interactions from a single subprocess.Popen() invocation. The purpose was to be able to communicate in a bidirectional manner with a subprocess without blocking, or practically speaking, blocking with a timeout. That's where the "async" term comes from.
I call these "non-blocking functions", not "async functions". It's quite simple to check whether a read will block or not on UNIX. It's more complex to implement on Windows. And it's even more complex to add a buffer to write().
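For example, a zero-timeout select() answers the "would a read block?" question on UNIX. A small sketch (not an existing subprocess API):

```python
import os
import select

def read_ready(fd):
    # select() with a timeout of 0 polls the fd: it returns immediately
    # and tells us whether a read would block right now. UNIX only:
    # on Windows, select() accepts only sockets, not pipes.
    readable, _, _ = select.select([fd], [], [], 0)
    return bool(readable)

r, w = os.pipe()
print(read_ready(r))   # False: nothing to read yet
os.write(w, b"data")
print(read_ready(r))   # True: a read would not block now
os.close(r)
os.close(w)
```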
Your next questions will be: But why bother at all? Why not just build the piece you need *inside* asyncio? Why does this need anything more? The answer to those questions is wants and needs. If I'm a user who needs interactive subprocess handling, I want to be able to do something like the code snippet above. The last thing I need is to have to rewrite the way my application/script/whatever handles *everything* just because a new asynchronous IO library has been included in the Python standard library - it's a bit like selling you a $300 bicycle when you need a $20 wheel for your scooter.
You don't have to rewrite your whole application. If you only want to use the asyncio event loop in a single function, you can use loop.run_until_complete(do_login), which blocks until the function completes. The "function" is in fact an asynchronous coroutine.

Full example of asynchronous communication with a subprocess (the Python interactive interpreter) using the asyncio high-level API:
---
import asyncio.subprocess
import time
import sys

@asyncio.coroutine
def eval_python_async(command, encoding='ascii', loop=None):
    proc = yield from asyncio.subprocess.create_subprocess_exec(
        sys.executable, "-u", "-i",
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
        loop=loop)

    # wait for the prompt
    buffer = bytearray()
    while True:
        data = yield from proc.stdout.read(100)
        buffer.extend(data)
        if buffer.endswith(b'>>> '):
            break

    proc.stdin.write(command.encode(encoding) + b"\n")
    yield from proc.stdin.drain()
    proc.stdin.close()

    output = yield from proc.stdout.read()
    output = output.decode(encoding)
    output = output.rstrip()
    if output.endswith('>>>'):
        output = output[:-3].rstrip()
    return output

def eval_python(command, timeout=None):
    loop = asyncio.get_event_loop()
    task = asyncio.Task(eval_python_async(command, loop=loop), loop=loop)
    return loop.run_until_complete(asyncio.wait_for(task, timeout))

def test_sequential(nproc, command):
    t0 = time.monotonic()
    for index in range(nproc):
        eval_python(command)
    return time.monotonic() - t0

def test_parallel(nproc, command):
    loop = asyncio.get_event_loop()
    tasks = [asyncio.Task(eval_python_async(command, loop=loop), loop=loop)
             for index in range(nproc)]
    t0 = time.monotonic()
    loop.run_until_complete(asyncio.wait(tasks))
    return time.monotonic() - t0

print("1+1 = %r" % eval_python("1+1", timeout=1.0))

slow_code = "import math; print(str(math.factorial(20000)).count('7'))"

dt = test_sequential(10, slow_code)
print("Run 10 tasks in sequence: %.1f sec" % dt)

dt2 = test_parallel(10, slow_code)
print("Run 10 tasks in parallel: %.1f sec (speed=%.1f)" % (dt2, dt/dt2))

# cleanup asyncio
asyncio.get_event_loop().close()
---

Output:
---
1+1 = '2'
Run 10 tasks in sequence: 2.8 sec
Run 10 tasks in parallel: 0.6 sec (speed=4.6)
---

(My CPU has 8 cores; the speed may be lower on other computers with fewer cores.)

Even though eval_python_async() is asynchronous, the eval_python() function is blocking, so you can write:

print("1+1 = %r" % eval_python("1+1"))

without callbacks nor "yield from". Running tasks in parallel is faster than running them in sequence (almost 5 times faster on my PC).

The syntax in eval_python_async() is close to the API you proposed, except that you have to add "yield from" in front of "blocking" functions like read() or drain() (drain() is the function to flush the stdin buffer; I'm not sure that it is needed in this example).

The timeout is on the whole eval_python_async(), but you can as well use finer timeouts on each read/write.
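For instance, each read can be wrapped in asyncio.wait_for() individually. A small sketch, written in today's "async def" syntax rather than the @asyncio.coroutine/yield from style, and fed from an in-memory StreamReader instead of a real subprocess to keep it self-contained:

```python
import asyncio

async def read_with_timeout(reader, nbytes, timeout):
    # Cancel just this read if no data arrives within 'timeout'
    # seconds, instead of putting one timeout on the whole coroutine.
    return await asyncio.wait_for(reader.read(nbytes), timeout)

async def demo():
    # Stand-in for proc.stdout: a StreamReader fed by hand.
    reader = asyncio.StreamReader()
    reader.feed_data(b"hello")
    reader.feed_eof()
    return await read_with_timeout(reader, 100, 1.0)

data = asyncio.run(demo())
print(data)   # b'hello'
```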
But here's the thing: I can build enough using asyncio in 30-40 lines of Python to offer something like the above API. The problem is that it really has no natural home.
I agree that writing explicit asynchronous code is more complex than using eventlet. Asynchronous programming is hard.
But in the docs? It would show an atypical, but not wholly unreasonable use of asyncio (the existing example already shows what I would consider to be an atypical use of asyncio).
The asyncio documentation is still a work-in-progress. I tried to document all APIs, but there are too few examples and the documentation is still focused on the API instead of being oriented to the user of the API. Don't hesitate to contribute to the documentation! We can probably write a simple example showing how to interact with an interactive program like Python. Victor