Thanks for your thoughtful reply, Dima!  I work on embedded systems as a firmware engineer and I've found that all of our products use serial UART and as you say, 99% of the HW is USB dongles, primarily from FTDI or Silabs.  It seems that the development of the drivers was focused on using the Windows/Linux serial abstractions to present a comfortable interface to the user.  I am not interested in second guessing this decision and would rather leave the USB layer alone and instead work with the drivers as provided.

I believe that I can best explain my desire for an asyncio implementation with a few examples.  Then I will work through the asyncio source code to further my understanding of this abstraction (which is perhaps my most favorite in programming!)

The ubiquitous implementation of serial comms in Python, pySerial, allows the user to write() bytes to a serial interface.  Because the buffer setup is rather large, 4096 bytes, it is likely that the write function blocks for the amount of time it takes to copy the transmitted bytes to the outgoing buffer and will return well before the serial device has transmitted the signal on its TX line.  To obtain a kind of synchronization, the library defines a flush() method that waits until the OS TX buffer is empty.  In the Windows implementation this is a polling busy wait at 50 ms intervals while in Linux it blocks on tcdrain().  My method of waiting on the Windows OS event using loop._proactor.wait_for_handle(overlapped_write.hEvent) allows for the Windows implementation to discard the busy wait in favor of event signaling.
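To illustrate the difference between the two waiting styles, here is a minimal sketch in plain Python.  It does not touch pySerial or any serial API: bytes_pending and tx_empty are hypothetical stand-ins for "the OS TX buffer still holds data" and "the OS signaled that the buffer drained".

```python
import asyncio
import time


def flush_by_polling(bytes_pending, interval=0.05):
    """Sleep-and-check until the (hypothetical) pending check returns False."""
    polls = 0
    while bytes_pending():
        time.sleep(interval)  # the caller is stuck here between checks
        polls += 1
    return polls


async def flush_by_event(tx_empty: asyncio.Event) -> bool:
    """Suspend this task until something signals that the TX buffer drained."""
    await tx_empty.wait()
    return True


async def main():
    # Hypothetical counter standing in for the OS TX buffer state.
    remaining = [3]

    def bytes_pending():
        remaining[0] -= 1
        return remaining[0] > 0

    polls = flush_by_polling(bytes_pending, interval=0.001)

    # Simulate the OS signaling "TX empty" a little later.
    tx_empty = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, tx_empty.set)
    return polls, await flush_by_event(tx_empty)
```

In the polling version the latency is bounded below by the sleep interval; in the event version the task resumes as soon as the signal arrives and other tasks may run in the meantime.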

Since pySerial isn't asyncio-ready, programmers needing some concurrency in Python would use threads or asyncio thread pools.  In the case of FW engineers working with embedded systems, we may like to have an async generator "task" that is reading a byte stream from a serial device as well as an awaitable write method that completes when bytes have actually been put on the transport.  It seems to me that the Windows/POSIX OSes are each handling the completion of read/write events and that therefore Python implementations that create threads are inelegant.  For example, to adapt pySerial to be asyncio friendly, there is a new project, aioserial, that I will be contributing to.  So far it uses a thread pool to wrap the old pySerial library; it wraps function calls in loop.run_in_executor() in order to return awaitables.  My hope is that with guidance from the asyncio team I can bring a well-supported async implementation to Python serial IO.
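For reference, the thread-pool wrapping pattern looks roughly like the sketch below.  This is not aioserial's actual code: FakeBlockingSerial is a hypothetical stand-in for a blocking port object so that the example runs without hardware, and async_write is an illustrative name.

```python
import asyncio
import time


class FakeBlockingSerial:
    """Hypothetical stand-in for a blocking serial port (not pySerial)."""

    def __init__(self):
        self.sent = bytearray()

    def write(self, data: bytes) -> int:
        time.sleep(0.01)  # pretend the driver blocks while copying the bytes
        self.sent += data
        return len(data)


async def async_write(port, data: bytes) -> int:
    """Run the blocking write in the default thread pool and await the result."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, port.write, data)


async def main():
    port = FakeBlockingSerial()
    written = await async_write(port, b"hello")
    return written, bytes(port.sent)
```

Every call occupies a worker thread for as long as the blocking call lasts, which is exactly the cost I would like to avoid.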

At this point, I admit that I may have lost perspective by working on systems with 32K of RAM where each thread is absolutely precious, powerful, and dangerous!  Perhaps these days it is OK to spawn new threads as needed at runtime to wait on an OS thread that is itself waiting on a HW event.  If the consensus is that a python-thread-based approach is best, then we don't need to look much further than wrapping IO in loop.run_in_executor()!  Nevertheless, I will continue to explore the implementation since I am always interested in energy efficiency and beautiful abstraction.

My working implementation uses the _wait_for_handle() method of the IocpProactor class defined here.  Let's see how/why the proof of concept is working.

wait_for_handle() receives an overlapped.hEvent created with win32 CreateEvent (note that a total of two would be created, one for reads, one for writes).  The event is set up for signaling by using SetCommMask with flags EV_RXFLAG | EV_TXEMPTY during initialization and then calling WaitCommEvent with a reference to the overlapped each time new IO begins.  This will cause the overlapped.hEvent that wait_for_handle() receives to be signaled when the OS completes the IO.

_wait_for_handle() calls RegisterWaitWithQueue() which wraps the win32 API RegisterWaitForSingleObject.  The important bit here is that this API allows for registration of a callback function to fire on completion of the event.  This callback will be called with the lpParameter argument containing struct PostCallbackData data = {CompletionPort, Overlapped}, *pdata; (line 355 of overlapped.c).  And so it gets called with the completion port of self._iocp and a unique address, ov.address, which is NOT the overlapped structure we are originally awaiting, according to the note at line 714: # We only create ov so we can use ov.address as a key for the cache. \ ov = _overlapped.Overlapped(NULL).

So we see how a callback is registered by the IocpProactor event loop.  Now let's understand how this causes the "awaitable future" to complete at the Python layer.

A "future" is created: f = _WaitHandleFuture(ov, handle, wait_handle, self, loop=self._loop).  Importantly this calls the Win32 API CreateEvent - for my purposes this seems redundant at first glance, but I am afraid that it may be necessary due to the simple fact that WaitCommEvent does not take a callback!  I will have to investigate further.  This "future" is an instance of a subclass of _BaseWaitHandleFuture which defines a _poll() method utilizing win32 WaitForSingleObject to poll for signaled state: "If dwMilliseconds is zero, the function does not enter a wait state if the object is not signaled; it always returns immediately."

It's a bit hard to track down, but if I am understanding correctly, the "super loop" of the IocpProactor is its own _poll().  It starts by calling GetQueuedCompletionStatus with an infinite timeout.  This may answer one of my main curiosities: is this how the asyncio loop waits for multiple events from multiple threads without creating waiting threads of its own?  Anyway, it retrieves the "future" from self._cache, the blank overlapped used as the cache key, 0, and the finish_wait_for_handle(trans, key, ov) function created way back in _wait_for_handle().

This callback wraps the default implementation of the _BaseWaitHandleFuture._poll() which wraps WaitForSingleObject, discussed above, and returns True if the event is signaled or False otherwise (I believe False would be an error condition?).  Recall that in my implementation, "event" at this stage refers to an EV_TXEMPTY or EV_RXCHAR event, for example, set up by WaitCommEvent and SetCommMask earlier.  The future's set_result() will be called with True and appended to self._results.  Recall that wait_for_handle() returned this very same future to my application layer earlier, so the call to set_result() will cause the application's wait to end.
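To check my reading of that flow, here is a toy model of it in pure Python.  It is only a sketch of the control flow as I understand it, not asyncio's code: ToyProactor and post_completion are hypothetical names, a queue.Queue stands in for the completion port, and an integer key plays the role of ov.address.

```python
import asyncio
import queue


class ToyProactor:
    """Toy model only: a queue.Queue stands in for the completion port."""

    def __init__(self, loop):
        self._loop = loop
        self._port = queue.Queue()  # stand-in for the IOCP
        self._cache = {}            # key -> (future, finish callback)
        self._next_key = 0

    def wait_for_handle(self, handle):
        # The handle is unused in this model; the real code registers a wait on it.
        fut = self._loop.create_future()
        key = self._next_key        # plays the role of ov.address
        self._next_key += 1
        # The real finish callback polls the event; here it just reports True.
        self._cache[key] = (fut, lambda: True)
        return fut, key

    def post_completion(self, key):
        """What the OS wait callback would do: post the key to the port."""
        self._port.put(key)

    def poll(self):
        """One pass of the loop: dequeue a completion and resolve its future."""
        key = self._port.get()      # blocks, like GetQueuedCompletionStatus
        fut, finish = self._cache.pop(key)
        fut.set_result(finish())


async def main():
    proactor = ToyProactor(asyncio.get_running_loop())
    fut, key = proactor.wait_for_handle(handle=object())
    proactor.post_completion(key)   # pretend the OS signaled the event
    proactor.poll()                 # the loop picks up the completion
    return await fut
```

The point of the model is that the only blocking wait happens inside poll(), on the loop's own thread, and the application-level await finishes when set_result() is called on the cached future.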

Although there may be gaps in my understanding of the asyncio IO Completion Ports proactor implementation, by following the code I am confident that my usage of IocpProactor.wait_for_handle() does not create threads in the Python layer.  Without this implementation, the programmer wishing to manage concurrency with serial IO must resort to 1) creating and managing an extra thread for each IO direction and device or 2) manually wrapping the serial IO using loop.run_in_executor() or 3) using the aioserial library that abstracts 2) for them.
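A minimal sketch of how I call it is below.  The helper name wait_for_win32_event is hypothetical, and the code relies on the private attribute loop._proactor, so it may break between Python versions; creating the event handle and arming it with SetCommMask/WaitCommEvent is assumed to have happened elsewhere.

```python
import asyncio


async def wait_for_win32_event(handle, timeout=None):
    """Await a Win32 event handle on the running loop without a helper thread.

    Assumes the running loop is a ProactorEventLoop (Windows only) and that
    `handle` is an event handle the OS will signal when the IO completes.
    """
    loop = asyncio.get_running_loop()
    proactor = getattr(loop, "_proactor", None)  # private attribute
    if proactor is None:
        raise RuntimeError("a ProactorEventLoop is required (Windows only)")
    # wait_for_handle() returns a future that resolves once the handle is signaled.
    return await proactor.wait_for_handle(handle, timeout)
```

With this in place, awaiting a serial event reads as signaled = await wait_for_win32_event(overlapped_read.hEvent).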

I think that creating, managing, and destroying threads only to wait on a few bytes to arrive over a 10KBps transport is overkill.

Is it possible that there is an approach better than using wait_for_handle()?  For example, the loop.add_reader(fd, callback, *args) API seems to satisfy my requirements but is not supported by IocpProactor.  If there is interest, I could look into adding IocpProactor support for that API.  There is also the Streams abstraction that seems appropriate, but I could not figure out how to hook into it with SetCommMask, WaitCommEvent, and the overlapped structures.  Yet another idea is to take what I have learned from the IocpProactor internals and copy and expose them in simplified form for my own implementation, though I'd still need a nice way to throw them on the loop.
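For comparison, this is roughly what the add_reader() style looks like on a selector-based loop.  It is a sketch only: a socket pair stands in for the serial file descriptor, read_when_ready is a hypothetical helper name, and on a ProactorEventLoop the add_reader() call raises NotImplementedError.

```python
import asyncio
import socket


async def read_when_ready(sock: socket.socket, nbytes: int = 64) -> bytes:
    """Resolve with data as soon as the loop reports the descriptor readable."""
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def on_readable():
        loop.remove_reader(sock)           # one-shot: stop watching after the first read
        fut.set_result(sock.recv(nbytes))

    loop.add_reader(sock, on_readable)     # no thread is created for the wait
    return await fut


async def main():
    rx, tx = socket.socketpair()           # stand-in for a serial descriptor
    try:
        tx.send(b"ping")
        return await read_when_ready(rx)
    finally:
        rx.close()
        tx.close()
```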

A big thanks for following along and aiding my understanding of the asyncio paradigm!

Cheers,
J.P. Hutchins

P.S.: I am focused on Windows because I am not so worried about the POSIX implementation ;).  Embedded always has Windows running anyway.


On Wed, Aug 31, 2022 at 7:10 PM Dima Tisnek <dimaqq@gmail.com> wrote:
A few thoughts from someone who worked with serial ports, parallel
ports and USB and covered Linux and Windows for some of these.

1. Serial ports are dead
2. UNIX and Windows implementations are fundamentally different.
3. Even within UNIX, there's quite a variety

1.
The hardware serial ports still exist on some rare PC motherboards,
but it's quite rare to actually use those.
Instead, there are a lot of other ports where serial port abstraction
can be used, in chronological order:
* USB serial dongles
* USB devices that integrate a microprocessor that's connected via its
serial interface
* USB devices that integrate a microprocessor with USB stack that
fakes a serial port
* Bluetooth devices following the above
* Bluetooth devices with modem (acm, not serial port) interface

Access to both USB and Bluetooth is done differently now, UMDF for
USB, and I think something similar for Bluetooth.

2.
UNIX APIs are pretty consistent wrt. file descriptor use, even when
there are major gotchas in the kernel mode (serial vs tty for
example).
Windows APIs are frankly all over the place. Their pipes are not the
same as pipes, etc. Their UNIX-like APIs only work so far.
For a random example, see e.g. https://github.com/microsoft/terminal/issues/262

3.
There's classical UNIX, but then there were tons of improvements:
Linux got epoll, aio, io_submit...
Mac got AsyncBytes and something or other underneath
*BSD got something or other, but a bit differently
Thus, a "good" asyncio loop implementation is likely to use
OS-specific primitives

So, where does it leave you?
If your aim is to contribute to asyncio, may I suggest that you find
another target than serial interfaces.
If your aim is to support some specific device -- follow how that
device is connected to the machine: ioports? iomem? usb? bt? etc.
If your aim is to achieve high-bandwidth or low-latency -- get close to hardware
If your aim is to support, let's say 100 ports at once -- one of the
two approaches above
If I couldn't guess your aim, please explain why `asyncio` in the first place.

Cheers,
Dima Tisnek

On Thu, Sep 1, 2022 at 2:58 AM J.P. Hutchins <jphutchins@gmail.com> wrote:
>
> Greetings!
>
> I would like to modify/replace an existing library, pySerial, to use asyncio in Windows/Mac/Linux.  I have a Windows implementation working by "listening for an event" like this:
>
> read_future = loop._proactor.wait_for_handle(overlapped_read.hEvent)
>
> Where overlapped_read is the OVERLAPPED structure (via ctypes or pywin32) and the event is setup previously, e.g. "received chars on the serial port" event here.
>
> My question is in regards to the best practices for awaiting an OS event providing for the most efficient and maintainable implementation.  Reference to other multi-platform libraries or builtins that accomplish similar would be appreciated.
>
> Thanks for your time,
> J.P. Hutchins
> _______________________________________________
> Async-sig mailing list -- async-sig@python.org
> To unsubscribe send an email to async-sig-leave@python.org
> https://mail.python.org/mailman3/lists/async-sig.python.org/
> Code of Conduct: https://www.python.org/psf/codeofconduct/