Hi, sorry for the delayed response.

I'm not an expert in the Windows API, so someone else would be better placed to review your approach. Not getting too deep into anything Windows was a conscious choice on my part back in the day.

In the past, I have used the two-thread approach; however, the threads were long-lived -- each port had a reader thread and a writer thread for as long as the port was open.

There are still some gotchas; two come to mind:

* Linux: there's tty processing happening for serial port data. The typical fix is to call `stty -echo -echoe -echoke -F ...` or the equivalent ioctl. This seems minor, but the mere fact that echo settings are possible implies a latency hit. I think it was one kernel tick (?), so 10ms back in the day, and maybe 1ms now. I worked around that by detaching the serial ports from the kernel and using `inb/outb` directly. I get shivers recalling that now.
* Windows, and IIRC Linux too: USB-serial dongles are almost like serial ports, except when they are not. The trivial issue is a smaller selection of supported stop bits. The bigger issue is that USB will reconnect every now and again. What I've observed is that if the port is open in user space, the reconnected device will not "rejoin" that port. Most typically, a new port is allocated by the OS and the currently open port remains stale. I've had to add some code to forcibly close the port in user space. Worse failures (bad drivers?) were being unable to close the port (typical if the port is open in blocking mode, but sometimes also in non-blocking mode), the new port not showing up in the system at all, or the port reconnecting with the baud rate reset.

There was a time when FTDI offered a client library that talked to their devices over USB directly. I'm sure there was some point to that.

Anyway, good luck!

On Mon, Sep 5, 2022 at 2:24 PM J.P. Hutchins <jphutchins@gmail.com> wrote:
Thanks for your thoughtful reply, Dima! I work on embedded systems as a firmware engineer and I've found that all of our products use serial UART and as you say, 99% of the HW is USB dongles, primarily from FTDI or Silabs. It seems that the development of the drivers was focused on using the Windows/Linux serial abstractions to present a comfortable interface to the user. I am not interested in second guessing this decision and would rather leave the USB layer alone and instead work with the drivers as provided.
I believe that I can best explain my desire for an asyncio implementation with a few examples. Then I will work through the asyncio source code to further my understanding of this abstraction (which is perhaps my favorite in programming!).
The ubiquitous implementation of serial comms in Python, pySerial, allows the user to write() bytes to a serial interface. Because the buffer is rather large, 4096 bytes, write() likely blocks only for as long as it takes to copy the bytes into the outgoing buffer, and it returns well before the serial device has transmitted them on its TX line. To obtain a kind of synchronization, the library defines a flush() method that waits until the OS TX buffer is empty. In the Windows implementation this is a polling wait at 50ms intervals, while in Linux it blocks on tcdrain(). My method of waiting on the Windows OS event using loop._proactor.wait_for_handle(overlapped_write.hEvent) allows the Windows implementation to replace the polling wait with event signaling.
Since pySerial isn't asyncio-ready, programmers needing some concurrency in Python would use threads or asyncio thread pools. In the case of FW engineers working with embedded systems, we may like to have an async generator "task" that reads a byte stream from a serial device, as well as an awaitable write method that completes when the bytes have actually been put on the transport. It seems to me that Windows and POSIX OSes already handle the completion of read/write events, and that Python implementations that create threads are therefore inelegant. For example, to adapt pySerial to be asyncio-friendly, there is a new project, aioserial, that I will be contributing to. So far it uses a thread pool to wrap the old pySerial library; it wraps function calls in loop.run_in_executor() in order to return awaitables. My hope is that with guidance from the asyncio team I can bring a well-supported async implementation to Python serial IO.
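For reference, the thread-pool wrapping described above might look roughly like the sketch below. It is only an illustration under stated assumptions: FakeBlockingPort is a hypothetical stand-in for a blocking pySerial-style port (it is not pySerial or aioserial code), and write_and_drain() is an invented helper name.

```python
import asyncio
import time


class FakeBlockingPort:
    """Hypothetical stand-in for a blocking, pySerial-style port."""

    def __init__(self) -> None:
        self.sent = bytearray()

    def write(self, data: bytes) -> int:
        # Pretend to copy the bytes into the OS TX buffer.
        self.sent += data
        return len(data)

    def flush(self) -> None:
        # Pretend to block until the TX buffer has drained.
        time.sleep(0.01)


async def write_and_drain(port: FakeBlockingPort, data: bytes) -> int:
    """Run the blocking write() and flush() on the default thread pool."""
    loop = asyncio.get_running_loop()
    written = await loop.run_in_executor(None, port.write, data)
    await loop.run_in_executor(None, port.flush)
    return written


async def main() -> int:
    port = FakeBlockingPort()
    return await write_and_drain(port, b"hello")
```

Running asyncio.run(main()) awaits both blocking calls without stalling the event loop, at the cost of occupying a pool thread for the duration of each call.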
At this point, I admit that I may have lost perspective by working on systems with 32K of RAM where each thread is absolutely precious, powerful, and dangerous! Perhaps these days it is OK to spawn new threads as needed at runtime to wait on an OS thread that is itself waiting on a HW event. If the consensus is that a python-thread-based approach is best, then we don't need to look much further than wrapping IO in loop.run_in_executor()! Nevertheless, I will continue to explore the implementation since I am always interested in energy efficiency and beautiful abstraction.
My working implementation uses the wait_for_handle() method of the IocpProactor class defined here, which delegates to the internal _wait_for_handle(). Let's see how/why the proof of concept is working.
wait_for_handle() receives an overlapped.hEvent created with Win32 CreateEvent (note that a total of two would be created, one for reads, one for writes). The event is set up for signaling by using SetCommMask with flags EV_RXFLAG | EV_TXEMPTY during initialization and then calling WaitCommEvent with a reference to the overlapped each time new IO begins. This will cause the overlapped.hEvent that wait_for_handle() receives to be signaled when the OS completes the IO.
_wait_for_handle() calls RegisterWaitWithQueue(), which wraps the Win32 API RegisterWaitForSingleObject. The important bit here is that this API allows registration of a callback function that fires when the event is signaled. This callback will be called with the lpParameter argument containing struct PostCallbackData data = {CompletionPort, Overlapped}, *pdata; (line 355 of overlapped.c). And so it gets called with the completion port of self._iocp and a unique address, ov.address, which is NOT the overlapped structure we were originally awaiting, according to the note at line 714: "# We only create ov so we can use ov.address as a key for the cache.", followed by ov = _overlapped.Overlapped(NULL).
So we see how a callback is registered by the IocpProactor event loop. Now let's understand how this causes the "awaitable future" to complete at the Python layer.
A "future" is created: f = _WaitHandleFuture(ov, handle, wait_handle, self, loop=self._loop). Importantly, this calls the Win32 API CreateEvent. For my purposes this seems redundant at first glance, but I am afraid that it may be necessary due to the simple fact that WaitCommEvent does not take a callback! I will have to investigate further. This "future" is an instance of a subclass of _BaseWaitHandleFuture, which defines a _poll() method utilizing Win32 WaitForSingleObject to poll for the signaled state: "If dwMilliseconds is zero, the function does not enter a wait state if the object is not signaled; it always returns immediately."
It's a bit hard to track down, but if I am understanding correctly, the "super loop" of the IocpProactor is its own _poll(). It starts by calling GetQueuedCompletionStatus with an infinite timeout. This may answer one of my main curiosities: is this how the asyncio loop waits for multiple events from multiple threads without creating waiting threads of its own? Anyway, it retrieves the "future" from self._cache, the blank overlapped used as the cache key, 0, and the finish_wait_for_handle(trans, key, ov) function created way back in _wait_for_handle().
This callback wraps the default implementation of _BaseWaitHandleFuture._poll(), which wraps WaitForSingleObject, discussed above, and returns True if the event is signaled or False otherwise (I believe False would be an error condition?). Recall that in my implementation, "event" at this stage refers to an EV_TXEMPTY or EV_RXCHAR event, for example, set up by WaitCommEvent and SetCommMask earlier. The future's set_result() will be called with True and the future appended to self._results. Recall that wait_for_handle() returned this very same future to my application layer earlier, so the call to set_result() will cause the application's wait to end.
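To make the flow above concrete, here is a heavily simplified, platform-independent model of it. It is a sketch only and not the real IocpProactor: ToyProactor, post_completion(), and the is_signaled callable are hypothetical names, a queue.Queue stands in for the completion port, and this toy wait_for_handle() also returns the cache key so the demo can play the role of the OS.

```python
import asyncio
import queue


class ToyProactor:
    """Hypothetical, simplified model of the cache-and-poll flow."""

    def __init__(self, loop):
        self._loop = loop
        self._cache = {}                    # key -> (future, finish callback)
        self._completions = queue.Queue()   # stand-in for the completion port
        self._next_key = 0

    def wait_for_handle(self, is_signaled):
        # is_signaled stands in for the zero-timeout WaitForSingleObject poll.
        fut = self._loop.create_future()
        key = self._next_key
        self._next_key += 1

        def finish():
            return is_signaled()

        self._cache[key] = (fut, finish)
        return key, fut

    def post_completion(self, key):
        # Stand-in for the OS wait callback posting to the completion port.
        self._completions.put(key)

    def poll(self):
        # Drain queued completions and resolve the matching futures.
        while True:
            try:
                key = self._completions.get_nowait()
            except queue.Empty:
                return
            fut, finish = self._cache.pop(key)
            if not fut.done():
                fut.set_result(finish())


async def demo():
    proactor = ToyProactor(asyncio.get_running_loop())
    state = {"tx_empty": False}
    key, fut = proactor.wait_for_handle(lambda: state["tx_empty"])
    # Simulate the OS signaling the event and the wait callback firing.
    state["tx_empty"] = True
    proactor.post_completion(key)
    proactor.poll()
    return await fut
```

The point of the model is that no Python thread is created anywhere: the future is resolved from the loop's own polling step once a completion has been queued.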
Although there may be gaps in my understanding of the asyncio IO Completion Ports proactor implementation, by following the code I am confident that my usage of IocpProactor.wait_for_handle() does not create threads in the python layer. Without this implementation, the programmer wishing to manage concurrency with serial IO must resort to 1) creating and managing an extra thread for each IO direction and device or 2) manually wrapping the serial IO using loop.run_in_executor() or 3) using the aioserial library that abstracts 2) for them.
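For comparison, option 1) could be sketched as follows: one long-lived reader thread per port that hands received chunks to the event loop. This is an illustration under stated assumptions; start_reader_thread() and read_blocking are hypothetical names, and an empty bytes object is assumed to mean that the port was closed.

```python
import asyncio
import threading


def start_reader_thread(read_blocking, loop, out_queue):
    """Start a long-lived thread that forwards blocking reads to the loop."""

    def run():
        while True:
            chunk = read_blocking()
            # asyncio.Queue is not thread-safe, so hand the chunk over
            # on the loop's own thread.
            loop.call_soon_threadsafe(out_queue.put_nowait, chunk)
            if not chunk:  # assumed convention: empty bytes means "closed"
                return

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


async def demo():
    loop = asyncio.get_running_loop()
    out_queue = asyncio.Queue()
    chunks = iter([b"ab", b"cd", b""])  # fake port data
    thread = start_reader_thread(lambda: next(chunks), loop, out_queue)
    received = []
    while True:
        chunk = await out_queue.get()
        if not chunk:
            break
        received.append(chunk)
    thread.join()
    return received
```

A writer thread would mirror this in the other direction, which is where the count of two threads per open port comes from.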
I think that creating, managing, and destroying threads only to wait on a few bytes to arrive over a 10KBps transport is overkill.
Is it possible that there is a better approach than using wait_for_handle()? For example, the loop.add_reader(fd, callback, *args) API seems to satisfy my requirements but is not supported by IocpProactor. If there is interest, I could look into adding IocpProactor support for that API. There is also the Streams abstraction that seems appropriate, but I could not figure out how to hook into it with SetCommMask, WaitCommEvent, and the overlapped structures. Yet another idea is to take what I have learned from the IocpProactor internals and copy and expose them in simplified form for my own implementation, though I'd still need a nice way to throw them on the loop.
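On POSIX selector-based loops, the add_reader() pattern mentioned above might look like the sketch below. It assumes a Unix-like OS (as noted, the Windows proactor loop does not support add_reader()), and it uses an os.pipe() as a stand-in for a serial port file descriptor; read_once() is a hypothetical helper name.

```python
import asyncio
import os


async def read_once(fd: int, nbytes: int = 64) -> bytes:
    """Wait until fd is readable, then read up to nbytes from it."""
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def on_readable() -> None:
        # Called by the loop when the selector reports fd as readable.
        if not fut.done():
            fut.set_result(os.read(fd, nbytes))

    loop.add_reader(fd, on_readable)
    try:
        return await fut
    finally:
        # Always unregister, including on cancellation.
        loop.remove_reader(fd)


async def demo() -> bytes:
    read_fd, write_fd = os.pipe()  # stand-in for a serial port fd
    try:
        os.write(write_fd, b"ping")
        return await read_once(read_fd)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

No thread is involved here either: the loop's selector watches the descriptor and the callback runs on the loop's own thread.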
A big thanks for following along and aiding my understanding of the asyncio paradigm!
Cheers, J.P. Hutchins
P.S.: I am focused on Windows because I am not so worried about the POSIX implementation ;). Embedded always has Windows running anyway.
On Wed, Aug 31, 2022 at 7:10 PM Dima Tisnek <dimaqq@gmail.com> wrote:
A few thoughts from someone who worked with serial ports, parallel ports and USB and covered Linux and Windows for some of these.
1. Serial ports are dead
2. UNIX and Windows implementations are fundamentally different.
3. Even within UNIX, there's quite a variety
1. The hardware serial ports still exist on some rare PC motherboards, but it's quite rare to actually use those. Instead, there are a lot of other ports where the serial port abstraction can be used, in chronological order:
* USB serial dongles
* USB devices that integrate a microprocessor that's connected via its serial interface
* USB devices that integrate a microprocessor with a USB stack that fakes a serial port
* Bluetooth devices following the above
* Bluetooth devices with a modem (acm, not serial port) interface
Access to both USB and Bluetooth is done differently now: UMDF for USB, and I think something similar for Bluetooth.
2. UNIX APIs are pretty consistent wrt. file descriptor use, even when there are major gotchas in the kernel mode (serial vs tty for example). Windows APIs are frankly all over the place. Their pipes are not the same as pipes, etc. Their UNIX-like APIs only work so far. For a random example, see e.g. https://github.com/microsoft/terminal/issues/262
3. There's classical UNIX, but then there were tons of improvements:
Linux got epoll, aio, io_submit...
Mac got AsyncBytes and something or other underneath
*BSD for something or other, but a bit differently
Thus, a "good" asyncio loop implementation is likely to use OS-specific primitives.
So, where does it leave you?
If your aim is to contribute to asyncio, may I suggest that you find another target than serial interfaces.
If your aim is to support some specific device -- follow how that device is connected to the machine: ioports? iomem? usb? bt? etc.
If your aim is to achieve high-bandwidth or low-latency -- get close to hardware.
If your aim is to support, let's say 100 ports at once -- one of the two approaches above.
If I couldn't guess your aim, please explain why `asyncio` in the first place.
Cheers, Dima Tisnek
On Thu, Sep 1, 2022 at 2:58 AM J.P. Hutchins <jphutchins@gmail.com> wrote:
Greetings!
I would like to modify/replace an existing library, pySerial, to use asyncio in Windows/Mac/Linux. I have a Windows implementation working by "listening for an event" like this:
read_future = loop._proactor.wait_for_handle(overlapped_read.hEvent)
Where overlapped_read is the OVERLAPPED structure (via ctypes or pywin32) and the event is set up previously, e.g. "received chars on the serial port" event here.
My question is in regard to the best practices for awaiting an OS event, providing for the most efficient and maintainable implementation. References to other multi-platform libraries or builtins that accomplish something similar would be appreciated.
Thanks for your time,
J.P. Hutchins
_______________________________________________
Async-sig mailing list -- async-sig@python.org
To unsubscribe send an email to async-sig-leave@python.org
https://mail.python.org/mailman3/lists/async-sig.python.org/
Code of Conduct: https://www.python.org/psf/codeofconduct/