A weekend or two ago, I was planning on doing some work on some ideas I had regarding IOCP and the tulip/async-IO discussion. I ended up getting distracted by WSAPoll. WSAPoll is a method that Microsoft introduced with Vista/2008 that is intended to be semantically equivalent to poll() on UNIX. I decided to play around and see what it would take to get it available via select.poll() on Windows, eventually hacking it into a working state.

Issue: http://bugs.python.org/issue16507
Patch: http://bugs.python.org/file28038/wsapoll.patch

So, it basically works. poll() on Windows, who would have thought.

It's almost impossible to test with our current infrastructure; all our unit tests seem to pass pipes and other non-Winsock-backed-socks to poll(), which, like select()-on-Windows, isn't supported. I suspect Twisted's test suite would give it a much better work out (CC'd Glyph just so it's on his radar). I ended up having to verify it worked with some admittedly-hacky dual-python-console sessions, one poll()'ing as a server, the other connecting as a client. It definitely works, so, it's worth keeping it in mind for the future. It's still equivalent to poll()'s O(N) on UNIX, but that's better than the 64/512 limit select is currently stuck with on Windows.

Didn't have much luck trying to get the patched Python working with tulip's PollReactor, unfortunately, so I just wanted to provide some feedback on that experience.

First bit of feedback: man, debugging `yield from` stuff is *hard*. Tulip just flat out didn't work with the PollReactor from the start but it was dying in a non-obvious way. So, I attached both a Pdb debugger and Visual Studio debugger and tried to step through everything to figure out why the first call to poll() was blowing up (can't remember the exact error message but it was along the lines of "you can't poll() whatever it is you just asked me to poll(), it's defo' not a socket").
I eventually, almost by pure luck, traced the problem to the fact that PollReactor's __init__ eventually results in code being called that calls poll() on two os.pipe() objects (in EventLoop I think). However, when I was looking at the code, it appeared as though the first poll() came from the getaddrinfo(). So all my breakpoints and whatnot were geared towards that, yet none of them were being hit, yet poll() was still being called somehow, somewhere.

I ended up having to spend ages traipsing through every line in Visual Studio's debugger to try to figure out what the heck was going on. I believe the `yield from` aspect made that so much more of an arduous affair -- one moment I'm in selectmodule.c's getaddrinfo(), then I'm suddenly deep in the bowels of some cryptic eval frame black magic, then one 'step' later, I'm over in some completely different part of selectmodule.c, and so on.

I think the reason I found it so tough was because when you're single stepping through each line of a C program, you can sort of always rely on the fact you know what's going to happen when you "step" the next line. In this case though, a step of an eval frame would wildly jump to seemingly unrelated parts of C code. As far as I could tell, there was no easy/obvious way to figure the details out before stepping that instruction either (i.e. probing the various locals and whatnot).

So, that's the main feedback from that weekend, I guess. Granted, it's more of a commentary on `yield from` than tulip per se, but I figured it would be worth offering up my experience nevertheless.
I ended up with the following patch to avoid the initial poll() against os.pipe() objects:

--- a/polling.py Sat Nov 03 13:54:14 2012 -0700
+++ b/polling.py Tue Nov 27 07:05:10 2012 -0500
@@ -41,6 +41,7 @@
 import os
 import select
 import time
+import sys
 
 
 class PollsterBase:
@@ -459,6 +460,10 @@
     """
 
     def __init__(self, eventloop, executor=None):
+        if sys.platform == 'win32':
+            # Work around the fact that we can't poll pipes on Windows.
+            if isinstance(eventloop.pollster, PollPollster):
+                eventloop = EventLoop(SelectPollster())
         self.eventloop = eventloop
         self.executor = executor  # Will be constructed lazily.
         self.pipe_read_fd, self.pipe_write_fd = os.pipe()

By that stage it was pretty late in the day and I accepted defeat. My patch didn't really work, it just allowed the test to run to completion without the poll OSError exception being raised.

Trent.
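For comparison, here is a minimal sketch of a different workaround from the one in the patch above: build the wakeup channel from a pair of connected sockets rather than os.pipe(), so that the descriptor handed to the poller is a real socket. This is an illustration under assumptions, not tulip's code; the names SocketWakeup and wait_for_wakeup are hypothetical. Note that socket.socketpair() only gained Windows support in Python 3.5, so it was not available on Windows when this thread was written. The demo waits with select.select() so that it runs on any platform.

```python
import select
import socket


class SocketWakeup:
    """Hypothetical self-pipe replacement backed by socket.socketpair()."""

    def __init__(self):
        # Both ends are sockets, unlike the fds returned by os.pipe().
        self.read_sock, self.write_sock = socket.socketpair()
        self.read_sock.setblocking(False)
        self.write_sock.setblocking(False)

    def fileno(self):
        # Lets the object be passed directly to select()/poll().
        return self.read_sock.fileno()

    def wake(self):
        # One byte is enough to make the read end readable.
        self.write_sock.send(b"x")

    def drain(self):
        # Consume all pending wakeup bytes; return how many were read.
        drained = 0
        while True:
            try:
                chunk = self.read_sock.recv(4096)
            except BlockingIOError:
                break
            if not chunk:
                break
            drained += len(chunk)
        return drained

    def close(self):
        self.read_sock.close()
        self.write_sock.close()


def wait_for_wakeup(wakeup, timeout=1.0):
    """Return True if the wakeup channel became readable within timeout."""
    readable, _, _ = select.select([wakeup], [], [], timeout)
    return wakeup in readable
```

The same object could be registered with a poll()-style pollster through its fileno(); whether that would have been enough to get PollReactor running on the patched Windows build is not something the thread establishes.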
On Tue, Nov 27, 2012 at 10:33 PM, Trent Nelson <trent@snakebite.org> wrote:
In this case though, a step of an eval frame would wildly jump to seemingly unrelated parts of C code. As far as I could tell, there was no easy/obvious way to figure the details out before stepping that instruction either (i.e. probing the various locals and whatnot).
I'm not sure that has anything to do with "yield from", but rather to do with the use of computed gotos (http://hg.python.org/cpython/file/default/Python/ceval.c#l821). For sane stepping in the eval loop, you probably want to build with "--without-computed-gotos" enabled (that's a configure option on Linux, I have no idea how to turn it off on Windows). Even without that, the manual opcode prediction macros are still a bit wacky (albeit easier to follow than the compiler level trickery).

The eval loop commits many sins against debuggability and maintainability in pursuit of speed, so it's not really fair to place all of that at the feet of the yield from clause.

If you really did just mean the behaviour of jumping from caller-frame-eval-loop to generator-frame-eval-loop and back out again, then that again is really just about generator stepping at the C level (where suspend/resume means passing through ceval.c), rather than being specific to yield from.

So, that's the main feedback from that weekend, I guess. Granted, it's more of a commentary on `yield from` than tulip per se, but I figured it would be worth offering up my experience nevertheless.

From your description so far, it seems like more of a commentary on pointing a C level debugger at our eval loop...

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 27.11.2012 15:30, Nick Coghlan wrote:
I'm not sure that has anything to do with "yield from", but rather to do with the use of computed gotos (http://hg.python.org/cpython/file/default/Python/ceval.c#l821). For sane stepping in the eval loop, you probably want to build with "--without-computed-gotos" enabled (that's a configure option on Linux, I have no idea how to turn it off on Windows). Even without that, the manual opcode prediction macros are still a bit wacky (albeit easier to follow than the compiler level trickery).
I don't think the problem is related to computed gotos. Visual Studio doesn't support labels as values and therefore doesn't do computed gotos either. It's a special feature of GCC and some other compilers.

tl;dr: No computed gotos on Windows ;)

Christian
Nevertheless the optimizer does crazy things to ceval.c. Trent, can you confirm you were debugging unoptimized code?

--Guido van Rossum (sent from Android phone)
2012/11/27 Guido van Rossum <guido@python.org>
Nevertheless the optimizer does crazy things to ceval.c. Trent, can you confirm you were debugging unoptimized code?
ceval.c is always compiled with a lot of optimizations, even in "debug" mode, because of the "#define PY_LOCAL_AGGRESSIVE" at the top of the file. I sometimes had to remove this line to debug programs correctly. OTOH, with the line removed the stack usage is much higher and some recursion tests will fail.

-- Amaury Forgeot d'Arc
On Tue, Nov 27, 2012 at 09:17:57AM -0800, Guido van Rossum wrote:
Nevertheless the optimizer does crazy things to ceval.c. Trent, can you confirm you were debugging unoptimized code?
Yup, definitely. If it helps, I can fire up the dev env again and give specifics on the exact frame-jumping-voodoo that baffled me.

Trent.
Hi,

On Tue, 27 Nov 2012 07:33:25 -0500, Trent Nelson <trent@snakebite.org> wrote:
So, it basically works. poll() on Windows, who would have thought.
It's almost impossible to test with our current infrastructure; all our unit tests seem to pass pipes and other non-Winsock-backed-socks to poll(), which, like select()-on-Windows, isn't supported.
Well, then you should write new tests that don't rely on pipes. There's no reason it can't be done, and there are already lots of examples of tests using TCP sockets in our test suite. It will also be a nice improvement to the current test suite for Unix platforms.
Visual Studio's debugger to try figure out what the heck was going on. I believe the `yield from` aspect made that so much more of an arduous affair -- one moment I'm in selectmodule.c's getaddrinfo(), then I'm suddenly deep in the bowels of some cryptic eval frame black magic, then one 'step' later, I'm over in some completely different part of selectmodule.c, and so on.
I'm not sure why you're using Visual Studio to debug Python code? It sounds like you want something higher-level, e.g. Python print() calls or pdb.

Regards

Antoine.
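As a rough idea of what a pipe-free poll() test could look like, here is a minimal sketch that polls a listening TCP socket and then an accepted connection on the loopback interface. It is not taken from the CPython test suite; the function name poll_tcp_roundtrip is hypothetical. It runs as-is wherever select.poll() exists; on Windows that would require the patch from the issue above.

```python
import select
import socket


def poll_tcp_roundtrip(timeout_ms=2000):
    """Poll a listening TCP socket, then an accepted connection."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = conn = None
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)

        poller = select.poll()
        poller.register(server.fileno(), select.POLLIN)

        # A pending connection makes the listening socket readable.
        client = socket.create_connection(server.getsockname())
        ready = dict(poller.poll(timeout_ms))
        server_readable = bool(ready.get(server.fileno(), 0) & select.POLLIN)

        conn, _ = server.accept()
        poller.unregister(server.fileno())
        poller.register(conn.fileno(), select.POLLIN)

        # Data sent by the client makes the accepted socket readable.
        client.sendall(b"ping")
        ready = dict(poller.poll(timeout_ms))
        conn_readable = bool(ready.get(conn.fileno(), 0) & select.POLLIN)
        data = conn.recv(4) if conn_readable else b""
        return server_readable, conn_readable, data
    finally:
        for sock in (conn, client, server):
            if sock is not None:
                sock.close()
```

Only sockets are ever registered with the poller, so a test shaped like this avoids the pipe problem on Windows while still exercising poll() on Unix.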
On Tue, Nov 27, 2012 at 06:42:04AM -0800, Antoine Pitrou wrote:
Hi,
On Tue, 27 Nov 2012 07:33:25 -0500, Trent Nelson <trent@snakebite.org> wrote:
So, it basically works. poll() on Windows, who would have thought.
It's almost impossible to test with our current infrastructure; all our unit tests seem to pass pipes and other non-Winsock-backed-socks to poll(), which, like select()-on-Windows, isn't supported.
Well, then you should write new tests that don't rely on pipes. There's no reason it can't be done, and there are already lots of examples of tests using TCP sockets in our test suite. It will also be a nice improvement to the current test suite for Unix platforms.
Agreed, there's more work required. It's on the list.
Visual Studio's debugger to try figure out what the heck was going on. I believe the `yield from` aspect made that so much more of an arduous affair -- one moment I'm in selectmodule.c's getaddrinfo(), then I'm suddenly deep in the bowels of some cryptic eval frame black magic, then one 'step' later, I'm over in some completely different part of selectmodule.c, and so on.
I'm not sure why you're using Visual Studio to debug Python code? It sounds like you want something higher-level, e.g. Python print() calls or pdb.
Ah, right. So, I was trying to figure out why poll was barfing up a WSAError on whatever it was being asked to poll. So, I set out to find what it was polling, via breakpoints in register(). That pointed to an fd with value 3. That seemed a little strange, as all my other socket tests consistently had socket fd values above 250-something.

So, I wanted to track down where that fd was coming from, thinking it was related to the first poll()/register() instance I could find in getaddrinfo(). It wasn't, and through combined use of *both* pdb and VS, I eventually stumbled onto the attempt to poll os.pipe() FDs. I think.

(There were also other issues that I skipped over in the e-mail; like figuring out I had to &= ~POLLPRI in order for the poll call to work at all.)

And... Visual Studio's debugger is sublime. I'll jump at the chance to fire it up if I think it'll help me debug an issue. You get much better situational awareness than stepping through with gdb.

Trent.
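The `&= ~POLLPRI` step mentioned above can be expressed in Python as follows. This is only a sketch of the bit operation: the message does not say where in the patch the mask was applied, and the helper name without_pollpri is hypothetical. The one fact relied on is documented behaviour: when no eventmask is passed, select.poll().register() defaults to POLLIN | POLLPRI | POLLOUT, so the default mask includes the bit in question.

```python
import select

# Documented default eventmask of select.poll().register().
DEFAULT_EVENTMASK = select.POLLIN | select.POLLPRI | select.POLLOUT


def without_pollpri(eventmask=DEFAULT_EVENTMASK):
    """Return eventmask with the POLLPRI bit cleared (hypothetical helper)."""
    return eventmask & ~select.POLLPRI
```

A caller could pass the result as the second argument to register(), for example `poller.register(fd, without_pollpri())`.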
I wasn't there, and it's been 7 years since I last saw Visual Studio, but I do believe it is a decent way to debug C code. But it sounds like it was tough to explore the border between C and Python code, which is why it took you so long to find the issue, right?

Also, please be aware that tulip is *not* ready for anything. As I just stated in a thread on python-dev, it is just my way of trying to understand the issues with async I/O (in a different context than where I've understood them before, i.e. App Engine's NDB).

I am well aware of how hard it is to debug currently -- just read the last section in the TODO file. I have not had to debug any C code, so my experience is purely based on using pdb. Here, the one big difficulty seems to be that it does the wrong thing when it hits a yield or yield-from -- it treats these as if they were returns, and this totally interrupts the debug flow. In the past, when I was debugging NDB, I've asked in vain whether someone had already made the necessary changes to pdb to let it jump over a yield instead of following it -- I may have to go in and develop a change myself, because this problem isn't going away.

However, I have noted that a system using a yield-from-based scheduler is definitely more pleasant to debug than one using yield <future> -- the big improvement is that if the system prints a traceback, it automatically looks right. However there are still situations where there's a suspended task that's holding on to relevant information, and it's too hard to peek in its stack frame. I will be exploring better solutions once I get back to working on tulip more intensely.

-- --Guido van Rossum (python.org/~guido)
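The point about tracebacks "automatically looking right" with yield-from can be illustrated with a small sketch. The generator names below are made up for the example and have nothing to do with tulip's code. Because outer() delegates with `yield from`, its frame stays on the stack while inner() runs, so an exception raised in inner() carries both frames in its traceback.

```python
import traceback


def inner():
    yield "first"
    raise ValueError("boom")


def outer():
    # Delegation keeps outer() on the stack while inner() runs.
    yield from inner()


def frames_seen_by_traceback():
    """Return the function names recorded in the traceback, outermost first."""
    gen = outer()
    next(gen)  # runs until the yield inside inner()
    try:
        next(gen)  # resumes inner(), which raises
    except ValueError as exc:
        return [entry.name for entry in traceback.extract_tb(exc.__traceback__)]
    return []
```

The returned list contains the calling function, then outer, then inner, which is the chain a reader of the traceback would want to see.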
On Tue, Nov 27, 2012 at 6:06 PM, Guido van Rossum <guido@python.org> wrote:
In the past, when I was debugging NDB, I've asked in vain whether someone had already made the necessary changes to pdb to let it jump over a yield instead of following it -- I may have to go in and develop a change myself, because this problem isn't going away.
Do you want a new pdb command, or a change in the behavior of *step* or *next*?

-- Thanks, Andrew Svetlov
On Wed, Nov 28, 2012 at 1:53 AM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
On Tue, Nov 27, 2012 at 6:06 PM, Guido van Rossum <guido@python.org> wrote:
In the past, when I was debugging NDB, I've asked in vain whether someone had already made the necessary changes to pdb to let it jump over a yield instead of following it -- I may have to go in and develop a change myself, because this problem isn't going away.
Do you want a new pdb command, or a change in the behavior of *step* or *next*?
Good question. If it's easier I'd be okay with a new command; but changing the behavior of "next" would also work, if it's possible. Are you interested in working on an implementation? I'd be interested in reviewing it then. -- --Guido van Rossum (python.org/~guido)
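To make the underlying mechanism concrete, here is a small sketch of what a trace function installed with sys.settrace() sees while a generator runs. pdb is built on the same tracing hook. The names gen and collect_generator_events are made up for the example; the sketch only records events and does not show how any eventual pdb change works.

```python
import sys


def gen():
    yield "a"
    yield "b"


def collect_generator_events():
    """Record the trace events emitted for gen()'s frame."""
    events = []

    def tracer(frame, event, arg):
        if frame.f_code is not gen.__code__:
            return None  # do not trace unrelated frames
        events.append(event)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        list(gen())  # drives the generator to exhaustion
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return events
```

Each yield is reported to the trace function as a "return" event, and each resumption as a new "call" event, which is why a debugger stepping with "next" sees the generator as returning unless it treats yields specially.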
I will probably try to make a patch this weekend. Changing the behavior of the *next* command looks more convenient for the end user.

-- Thanks, Andrew Svetlov
Agreed. Would be wonderful!

-- --Guido van Rossum (python.org/~guido)
That will need to be well highlighted in What's New, as it could be very confusing if the iterator is never called again.

-- Sent from my phone, thus the relative brevity :)
Created http://bugs.python.org/issue16596 for jumping over yields. Please review.

-- Thanks, Andrew Svetlov
participants (7)
- Amaury Forgeot d'Arc
- Andrew Svetlov
- Antoine Pitrou
- Christian Heimes
- Guido van Rossum
- Nick Coghlan
- Trent Nelson