Remember that py stone is a terrible benchmark.
I understand that. I was only using it as a spot check. I was surprised at how much slower my (threaded or unthreaded) matrix multiply was on nogil vs 3.9+. I went into it thinking I would see an improvement. The Performance section of Sam's design document starts:
As mentioned above, the no-GIL proof-of-concept interpreter is about 10% faster than CPython 3.9 (and 3.10) on the pyperformance benchmark suite.
so it didn't occur to me that I'd be looking at a slowdown, much less by as much as I'm seeing.
Maybe I've somehow stumbled on some instruction mix for which the nogil VM is much worse than the stock VM. For now, I prefer to think I'm just doing something stupid. It certainly wouldn't be the first time.
P.S. I suppose I should have cc'd Sam when I first replied to this thread, but I'm doing so now. I figured my mistake would reveal itself early on. Sam, here's my first post about my little "project." https://email@example.com/message/WBLU6PZ2...