A while ago I wrote some benchmarks for this change; they are better than the ones shared previously.
I think you could speed up the algorithm by checking whether dict->ma_keys->dk_lookup == lookdict_unicode_nodummy. If it is, the dict is a combined dict with only string keys (quite common) and nothing has been deleted from it, so there are no holes in ma_keys and the entries can be walked without a per-entry check.
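
To make the idea concrete, here is a rough, self-contained sketch of the fast path. It does not use the real CPython structs: entry_t, keys_t, lookup_generic, lookup_unicode_nodummy and sum_values are all hypothetical stand-ins, and summing values is just a placeholder for whatever the actual algorithm does per entry. The only point is that comparing the lookup function pointer lets the loop skip the hole check.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the dict internals; NOT the
 * real CPython definitions. */
typedef struct {
    const char *key;   /* NULL marks a hole left by a deletion */
    int value;
} entry_t;

typedef struct keys keys_t;
typedef int (*lookup_fn)(const keys_t *, const char *);

struct keys {
    lookup_fn dk_lookup;   /* lookup variant currently installed */
    size_t dk_nentries;    /* number of used slots in entries[] */
    entry_t *entries;
};

/* Placeholder lookup variants. Their bodies do not matter here; only
 * their addresses are compared. */
int lookup_generic(const keys_t *k, const char *key)
{
    (void)k; (void)key;
    return -1;
}

int lookup_unicode_nodummy(const keys_t *k, const char *key)
{
    (void)k; (void)key;
    return -1;
}

/* Placeholder for "the algorithm": visit every live entry. */
long sum_values(const keys_t *k)
{
    long total = 0;
    size_t i;

    if (k->dk_lookup == lookup_unicode_nodummy) {
        /* Fast path: string keys only and no deletions so far, so the
         * entries are dense and need no per-entry hole check. */
        for (i = 0; i < k->dk_nentries; i++)
            total += k->entries[i].value;
    } else {
        /* General path: skip holes left by deleted keys. */
        for (i = 0; i < k->dk_nentries; i++) {
            if (k->entries[i].key == NULL)
                continue;
            total += k->entries[i].value;
        }
    }
    return total;
}
```

Whether this wins anything measurable depends on how expensive the per-entry check is compared with the rest of the loop body, so it would need to be benchmarked.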