When I quoted 10% as a rough measurement I was referring more to set operations like union, etc., for which choosing the larger set as the base may have a very real performance advantage.
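To illustrate the point, here is a minimal sketch (not the actual implementation) assuming union is built by copying one operand and inserting the other's elements; copying the larger set means the per-element hash/probe work is proportional to the smaller operand:

```python
def union(a, b):
    # Sketch: copy the larger operand wholesale and insert the smaller one's
    # elements, so the number of per-element hash/insert operations is
    # min(len(a), len(b)) rather than max(len(a), len(b)).
    big, small = (a, b) if len(a) >= len(b) else (b, a)
    result = set(big)      # bulk copy of the larger set
    for x in small:        # insert only the smaller set's elements
        result.add(x)
    return result
```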
For membership tests that come back false, it may be quicker, since less memory is accessed: the index buckets are smaller (1, 2, 4, or 8 bytes versus 8 or 16 bytes on 32-bit vs. 64-bit builds). But to compare the entry's hash, an indirection into the entries array is required. The heuristics become less clear when considering how many collisions there are, and they may also change for different tuning parameters (e.g. a lower load factor would alleviate the pointer dereferences required to compare items, but at the cost of more memory).
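To make the indirection concrete, here is a minimal Python sketch (illustrative only, not CPython's code) of a compact, ordered layout: a small sparse index table points into a dense, insertion-ordered entries array, so a miss that lands on an empty index slot touches only the small table, while a hit or collision must follow the index into the entries array to compare hashes and keys.

```python
# Sketch of a compact/ordered set layout (not CPython's implementation).
# `indices` is the small sparse table (1/2/4/8-byte slots in a real build);
# `entries` is the dense, insertion-ordered array of (hash, key) pairs.

EMPTY = -1

class CompactSet:
    def __init__(self, size=8):
        self.indices = [EMPTY] * size   # sparse: offsets into `entries`
        self.entries = []               # dense: (hash, key), insertion order

    def _probe(self, h):
        # Simple linear probing over the sparse index table.
        mask = len(self.indices) - 1
        i = h & mask
        while True:
            yield i
            i = (i + 1) & mask

    def __contains__(self, key):
        h = hash(key)
        for i in self._probe(h):
            ix = self.indices[i]
            if ix == EMPTY:
                return False            # quick rejection: only `indices` touched
            eh, ek = self.entries[ix]   # extra indirection into the entries array
            if eh == h and ek == key:
                return True

    def add(self, key):
        if key in self:
            return
        # Keep the load factor below 2/3 so probing always finds an empty slot.
        if len(self.entries) * 3 >= len(self.indices) * 2:
            self._resize(len(self.indices) * 2)
        h = hash(key)
        for i in self._probe(h):
            if self.indices[i] == EMPTY:
                self.indices[i] = len(self.entries)
                self.entries.append((h, key))
                return

    def _resize(self, size):
        # Rebuild only the sparse table; the dense entries array is unchanged.
        self.indices = [EMPTY] * size
        for n, (h, _) in enumerate(self.entries):
            for i in self._probe(h):
                if self.indices[i] == EMPTY:
                    self.indices[i] = n
                    break
```

In this layout a false membership test that hits an EMPTY index slot never dereferences into `entries`, which is the quick-rejection case above; a collision-heavy workload, by contrast, pays the extra indirection on every probe.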
It may very well be the case that the target load factor only needs to decrease by a small amount (so that the ordered implementation still uses less memory), while the benefit of quick rejections still makes overall performance better. I don't know how 'objective' we can be here without discussing the specific data sets people may be using.