I refactored the map call to break dict_keys into cpu_count() chunks, so that each f() call runs continuously over n/cpu_count() items, and got virtually the same results: pool map is still much slower (4x) than regular map, and I don't know why.
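For reference, here is a minimal sketch of the chunked comparison described above. The names `f`, `f_chunk`, `split`, and `timed`, and the placeholder dict, are hypothetical stand-ins rather than the actual code, and the timings it prints will depend on the machine and on how expensive the real `f()` is.

```python
import time
from multiprocessing import Pool, cpu_count


def f(key):
    # Hypothetical stand-in for the real per-item work.
    return key * key


def f_chunk(keys):
    # One call handles a whole chunk, so each worker runs over its
    # n/cpu_count() items without returning to the parent in between.
    return [f(k) for k in keys]


def split(keys, n_chunks):
    # Split keys into n_chunks contiguous, nearly equal slices.
    keys = list(keys)
    size, extra = divmod(len(keys), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(keys[start:end])
        start = end
    return chunks


def timed(label, fn):
    # Run fn once and print its wall-clock time.
    t0 = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - t0:.4f}s")
    return result


if __name__ == "__main__":
    data = {i: None for i in range(100_000)}  # placeholder dict (assumption)
    chunks = split(data.keys(), cpu_count())

    serial = timed("map", lambda: [r for c in map(f_chunk, chunks) for r in c])
    with Pool(cpu_count()) as pool:
        parallel = timed(
            "pool.map",
            lambda: [r for c in pool.map(f_chunk, chunks) for r in c],
        )
    assert serial == parallel  # both paths must produce the same output
```

With this layout each worker receives exactly one task, so per-task dispatch overhead is already minimal. What remains is the cost of starting the processes and pickling the chunks and results between them, which is one possible place to look if `f()` itself is cheap.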