Optimize _PyFunction_FastCallDict() for **kwargs

New submission from STINNER Victor:

def func(x, y):
    print(x, y)

def proxy2(func, **kw):
    # forward the keyword arguments unchanged to the real function
    func(**kw)

def proxy1(func, **kw):
    proxy2(func, **kw)

The "proxy2(func, **kw)" call in proxy1() is currently inefficient: _PyFunction_FastCallDict() converts the dictionary into a C array [key1, value1, key2, value2, ...] and then _PyEval_EvalCodeWithName() rebuilds the dictionary from the C array.
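The round trip described above can be modelled roughly in pure Python. This is only an illustrative sketch, not CPython's C code: the helper names flatten_kwargs() and rebuild_kwargs() are hypothetical stand-ins for the conversions attributed to _PyFunction_FastCallDict() and _PyEval_EvalCodeWithName().

```python
def flatten_kwargs(kw):
    """Hypothetical model: turn a dict into [key1, value1, key2, value2, ...]."""
    flat = []
    for key, value in kw.items():
        flat.append(key)
        flat.append(value)
    return flat


def rebuild_kwargs(flat):
    """Hypothetical model: rebuild a new dict from the flat key/value list."""
    return {flat[i]: flat[i + 1] for i in range(0, len(flat), 2)}


kw = {"x": 1, "y": 2}
flat = flatten_kwargs(kw)      # first conversion: dict -> flat array
rebuilt = rebuild_kwargs(flat)  # second conversion: flat array -> new dict
```

The rebuilt dictionary has the same contents as the original but is a distinct object, so the intermediate array and the second dictionary are the extra work that an optimized path would try to avoid.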

Since "func(*args, **kw)" proxies are common in Python, especially to call the parent constructor when overriding __init__, I think that it would be interesting to optimize this code path.
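As a minimal illustration of the __init__ pattern mentioned above, here is a sketch with hypothetical classes Base and Child; the names and attributes are assumptions made up for the example.

```python
class Base:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Child(Base):
    def __init__(self, *args, **kw):
        # Proxy call: forward all arguments to the parent constructor.
        super().__init__(*args, **kw)
        self.extra = True


obj = Child(1, y=2)
```

Every instantiation of Child forwards **kw through one extra call level, which is the kind of call this issue proposes to make cheaper.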

I first suspected that this was a regression caused by FASTCALL, but Python < 3.6 doesn't optimize this code path either.

