[Python-Dev] Argument Clinic: what to do with builtins with non-standard signatures?

Fri Jan 24 16:07:47 CET 2014

BACKGROUND  (skippable if you're a know-it-all)

Argument parsing for Python functions follows some very strict rules.
Unless the function implements its own parsing like so:

     def black_box(*args, **kwargs):

there are some semantics that are always true.  For example:

     * Any parameter that has a default value is optional, and vice-versa.

     * It doesn't matter whether you pass in a parameter by name or by
       position; it behaves the same.

     * You can see the default values by examining its inspect.Signature.

     * Calling a function and passing in the default value for a parameter
       is identical to calling the function without that parameter.

       e.g. (assuming foo is a pure function):

          def foo(a=value): ...

          foo() == foo(value) == foo(a=value)

       With that signature, foo() literally can't tell the difference
       between those three calls.  And it doesn't matter what the type
       of value is or where you got it.
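
A minimal sketch of those rules in action.  The names foo and VALUE
are placeholders I made up for illustration, and foo is deliberately a
pure function:

```python
import inspect

VALUE = 42  # hypothetical default value, chosen only for illustration


def foo(a=VALUE):
    # A pure function: the result depends only on the argument.
    return a * 2


# The default is visible through the function's inspect.Signature.
sig = inspect.signature(foo)
default = sig.parameters["a"].default

# Omitting the argument, passing the default by position, and passing
# it by keyword are indistinguishable from inside foo().
results = (foo(), foo(VALUE), foo(a=VALUE))
```

All three calls produce the same result, and the published default
matches the value that is actually used.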

Python builtins are a little less regular.  They effectively do their own
parsing.  So they *could* do any crazy thing they want.  99.9% of the time
they do one of four standard things:
   * They parse their arguments with a single call to PyArg_ParseTuple().
   * They parse their arguments with a single call to
     PyArg_ParseTupleAndKeywords().
   * They take a single argument of type "object" (METH_O).
   * They take no arguments (METH_NOARGS).

PyArg_ParseTupleAndKeywords() behaves almost exactly like a Python
function.  PyArg_ParseTuple() is a little less like a Python function,
because it doesn't support keyword arguments.  (Surely this behavior is
familiar to you!)
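
As a rough illustration of the positional-only behavior, here is a
sketch using divmod().  I'm not claiming which C parsing helper
divmod() uses in any particular release; the only thing shown is that
it rejects keyword arguments, while an equivalent pure-Python wrapper
(py_divmod, a name invented here) accepts them:

```python
def accepts_keywords(func, **kwargs):
    # Return True if func can be called with these keyword arguments.
    try:
        func(**kwargs)
    except TypeError:
        return False
    return True


def py_divmod(a, b):
    # Hypothetical pure-Python wrapper with an ordinary signature.
    return divmod(a, b)


positional_ok = divmod(7, 2)
builtin_kw = accepts_keywords(divmod, a=7, b=2)
python_kw = accepts_keywords(py_divmod, a=7, b=2)
```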

But then there's that funny 0.1%, the builtins that came up with their
own unique approach for parsing arguments--giving them funny semantics.
Argument Clinic tries to accommodate these as best it can.  (That's why
it supports "optional groups" for example.)  But it can only do so much.

THE PROBLEM

Argument Clinic's original goal was to provide an introspection signature
for every builtin in Python.

But a small percentage of builtins have funny semantics that aren't
expressible in a valid Python signature.  This makes them hard to convert
to Argument Clinic, and makes their signature inaccurate.

If we want these functions to have an accurate Python introspection
signature, their argument parsing will have to change.

THE QUESTION

What should someone converting functions to Argument Clinic do
when faced with one of these functions?

Of course, the simplest answer is "nothing"--don't convert the
function to Argument Clinic.   We're in beta, and any change
that isn't a bugfix is impermissible.  We can try again for 3.5.

But if "any change" were impermissible, we wouldn't have the
community support to convert to Argument Clinic right now.  The
community wants proper signatures for builtins badly enough that
we're doing it now, even though we're already in beta for Python
3.4.  Converting to Argument Clinic is, in the vast majority of
cases, a straightforward and low-risk change--but it is *a*
change.

Therefore perhaps the answer isn't an automatic "no".  Perhaps
additional straightforward, low-risk changes are permissible.  The
trick is, what constitutes a straightforward, low-risk change?
Where should we draw the line?  Let's discuss it.  Perhaps a
consensus will form around an answer besides a flat "no".

THE SPECIFICS

I'm sorting the problems we see into four rough categories.

a) Functions where there's a static Python value that behaves
    identically to not passing in that parameter (aka "the NULL problem")

    Example:
      _sha1.sha1().  Its optional parameter has a default value in C of NULL.
      We can't express NULL in a Python signature.  However, it just so
      happens that _sha1.sha1(b'') is exactly equivalent to _sha1.sha1().
      b'' makes for a fine replacement default value.

      The same holds for list.__init__().  Its optional "sequence" parameter has
      a default value in C of NULL.  But this signature:
         list.__init__(sequence=())
      works fine.

      The way Clinic works, we can actually still use the NULL as the
      default value in C.  Clinic will let you use completely different
      values as the published default value in Python and the real default
      value in C.  (Consenting adults rule and all that.)  So we could lie
      to Python and everything works just the way we want it to.

    Possible Solutions:
      0) Do nothing, don't convert the function.
      1) Use that clever static value as the default.
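
A quick sanity check of the equivalence claimed in (a).  This sketch
goes through the public hashlib.sha1() rather than _sha1.sha1()
directly, on the assumption that the two agree for this purpose:

```python
import hashlib

# Calling with no argument versus passing the proposed published default.
no_arg = hashlib.sha1().hexdigest()
empty_arg = hashlib.sha1(b"").hexdigest()

# The same comparison for list: no argument versus an empty tuple.
lists_match = list() == list(())
```

If the digests match and the lists compare equal, then publishing b''
and () as the defaults would not change observable behavior for these
two cases.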

b) Functions where there's no static Python value that behaves
    identically to not passing in that parameter (aka "the dynamic
    default problem")

    There are functions with parameters whose defaults are mildly dynamic,
    responding to other parameters.

    Example:
      I forget its name, but someone recently showed me a builtin that took
      a list as its first parameter, and its optional second parameter
      defaulted to the length of the list.  As I recall this function didn't
      allow negative numbers, so -1 wasn't a good fit.

    Possible solutions:
      0) Do nothing, don't convert the function.
      1) Use a magic value as the default.  Preferably one of the same type
         the function accepts, but failing that use None.  If they pass in
         the magic value use the previous default value.  Guido himself
         suggested this in
      2) Use an Argument Clinic "optional group".  This only works for
         functions that don't support keyword arguments.  Also, I hate
         this, because "optional groups" are not expressible in Python
         syntax, so these functions automatically have invalid signatures.
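
Here is what the magic-value approach could look like, written in
Python for brevity.  The function take() and its parameters are
hypothetical stand-ins for the builtin I can't remember; only the
pattern matters:

```python
import inspect


def take(items, count=None):
    # Hypothetical function: count defaults to the length of items.
    # None is the magic value, since negative numbers are not allowed
    # and therefore -1 would be confusing as a published default.
    if count is None:
        count = len(items)
    if count < 0:
        raise ValueError("count must be non-negative")
    return items[:count]


published = str(inspect.signature(take))
```

The published signature is valid Python, and passing None explicitly
behaves the same as omitting the argument.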

c) Functions that accept an 'int' when they mean 'boolean' (aka the
    "ints instead of bools" problem)

    This is specific but surprisingly common.

    Before Python 3.3 there was no PyArg_ParseTuple format unit that meant
    "boolean value".  Functions generally used "i" (int).  Even older
    functions accepted an object and called PyLong_AsLong() on it.
    Passing in True or False for "i" (or PyLong_AsLong()) works, because
    bool is a subclass of int.  But anything other than ints and bools
    raises an exception.

    In Python 3.3 I added the "p" format unit for boolean arguments.
    This calls PyObject_IsTrue() which accepts nearly any Python value.

    I assert that Python has a crystal clear definition of what
    constitutes "true" and "false".  These parameters are clearly
    intended as booleans but they don't conform to the boolean
    protocol.  So I suggest every instance of this is a (very mild!)
    bug.  But changing these parameters to use "p" is a change: they'll
    accept many more values than before.

    Right now people convert these using 'int' because that's an exact
    match.  But sometimes they are optional, and the person doing the
    conversion wants to use True or False as a default value, and it
    doesn't work: Argument Clinic's type enforcement complains and
    they have to work around it.  (Argument Clinic has to enforce some
    type-safety here because the values are used as defaults for C
    variables.)  I've been asked to allow True and False as defaults
    for "int" parameters specifically because of this.

    Example:
      str.splitlines(keepends)

    Possible solutions:
      1) Use "bool".
      2) Use "int", and I'll go relax Argument Clinic so they
         can use bool values as defaults for int parameters.
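
To make the difference concrete, here is a rough Python model of the
two behaviors.  Both helper names are invented for this sketch, and
operator.index() is only an approximation of what an int-style
parameter accepts; bool() stands in for PyObject_IsTrue():

```python
import operator


def parse_as_int(value):
    # Approximates an int-style parameter: ints and bools only.
    return operator.index(value)


def parse_as_bool(value):
    # Approximates the "p" format unit: any object, judged by truthiness.
    return bool(value)


def accepted(parser, value):
    # Return True if the parser accepts value without a TypeError.
    try:
        parser(value)
    except TypeError:
        return False
    return True
```

With this model, True is accepted both ways, while a string like
"yes" is rejected by the int-style parser and accepted by the
bool-style one.  That is exactly the widening that switching to "p"
would introduce.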

d) Functions with behavior that deliberately defies being expressed as a
    Python signature (aka the "untranslatable signature" problem)

    Example:
      itertools.repeat(), which behaves differently depending on whether
      "times" is supplied as a positional or keyword argument.  (If
      "times" is <0, and was supplied via position, the function yields
      0 times. If "times" is <0, and was supplied via keyword, the
      function yields infinitely-many times.)

    Possible solutions:
      0) Do nothing, don't convert the function.
      1) Change the signature until it is Python compatible.  This new
         signature *must* accept a superset of the arguments accepted
         by the existing signature.  (This is being discussed right
         now in issue #19145.)
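
To show why no ordinary signature captures this, here is a pure-Python
model of the behavior described in (d).  legacy_repeat() is a
hypothetical name, and this is a sketch of the semantics as described
above, not a transcription of the C code:

```python
import itertools


def legacy_repeat(obj, *args, **kwargs):
    # Black-box parsing: the function must inspect *how* "times"
    # arrived, which a normal "def f(obj, times=...)" cannot do.
    if args:
        (times,) = args
        times = max(times, 0)      # negative by position: zero items
    elif "times" in kwargs:
        times = kwargs["times"]
        if times < 0:
            times = None           # negative by keyword: repeat forever
    else:
        times = None               # not supplied: repeat forever
    if times is None:
        return itertools.repeat(obj)
    return itertools.repeat(obj, times)


by_position = list(legacy_repeat("x", -1))
by_keyword = list(itertools.islice(legacy_repeat("x", times=-1), 3))
```

The only way to write this in Python is with *args and **kwargs,
which is the "black box" case from the background section.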

//arry/