Documenting builtin methods

Steven D'Aprano steve at pearwood.info
Thu Jul 11 08:06:36 CEST 2013


On Thu, 11 Jul 2013 04:15:37 +0100, Joshua Landau wrote:

> I have this innocent and simple code:
> 
> from collections import deque
> exhaust_iter = deque(maxlen=0).extend 

At this point, exhaust_iter is another name for the bound instance method 
"extend" of one specific deque instance.

Other implementations may do otherwise[1], but CPython optimizes built-in 
methods and functions. E.g. they have no __dict__ so you can't add 
attributes to them. When you look up exhaust_iter.__doc__, you are 
actually looking up (type(exhaust_iter)).__doc__, which is a descriptor:

py> type(exhaust_iter).__doc__
<attribute '__doc__' of 'builtin_function_or_method' objects>
py> type(type(exhaust_iter).__doc__)
<class 'getset_descriptor'>
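
As a rough check of the "no __dict__" point, here is a minimal sketch, assuming CPython (other implementations may differ). The attribute name "note" is made up purely for illustration:

```python
from collections import deque

exhaust_iter = deque(maxlen=0).extend

# In CPython, built-in methods carry no per-object __dict__.
has_dict = hasattr(exhaust_iter, "__dict__")
print("has __dict__:", has_dict)

# So attaching an arbitrary (hypothetical) attribute is rejected.
try:
    exhaust_iter.note = "anything"
except AttributeError as err:
    set_failed = True
    print("cannot set attribute:", err)
else:
    set_failed = False
```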


Confused yet? Don't worry, you will be...

So, looking up exhaust_iter.__doc__:

1) looks up '__doc__' on the class "builtin_function_or_method", not the 
instance;

2) which looks up '__doc__' on the class __dict__:

py> type(exhaust_iter).__dict__['__doc__']
<attribute '__doc__' of 'builtin_function_or_method' objects>

3) This is a descriptor with __get__ and __set__ methods. Because the 
actual method is written in C, you can't access its internals except via 
the API: even the class __dict__ is not really a dict, it's a wrapper 
around a dict:

py> type(type(exhaust_iter).__dict__)
<class 'mappingproxy'>


Anyway, we have a descriptor that returns the doc string:

py> descriptor = type(exhaust_iter).__doc__
py> descriptor.__get__(exhaust_iter)
'Extend the right side of the deque with elements from the iterable'

My guess is that it is fetching this from some private C member, which 
you can't get to from Python except via the descriptor. And you can't set 
it:

py> descriptor.__set__(exhaust_iter, '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute '__doc__' of 'builtin_function_or_method' 
objects is not writable


which is probably because if you could write to it, it would change the 
docstring for *every* deque. And that would be bad.
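
The lookup steps above can be replayed by hand. This is only a sketch of what the attribute access appears to do on CPython, not a description of the C internals; the names "cls", "manual" and "other" are mine:

```python
from collections import deque

exhaust_iter = deque(maxlen=0).extend
cls = type(exhaust_iter)

# Step 2: fetch the descriptor straight from the class __dict__.
descriptor = cls.__dict__["__doc__"]

# Step 3: run the descriptor protocol manually.
manual = descriptor.__get__(exhaust_iter, cls)
print(manual == exhaust_iter.__doc__)

# Writing through the descriptor is refused.
try:
    descriptor.__set__(exhaust_iter, "new text")
except AttributeError as err:
    write_refused = True
    print("read-only:", err)
else:
    write_refused = False

# The same method taken from a different deque reports the same text.
other = deque().extend
print(other.__doc__ == exhaust_iter.__doc__)
```

The last comparison is at least consistent with the docstring being shared rather than stored per instance.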

If this were a pure-Python method, you could probably bypass the 
descriptor, but it's a C-level built-in. I think you're out of luck.

I think the right solution here is the trivial:

from collections import deque

def exhaust(it):
    """Doc string here."""
    deque(maxlen=0).extend(it)


which will be fast enough for all but the tightest inner loops. But if 
you really care about optimizing this:


def factory():
    # Create the bound method once; the closure reuses it on every call.
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        """Doc string goes here"""
        eatit(it)
    return exhaust_iter

exhaust_iter = factory()
del factory


which will be about as efficient as you can get while still having a 
custom docstring.
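
If you want to compare the two wrappers yourself, something like the following sketch should do. The docstrings are placeholders, and the timings will vary by machine and Python version, so treat them as a rough guide only:

```python
from collections import deque
from timeit import timeit

def exhaust(it):
    """Simple wrapper: builds a new deque on every call."""
    deque(maxlen=0).extend(it)

def factory():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        """Closure wrapper: reuses one bound method."""
        eatit(it)
    return exhaust_iter

exhaust_iter = factory()

# Check that the closure really consumes the whole iterator.
it = iter(range(5))
exhaust_iter(it)
leftover = next(it, None)  # None means nothing was left over

# Rough timing of each wrapper on a small iterator.
for func in (exhaust, exhaust_iter):
    seconds = timeit(lambda: func(iter(range(100))), number=10_000)
    print(func.__name__, round(seconds, 4))
```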

But really, I'm having trouble understanding what sort of application 
would have "run an iterator to exhaustion without doing anything with the 
values" as the performance bottleneck :-)



> exhaust_iter.__doc__ = "Exhaust an iterator efficiently [...]"
> 
> Obviously it does not work. 

Even if it did work, it would not do what you hope. Because __doc__ is a 
dunder attribute (double leading and trailing underscores), help() 
currently looks it up on the class, not the instance:


class Spam:
    "Spam spam spam"

x = Spam()
help(x)
# => displays "Spam spam spam"

x.__doc__ = "Yummy spam"
help(x)
# => still displays "Spam spam spam"
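
You can see the two separate values without going through help() at all. A minimal sketch (the variable names are mine): the instance attribute shadows the class docstring on direct access, but anything that consults the class still sees the original text:

```python
class Spam:
    "Spam spam spam"

x = Spam()
x.__doc__ = "Yummy spam"

# Direct attribute access finds the instance attribute first...
instance_doc = x.__doc__
# ...while the class docstring itself is untouched.
class_doc = type(x).__doc__

print(repr(instance_doc))
print(repr(class_doc))
```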



> Is there a way to get it to work simply and
> without creating a new scope (which would be a rather inefficient a way
> to set documentation, and would hamper introspection)?
> 
> How about dropping the "simply" requirement?

I don't believe so.





[1] IronPython and Jython both currently do the same thing as CPython, so 
even if this is not explicitly language-defined behaviour, it looks like 
it may be de facto standard behaviour.


-- 
Steven


