Advanced ways to get object information from within Python

Julius Hamilton juliushamilton100 at gmail.com
Thu Dec 23 06:01:45 EST 2021


Hello,

I would like to significantly improve my ability to find the information
I need about any Python object from within Python itself. I find this to
be a really essential skill set. After reading the documentation, it
really helps to get under the hood at the command line and test your own
understanding by examining all the methods and classes, their arguments,
return types, and so on.

I was hoping someone could help me fill in more details about what I
currently know.

I'd like to use Scrapy as an example, since it's a library I'm currently
learning.

import scrapy

I assume I'll start with "dir", as it's the most convenient.

dir(scrapy) shows this:

['Field', 'FormRequest', 'Item', 'Request', 'Selector', 'Spider',
'__all__', '__builtins__', '__cached__', '__doc__', '__file__',
'__loader__', '__name__', '__package__', '__path__', '__spec__',
'__version__', '_txv', 'exceptions', 'http', 'item', 'link',
'linkextractors', 'selector', 'signals', 'spiders', 'twisted_version',
'utils', 'version_info']

I wish there were a convenient way for me to know what all of these are. I
understand "dir" shows everything in the namespace - so that includes
names that are present only because this module imported them from other
modules.

Let's assume at minimum I know that I should be able to access all of these
"attributes" (I believe that is what Python calls them - an attribute can
be anything: a method, a variable, etc. But then, how do you distinguish
this general notion of an "attribute" from a specific attribute of a
class? Or is that called a "property" or something?)
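
To make the terminology concrete for myself, here is a small sketch of the
distinction as I currently understand it (anything reachable with a dot is
an "attribute"; a "property" is one particular kind, defined on the class;
the Dog class here is just a made-up illustration):

```python
class Dog:
    species = "canine"          # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name        # instance attribute, stored per object

    @property
    def label(self):            # a property: an attribute computed on access
        return f"{self.name} the {self.species}"

d = Dog("Rex")
print(d.name, d.species, d.label)  # all three are "attributes" of d
print(vars(d))                     # but only instance attributes live here
```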

I can confirm that every single name in the above list works when I access
it on scrapy, like this:

>>> scrapy.Field
<class 'scrapy.item.Field'>

>>> scrapy.utils
<module 'scrapy.utils' from
'/usr/local/lib/python3.9/dist-packages/scrapy/utils/__init__.py'>

But I can't conveniently iterate over all of these to see their types,
because dir() returns only a list of strings. How can I iterate over the
attributes themselves?

"getattr" alone doesn't get me there, because it requires the name of the
specific attribute you're looking for. I would like to spit out all the
attributes together with their types, so I know my options in more detail
than dir() provides.
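
Though now that I write it out, getattr() does take the name as a string,
which is exactly what dir() returns, so perhaps a loop like this is the
answer (a sketch using json as a stand-in for scrapy, since it works the
same on any module):

```python
import json  # stand-in for scrapy, so this sketch runs anywhere

# dir() yields the names as strings, and getattr() accepts exactly such
# a string, so the two compose into the listing I'm after.
for name in sorted(dir(json)):
    if name.startswith("_"):
        continue  # skip dunder/private names to reduce noise
    obj = getattr(json, name)
    print(f"{name:20} {type(obj).__name__}")
```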

This is basically a dead end for me until someone can illuminate the
strategy I'm pursuing, so now I'd like to focus on inspect and help.

inspect.getmembers is useful in principle, but I find its results to be
information overload.

This is just an excerpt of what it returns:

pprint.pprint(inspect.getmembers(scrapy))
[('Field', <class 'scrapy.item.Field'>),
 ('Selector', <class 'scrapy.selector.unified.Selector'>),
 ('Spider', <class 'scrapy.spiders.Spider'>),
 ('__all__',
  ['__version__',
   'version_info',
   'twisted_version',
   'Spider',

Why does it list just the name and type for some entries, but for others
go on to a sublist? __all__ does not show any type in angle brackets; it
just goes on to list some attributes without any information about what
they are. Can I suppress sublists from being printed with
inspect.getmembers? Or can I recursively require that sublists also
display their type?
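
One thing that might tame the overload: getmembers() takes an optional
predicate as a second argument, so one can ask for, say, only the classes
or only the functions (a sketch on json, which should carry over to
scrapy; plain data like __all__ - which is just a list of strings - is
then filtered out before anything is printed):

```python
import inspect
import json  # stand-in module; the same calls work on any module

# The predicate filters members before they are returned.
classes = inspect.getmembers(json, inspect.isclass)
functions = inspect.getmembers(json, inspect.isfunction)

for name, cls in classes:
    print(f"class    {name}: {cls}")
for name, fn in functions:
    print(f"function {name}: {fn}")
```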

Lastly, the "help" function.

I find "help" to be a similar case of information overload. Again, it
starts with a long list of "package contents"; I'm not sure I see the use
of such a list of names without much description of what they are. Next,
it lists "classes", but I don't understand:

    builtins.dict(builtins.object)
        scrapy.item.Field
    parsel.selector.Selector(builtins.object)
        scrapy.selector.unified.Selector(parsel.selector.Selector,
scrapy.utils.trackref.object_ref)

What determines the order of these classes - the order in which they appear
in the source code? And what about the indentation? builtins.dict is a
Python builtin, so why is it listed inside Scrapy's "help" - are all
builtins necessarily listed, or just the ones Scrapy specifically
imported or inherited from?
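
My hedged guess is that help() builds that indented CLASSES section from a
nested class tree of the kind inspect.getclasstree() produces, pulling in
the bases (even builtins like dict) so the hierarchy reads top-down. A toy
version with made-up classes:

```python
import inspect

class Base:
    pass

class Child(Base):
    pass

# getclasstree() nests subclasses in a list that follows their base's
# entry; each entry is a (class, bases) tuple - the same shape as the
# indented hierarchy help() prints for a module.
tree = inspect.getclasstree([Child, Base, object])
print(tree)
```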

My best guess is that the most indented lines are the classes actually
defined in the package, and the lines above them just list the
inheritance? So scrapy.item.Field inherits from the built-in dict class,
presumably so that you can treat the class like a dictionary, using
dictionary methods and so on?
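
That guess can at least be checked in miniature - a two-line analogue of
what I take Field to be (the real scrapy.item.Field presumably does more):

```python
# A minimal analogue of scrapy.item.Field as I read the help() output:
# subclass dict, and instances immediately speak the full dict interface.
class Field(dict):
    """Container of field metadata"""

f = Field(serializer=str, required=True)
print(f["required"])    # plain dict indexing
print(f.keys())         # inherited dict methods
print(Field.__mro__)    # Field -> dict -> object, matching help()'s MRO
```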

    class Field(builtins.dict)
     |  Container of field metadata
     |
     |  Method resolution order:
     |      Field
     |      builtins.dict
     |      builtins.object
     |
     |  Data descriptors defined here:

What are data descriptors?
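
From what I've read, a "data descriptor" is a class attribute whose own
type defines both __get__ and __set__ (or __delete__); property is the
everyday example, which may be all that help() is reporting there. A toy
illustration:

```python
class Point:
    def __init__(self, x):
        self._x = x

    @property
    def x(self):                     # reading p.x routes through property.__get__
        return self._x

p = Point(3)
print(p.x)
print(type(Point.__dict__["x"]))     # the descriptor object stored on the class
print(hasattr(property, "__set__"))  # defines __set__ too, hence a *data* descriptor
```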

I understand IDEs tend to print the docstring of a method as a sort
of overlay while you are writing, but I'm not able to use the
__docstring__ attribute - scrapy.__docstring__,
scrapy.Spider.__docstring__, and so on, all return "object has no
attribute __docstring__".
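
For what it's worth, on the standard library the spelling that works for
me is __doc__ rather than __docstring__ - it seems to be what help()
reads:

```python
import json

# Docstrings live in the __doc__ attribute on modules, classes, and
# functions alike; help() renders this same text.
print(json.__doc__.splitlines()[0])              # module docstring
print(json.JSONDecoder.__doc__.splitlines()[0])  # class docstring
print(json.dumps.__doc__.splitlines()[0])        # function docstring
```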

Failing all else, though, I'm really fond of inspect.getsource() - that's
very clear and insightful.
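
(The pattern I mean, shown on json since it's everywhere; note that
getsource() raises TypeError for objects implemented in C:)

```python
import inspect
import json

# Fetch the actual source text of a pure-Python function.
src = inspect.getsource(json.dumps)
print(src.splitlines()[0])  # the def line, signature and all
```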

There's more to learn but that's enough questions for now. I'd really
appreciate anybody helping me find effective ways of investigating modules,
classes and methods from the command line.

Thanks very much,
Julius

