[Python-checkins] bpo-39452: Rewrite and expand __main__.rst (#26883)

ambv webhook-mailer at python.org
Tue Aug 24 13:01:50 EDT 2021


https://github.com/python/cpython/commit/7cba23164cf82f6619db002cd30021b5dfb1f809
commit: 7cba23164cf82f6619db002cd30021b5dfb1f809
branch: main
author: Jack DeVries <58614260+jdevries3133 at users.noreply.github.com>
committer: ambv <lukasz at langa.pl>
date: 2021-08-24T19:01:41+02:00
summary:

bpo-39452: Rewrite and expand __main__.rst (#26883)

Broadened scope of the document to explicitly discuss and differentiate between ``__main__.py`` in packages versus the ``__name__ == '__main__'`` expression (and the idioms that surround it), as well as ``import __main__``.

Co-authored-by: Géry Ogam <gery.ogam at gmail.com>
Co-authored-by: Éric Araujo <merwok at netwok.org>
Co-authored-by: Łukasz Langa <lukasz at langa.pl>

files:
A Misc/NEWS.d/next/Documentation/2021-06-23-15-21-36.bpo-39452.o_I-6d.rst
M Doc/library/__main__.rst
M Doc/reference/import.rst
M Doc/tools/susp-ignored.csv
M Doc/tutorial/modules.rst

diff --git a/Doc/library/__main__.rst b/Doc/library/__main__.rst
index a64faf1bbe3c84..116a9a9d1d729f 100644
--- a/Doc/library/__main__.rst
+++ b/Doc/library/__main__.rst
@@ -1,25 +1,368 @@
-
-:mod:`__main__` --- Top-level script environment
-================================================
+:mod:`__main__` --- Top-level code environment
+==============================================
 
 .. module:: __main__
-   :synopsis: The environment where the top-level script is run.
+   :synopsis: The environment where top-level code is run. Covers command-line
+              interfaces, import-time behavior, and ``__name__ == '__main__'``.
 
 --------------
 
-``'__main__'`` is the name of the scope in which top-level code executes.
-A module's __name__ is set equal to ``'__main__'`` when read from
-standard input, a script, or from an interactive prompt.
+In Python, the special name ``__main__`` is used for two important constructs:
+
+1. the name of the top-level environment of the program, which can be
+   checked using the ``__name__ == '__main__'`` expression; and
+2. the ``__main__.py`` file in Python packages.
+
+Both of these mechanisms are related to Python modules; how users interact with
+them and how they interact with each other.  They are explained in detail
+below.  If you're new to Python modules, see the tutorial section
+:ref:`tut-modules` for an introduction.
+
+
+.. _name_equals_main:
+
+``__name__ == '__main__'``
+---------------------------
+
+When a Python module or package is imported, ``__name__`` is set to the
+module's name.  Usually, this is the name of the Python file itself without the
+``.py`` extension::
+
+    >>> import configparser
+    >>> configparser.__name__
+    'configparser'
+
+If the file is part of a package, ``__name__`` will also include the parent
+package's path::
+
+    >>> from concurrent.futures import process
+    >>> process.__name__
+    'concurrent.futures.process'
+
+However, if the module is executed in the top-level code environment,
+its ``__name__`` is set to the string ``'__main__'``.
+
+What is the "top-level code environment"?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__main__`` is the name of the environment where top-level code is run.
+"Top-level code" is the first user-specified Python module that starts running.
+It's "top-level" because it imports all other modules that the program needs.
+Sometimes "top-level code" is called an *entry point* to the application.
+
+The top-level code environment can be:
+
+* the scope of an interactive prompt::
+
+    >>> __name__
+    '__main__'
+
+* the Python module passed to the Python interpreter as a file argument:
+
+    .. code-block:: shell-session
+
+       $ python3 helloworld.py
+       Hello, world!
+
+* the Python module or package passed to the Python interpreter with the
+  :option:`-m` argument:
+
+    .. code-block:: shell-session
+
+       $ python3 -m tarfile
+       usage: tarfile.py [-h] [-v] (...)
+
+* Python code read by the Python interpreter from standard input:
+
+    .. code-block:: shell-session
+
+       $ echo "import this" | python3
+       The Zen of Python, by Tim Peters
+
+       Beautiful is better than ugly.
+       Explicit is better than implicit.
+       ...
+
+* Python code passed to the Python interpreter with the :option:`-c` argument:
+
+    .. code-block:: shell-session
+
+       $ python3 -c "import this"
+       The Zen of Python, by Tim Peters
+
+       Beautiful is better than ugly.
+       Explicit is better than implicit.
+       ...
+
+In each of these situations, the top-level module's ``__name__`` is set to
+``'__main__'``.
+
+As a result, a module can discover whether or not it is running in the
+top-level environment by checking its own ``__name__``, which allows a common
+idiom for conditionally executing code when the module is not initialized from
+an import statement::
+
+    if __name__ == '__main__':
+        # Execute when the module is not initialized from an import statement.
+        ...
+
+.. seealso::
+
+   For a more detailed look at how ``__name__`` is set in all situations, see
+   the tutorial section :ref:`tut-modules`.
+
+
+Idiomatic Usage
+^^^^^^^^^^^^^^^
+
+Some modules contain code that is intended for script use only, like parsing
+command-line arguments or fetching data from standard input.  When a module
+like this were to be imported from a different module, for example to unit test
+it, the script code would unintentionally execute as well.
+
+This is where using the ``if __name__ == '__main__'`` code block comes in
+handy. Code within this block won't run unless the module is executed in the
+top-level environment.
+
+Putting as few statements as possible in the block below ``if __name___ ==
+'__main__'`` can improve code clarity and correctness. Most often, a function
+named ``main`` encapsulates the program's primary behavior::
+
+    # echo.py
+
+    import shlex
+    import sys
+
+    def echo(phrase: str) -> None:
+       """A dummy wrapper around print."""
+       # for demonstration purposes, you can imagine that there is some
+       # valuable and reusable logic inside this function
+       print(phrase)
+
+    def main() -> int:
+        """Echo the input arguments to standard output"""
+        phrase = shlex.join(sys.argv)
+        echo(phrase)
+        return 0
+
+    if __name__ == '__main__':
+        sys.exit(main())  # next section explains the use of sys.exit
+
+Note that if the module didn't encapsulate code inside the ``main`` function
+but instead put it directly within the ``if __name__ == '__main__'`` block,
+the ``phrase`` variable would be global to the entire module.  This is
+error-prone as other functions within the module could be unintentionally using
+the global variable instead of a local name.  A ``main`` function solves this
+problem.
+
+Using a ``main`` function has the added benefit of the ``echo`` function itself
+being isolated and importable elsewhere. When ``echo.py`` is imported, the
+``echo`` and ``main`` functions will be defined, but neither of them will be
+called, because ``__name__ != '__main__'``.
+
+
+Packaging Considerations
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+``main`` functions are often used to create command-line tools by specifying
+them as entry points for console scripts.  When this is done,
+`pip <https://pip.pypa.io/>`_ inserts the function call into a template script,
+where the return value of ``main`` is passed into :func:`sys.exit`.
+For example::
+
+    sys.exit(main())
+
+Since the call to ``main`` is wrapped in :func:`sys.exit`, the expectation is
+that your function will return some value acceptable as an input to
+:func:`sys.exit`; typically, an integer or ``None`` (which is implicitly
+returned if your function does not have a return statement).
+
+By proactively following this convention ourselves, our module will have the
+same behavior when run directly (i.e. ``python3 echo.py``) as it will have if
+we later package it as a console script entry-point in a pip-installable
+package.
+
+In particular, be careful about returning strings from your ``main`` function.
+:func:`sys.exit` will interpret a string argument as a failure message, so
+your program will have an exit code of ``1``, indicating failure, and the
+string will be written to :data:`sys.stderr`.  The ``echo.py`` example from
+earlier exemplifies using the ``sys.exit(main())`` convention.
+
+.. seealso::
+
+   `Python Packaging User Guide <https://packaging.python.org/>`_
+   contains a collection of tutorials and references on how to distribute and
+   install Python packages with modern tools.
+
+
+``__main__.py`` in Python Packages
+----------------------------------
+
+If you are not familiar with Python packages, see section :ref:`tut-packages`
+of the tutorial.  Most commonly, the ``__main__.py`` file is used to provide
+a command-line interface for a package. Consider the following hypothetical
+package, "bandclass":
+
+.. code-block:: text
+
+   bandclass
+     ├── __init__.py
+     ├── __main__.py
+     └── student.py
+
+``__main__.py`` will be executed when the package itself is invoked
+directly from the command line using the :option:`-m` flag. For example:
+
+.. code-block:: shell-session
+
+   $ python3 -m bandclass
+
+This command will cause ``__main__.py`` to run. How you utilize this mechanism
+will depend on the nature of the package you are writing, but in this
+hypothetical case, it might make sense to allow the teacher to search for
+students::
+
+    # bandclass/__main__.py
+
+    import sys
+    from .student import search_students
+
+    student_name = sys.argv[2] if len(sys.argv) >= 2 else ''
+    print(f'Found student: {search_students(student_name)}')
+
+Note that ``from .student import search_students`` is an example of a relative
+import.  This import style must be used when referencing modules within a
+package.  For more details, see :ref:`intra-package-references` in the
+:ref:`tut-modules` section of the tutorial.
+
+Idiomatic Usage
+^^^^^^^^^^^^^^^
+
+The contents of ``__main__.py`` typically isn't fenced with
+``if __name__ == '__main__'`` blocks.  Instead, those files are kept short,
+functions to execute from other modules.  Those other modules can then be
+easily unit-tested and are properly reusable.
+
+If used, an ``if __name__ == '__main__'`` block will still work as expected
+for a ``__main__.py`` file within a package, because its ``__name__``
+attribute will include the package's path if imported::
+
+    >>> import asyncio.__main__
+    >>> asyncio.__main__.__name__
+    'asyncio.__main__'
+
+This won't work for ``__main__.py`` files in the root directory of a .zip file
+though.  Hence, for consistency, minimal ``__main__.py`` like the :mod:`venv`
+one mentioned above are preferred.
+
+.. seealso::
+
+   See :mod:`venv` for an example of a package with a minimal ``__main__.py``
+   in the standard library. It doesn't contain a ``if __name__ == '__main__'``
+   block. You can invoke it with ``python3 -m venv [directory]``.
+
+   See :mod:`runpy` for more details on the :option:`-m` flag to the
+   interpreter executable.
+
+   See :mod:`zipapp` for how to run applications packaged as *.zip* files. In
+   this case Python looks for a ``__main__.py`` file in the root directory of
+   the archive.
+
+
+
+``import __main__``
+-------------------
+
+Regardless of which module a Python program was started with, other modules
+running within that same program can import the top-level environment's scope
+(:term:`namespace`) by importing the ``__main__`` module.  This doesn't import
+a ``__main__.py`` file but rather whichever module that received the special
+name ``'__main__'``.
+
+Here is an example module that consumes the ``__main__`` namespace::
+
+    # namely.py
+
+    import __main__
+
+    def did_user_define_their_name():
+        return 'my_name' in dir(__main__)
+
+    def print_user_name():
+        if not did_user_define_their_name():
+            raise ValueError('Define the variable `my_name`!')
+
+        if '__file__' in dir(__main__):
+            print(__main__.my_name, "found in file", __main__.__file__)
+        else:
+            print(__main__.my_name)
+
+Example usage of this module could be as follows::
+
+    # start.py
+
+    import sys
+
+    from namely import print_user_name
+
+    # my_name = "Dinsdale"
+
+    def main():
+        try:
+            print_user_name()
+        except ValueError as ve:
+            return str(ve)
+
+    if __name__ == "__main__":
+        sys.exit(main())
+
+Now, if we started our program, the result would look like this:
+
+.. code-block:: shell-session
+
+   $ python3 start.py
+   Define the variable `my_name`!
+
+The exit code of the program would be 1, indicating an error. Uncommenting the
+line with ``my_name = "Dinsdale"`` fixes the program and now it exits with
+status code 0, indicating success:
+
+.. code-block:: shell-session
+
+   $ python3 start.py
+   Dinsdale found in file /path/to/start.py
+
+Note that importing ``__main__`` doesn't cause any issues with unintentionally
+running top-level code meant for script use which is put in the
+``if __name__ == "__main__"`` block of the ``start`` module. Why does this work?
+
+Python inserts an empty ``__main__`` module in :attr:`sys.modules` at
+interpreter startup, and populates it by running top-level code. In our example
+this is the ``start`` module which runs line by line and imports ``namely``.
+In turn, ``namely`` imports ``__main__`` (which is really ``start``). That's an
+import cycle! Fortunately, since the partially populated ``__main__``
+module is present in :attr:`sys.modules`, Python passes that to ``namely``.
+See :ref:`Special considerations for __main__ <import-dunder-main>` in the
+import system's reference for details on how this works.
+
+The Python REPL is another example of a "top-level environment", so anything
+defined in the REPL becomes part of the ``__main__`` scope::
 
-A module can discover whether or not it is running in the main scope by
-checking its own ``__name__``, which allows a common idiom for conditionally
-executing code in a module when it is run as a script or with ``python
--m`` but not when it is imported::
+    >>> import namely
+    >>> namely.did_user_define_their_name()
+    False
+    >>> namely.print_user_name()
+    Traceback (most recent call last):
+    ...
+    ValueError: Define the variable `my_name`!
+    >>> my_name = 'Jabberwocky'
+    >>> namely.did_user_define_their_name()
+    True
+    >>> namely.print_user_name()
+    Jabberwocky
 
-   if __name__ == "__main__":
-       # execute only if run as a script
-       main()
+Note that in this case the ``__main__`` scope doesn't contain a ``__file__``
+attribute as it's interactive.
 
-For a package, the same effect can be achieved by including a
-``__main__.py`` module, the contents of which will be executed when the
-module is run with ``-m``.
+The ``__main__`` scope is used in the implementation of :mod:`pdb` and
+:mod:`rlcompleter`.
diff --git a/Doc/reference/import.rst b/Doc/reference/import.rst
index 81a124f745ab6a..39fcba015b6947 100644
--- a/Doc/reference/import.rst
+++ b/Doc/reference/import.rst
@@ -975,6 +975,8 @@ should expose ``XXX.YYY.ZZZ`` as a usable expression, but .moduleY is
 not a valid expression.
 
 
+.. _import-dunder-main:
+
 Special considerations for __main__
 ===================================
 
diff --git a/Doc/tools/susp-ignored.csv b/Doc/tools/susp-ignored.csv
index 211b49a5449f48..35bcbd0404976f 100644
--- a/Doc/tools/susp-ignored.csv
+++ b/Doc/tools/susp-ignored.csv
@@ -110,6 +110,7 @@ howto/pyporting,,::,Programming Language :: Python :: 3
 howto/regex,,::,
 howto/regex,,:foo,(?:foo)
 howto/urllib2,,:password,"""joe:password at example.com"""
+library/__main__,,`,
 library/ast,,:upper,lower:upper
 library/ast,,:step,lower:upper:step
 library/audioop,,:ipos,"# factor = audioop.findfactor(in_test[ipos*2:ipos*2+len(out_test)],"
diff --git a/Doc/tutorial/modules.rst b/Doc/tutorial/modules.rst
index af595e5ca04d7e..a495c50cbde880 100644
--- a/Doc/tutorial/modules.rst
+++ b/Doc/tutorial/modules.rst
@@ -533,6 +533,8 @@ importing module needs to use submodules with the same name from different
 packages.
 
 
+.. _intra-package-references:
+
 Intra-package References
 ------------------------
 
diff --git a/Misc/NEWS.d/next/Documentation/2021-06-23-15-21-36.bpo-39452.o_I-6d.rst b/Misc/NEWS.d/next/Documentation/2021-06-23-15-21-36.bpo-39452.o_I-6d.rst
new file mode 100644
index 00000000000000..5c8cbd8e652232
--- /dev/null
+++ b/Misc/NEWS.d/next/Documentation/2021-06-23-15-21-36.bpo-39452.o_I-6d.rst
@@ -0,0 +1,4 @@
+Rewrote ``Doc/library/__main__.rst``. Broadened scope of the document to
+explicitly discuss and differentiate between ``__main__.py`` in packages
+versus the ``__name__ == '__main__'`` expression (and the idioms that
+surround it).



More information about the Python-checkins mailing list