Mailman 3 August 2022 - Python-checkins

Docs: normalise sqlite3 placeholder how-to heading (GH-96413)
by miss-islington Aug. 30, 2022

Aug. 30, 2022

https://github.com/python/cpython/commit/a0d0a77c1f78fee581294283a66bf198d6… commit: a0d0a77c1f78fee581294283a66bf198d69f2699 branch: 3.10 author: Miss Islington (bot) <31488909+miss-islington(a)users.noreply.github.com> committer: miss-islington <31488909+miss-islington(a)users.noreply.github.com> date: 2022-08-30T13:56:02-07:00 summary: Docs: normalise sqlite3 placeholder how-to heading (GH-96413) (cherry picked from commit 7b01ce7953c0e24aa7aeaf207216fc9e7aefd18a) Co-authored-by: Erlend E. Aasland <erlend.aasland(a)protonmail.com> files: M Doc/library/sqlite3.rst diff --git a/Doc/library/sqlite3.rst b/Doc/library/sqlite3.rst index 18a03a26e29..27645b05364 100644 --- a/Doc/library/sqlite3.rst +++ b/Doc/library/sqlite3.rst @@ -1301,8 +1301,8 @@ How-to guides .. _sqlite3-placeholders: -Using placeholders to bind values in SQL queries -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +How to use placeholders to bind values in SQL queries +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ SQL operations usually need to use values from Python variables. However, beware of using Python's string operations to assemble queries, as they

1 0

Docs: normalise sqlite3 placeholder how-to heading (#96413)
by erlend-aasland Aug. 30, 2022

Aug. 30, 2022

https://github.com/python/cpython/commit/7b01ce7953c0e24aa7aeaf207216fc9e7a… commit: 7b01ce7953c0e24aa7aeaf207216fc9e7aefd18a branch: main author: Erlend E. Aasland <erlend.aasland(a)protonmail.com> committer: erlend-aasland <erlend.aasland(a)protonmail.com> date: 2022-08-30T22:44:14+02:00 summary: Docs: normalise sqlite3 placeholder how-to heading (#96413) files: M Doc/library/sqlite3.rst diff --git a/Doc/library/sqlite3.rst b/Doc/library/sqlite3.rst index 31a0e7f0b53..58343b1a0f7 100644 --- a/Doc/library/sqlite3.rst +++ b/Doc/library/sqlite3.rst @@ -1653,8 +1653,8 @@ How-to guides .. _sqlite3-placeholders: -Using placeholders to bind values in SQL queries -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +How to use placeholders to bind values in SQL queries +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ SQL operations usually need to use values from Python variables. However, beware of using Python's string operations to assemble queries, as they

1 0

[3.11] [Enum] fix check in _test_simple_enum (GH-96435)
by ethanfurman Aug. 30, 2022

Aug. 30, 2022

https://github.com/python/cpython/commit/8f58db2279a97d826e5bfa99627ba9bb66… commit: 8f58db2279a97d826e5bfa99627ba9bb66f6f096 branch: 3.11 author: Ethan Furman <ethan(a)stoneleaf.us> committer: ethanfurman <ethan(a)stoneleaf.us> date: 2022-08-30T12:39:03-07:00 summary: [3.11] [Enum] fix check in _test_simple_enum (GH-96435) The builtin `property` is not a callable, so was failing the check in `_test_simple_enum` causing a match failure; this adds `property` to the bypass list. Co-authored-by: Alexandru Mărășteanu <alexei(a)users.noreply.github.com> files: M Lib/enum.py M Lib/test/test_enum.py diff --git a/Lib/enum.py b/Lib/enum.py index 63ca1605353..28b638c28f1 100644 --- a/Lib/enum.py +++ b/Lib/enum.py @@ -1904,7 +1904,7 @@ def _test_simple_enum(checked_enum, simple_enum): else: checked_value = checked_dict[key] simple_value = simple_dict[key] - if callable(checked_value): + if callable(checked_value) or isinstance(checked_value, bltns.property): continue if key == '__doc__': # remove all spaces/tabs diff --git a/Lib/test/test_enum.py b/Lib/test/test_enum.py index 4a42c738172..7964d3e474c 100644 --- a/Lib/test/test_enum.py +++ b/Lib/test/test_enum.py @@ -4337,10 +4337,16 @@ class SimpleColor: CYAN = 1 MAGENTA = 2 YELLOW = 3 + @bltns.property + def zeroth(self): + return 'zeroed %s' % self.name class CheckedColor(Enum): CYAN = 1 MAGENTA = 2 YELLOW = 3 + @bltns.property + def zeroth(self): + return 'zeroed %s' % self.name self.assertTrue(_test_simple_enum(CheckedColor, SimpleColor) is None) SimpleColor.MAGENTA._value_ = 9 self.assertRaisesRegex(

1 0

gh-96143: Add some comments and minor fixes missed in the original PR (#96433)
by pablogsal Aug. 30, 2022

Aug. 30, 2022

https://github.com/python/cpython/commit/f49dd54b72ec3fe6dbdcd2476a81092938… commit: f49dd54b72ec3fe6dbdcd2476a810929382bc196 branch: main author: Pablo Galindo Salgado <Pablogsal(a)gmail.com> committer: pablogsal <Pablogsal(a)gmail.com> date: 2022-08-30T19:37:22+01:00 summary: gh-96143: Add some comments and minor fixes missed in the original PR (#96433) * gh-96132: Add some comments and minor fixes missed in the original PR * Update Doc/using/cmdline.rst Co-authored-by: Kumar Aditya <59607654+kumaraditya303(a)users.noreply.github.com> Co-authored-by: Kumar Aditya <59607654+kumaraditya303(a)users.noreply.github.com> files: M Doc/howto/perf_profiling.rst M Doc/using/cmdline.rst M Lib/test/test_perf_profiler.py M Objects/perf_trampoline.c diff --git a/Doc/howto/perf_profiling.rst b/Doc/howto/perf_profiling.rst index 2e1bb48af8c8..ed8de888b3bc 100644 --- a/Doc/howto/perf_profiling.rst +++ b/Doc/howto/perf_profiling.rst @@ -155,6 +155,9 @@ active since the start of the Python interpreter, you can use the `-Xperf` optio $ python -Xperf my_script.py +You can also set the :envvar:`PYTHONPERFSUPPORT` to a nonzero value to actiavate perf +profiling mode globally. + There is also support for dynamically activating and deactivating the perf profiling mode by using the APIs in the :mod:`sys` module: diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst index 5ecc882d818f..fa2b07e468b3 100644 --- a/Doc/using/cmdline.rst +++ b/Doc/using/cmdline.rst @@ -582,6 +582,8 @@ Miscellaneous options .. versionadded:: 3.11 The ``-X frozen_modules`` option. + .. versionadded:: 3.12 + The ``-X perf`` option. Options you shouldn't use diff --git a/Lib/test/test_perf_profiler.py b/Lib/test/test_perf_profiler.py index c2aad85b652e..f587995b008f 100644 --- a/Lib/test/test_perf_profiler.py +++ b/Lib/test/test_perf_profiler.py @@ -58,7 +58,7 @@ def baz(): script = make_script(script_dir, "perftest", code) with subprocess.Popen( [sys.executable, "-Xperf", script], - universal_newlines=True, + text=True, stderr=subprocess.PIPE, stdout=subprocess.PIPE, ) as process: diff --git a/Objects/perf_trampoline.c b/Objects/perf_trampoline.c index 02206b2786c8..2cbe3741f26f 100644 --- a/Objects/perf_trampoline.c +++ b/Objects/perf_trampoline.c @@ -284,12 +284,23 @@ new_code_arena(void) void *start = &_Py_trampoline_func_start; void *end = &_Py_trampoline_func_end; size_t code_size = end - start; + // TODO: Check the effect of alignment of the code chunks. Initial investigation + // showed that this has no effect on performance in x86-64 or aarch64 and the current + // version has the advantage that the unwinder in GDB can unwind across JIT-ed code. + // + // We should check the values in the future and see if there is a + // measurable performance improvement by rounding trampolines up to 32-bit + // or 64-bit alignment. size_t n_copies = mem_size / code_size; for (size_t i = 0; i < n_copies; i++) { memcpy(memory + i * code_size, start, code_size * sizeof(char)); } // Some systems may prevent us from creating executable code on the fly. + // TODO: Call icache invalidation intrinsics if available: + // __builtin___clear_cache/__clear_cache (depending if clang/gcc). This is + // technically not necessary but we could be missing something so better be + // safe. int res = mprotect(memory, mem_size, PROT_READ | PROT_EXEC); if (res == -1) { PyErr_SetFromErrno(PyExc_OSError);

1 0

Automatically update more GitHub projects. (#94921)
by ezio-melotti Aug. 30, 2022

Aug. 30, 2022

https://github.com/python/cpython/commit/45fd3685aad90de3be21c8f6eade7b5985… commit: 45fd3685aad90de3be21c8f6eade7b5985629fb8 branch: main author: Ezio Melotti <ezio.melotti(a)gmail.com> committer: ezio-melotti <ezio.melotti(a)gmail.com> date: 2022-08-30T20:12:55+02:00 summary: Automatically update more GitHub projects. (#94921) * Automatically update the `asyncio` GitHub project. * Use a matrix to add issues to projects. * Remove trailing whitespace. Co-authored-by: Hugo van Kemenade <hugovk(a)users.noreply.github.com> Co-authored-by: Hugo van Kemenade <hugovk(a)users.noreply.github.com> files: M .github/workflows/project-updater.yml diff --git a/.github/workflows/project-updater.yml b/.github/workflows/project-updater.yml index 716ed7841fea..ea98700e7fae 100644 --- a/.github/workflows/project-updater.yml +++ b/.github/workflows/project-updater.yml @@ -8,12 +8,20 @@ on: jobs: add-to-project: - name: Add to the Release and Deferred Blocker project + name: Add issues to projects runs-on: ubuntu-latest + strategy: + matrix: + include: + # if an issue has any of these labels, it will be added + # to the corresponding project + - { project: 2, label: "release-blocker, deferred-blocker" } + - { project: 3, label: expert-subinterpreters } + - { project: 29, label: expert-asyncio } + steps: - uses: actions/add-to-project(a)v0.1.0 with: - project-url: https://github.com/orgs/python/projects/2 + project-url: https://github.com/orgs/python/projects/${{ matrix.project }} github-token: ${{ secrets.ADD_TO_PROJECT_PAT }} - labeled: release-blocker, deferred-blocker - label-operator: OR + labeled: ${{ matrix.label }}

1 0

gh-95149: Enhance `http.HTTPStatus` with properties that indicate the HTTP status category (GH-95453)
by ethanfurman Aug. 30, 2022

Aug. 30, 2022

https://github.com/python/cpython/commit/0ed778835d34bc1f39d2c6cdbc0c1709f6… commit: 0ed778835d34bc1f39d2c6cdbc0c1709f6bcfd61 branch: main author: Alexandru Mărășteanu <alexei(a)users.noreply.github.com> committer: ethanfurman <ethan(a)stoneleaf.us> date: 2022-08-30T11:11:44-07:00 summary: gh-95149: Enhance `http.HTTPStatus` with properties that indicate the HTTP status category (GH-95453) files: A Misc/NEWS.d/next/Library/2022-08-07-14-56-23.gh-issue-95149.U0c6Ib.rst M Doc/library/http.rst M Lib/enum.py M Lib/http/__init__.py M Lib/test/test_httplib.py diff --git a/Doc/library/http.rst b/Doc/library/http.rst index 5895a41d849b..521fd1b7f50c 100644 --- a/Doc/library/http.rst +++ b/Doc/library/http.rst @@ -137,6 +137,31 @@ equal to the constant name (i.e. ``http.HTTPStatus.OK`` is also available as .. versionadded:: 3.9 Added ``103 EARLY_HINTS``, ``418 IM_A_TEAPOT`` and ``425 TOO_EARLY`` status codes. +HTTP status category +-------------------- + +.. versionadded:: 3.11 + +The enum values have several properties to indicate the HTTP status category: + +==================== ======================== =============================== +Property Indicates that Details +==================== ======================== =============================== +``is_informational`` ``100 <= status <= 199`` HTTP/1.1 :rfc:`7231`, Section 6 +``is_success`` ``200 <= status <= 299`` HTTP/1.1 :rfc:`7231`, Section 6 +``is_redirection`` ``300 <= status <= 399`` HTTP/1.1 :rfc:`7231`, Section 6 +``is_client_error`` ``400 <= status <= 499`` HTTP/1.1 :rfc:`7231`, Section 6 +``is_server_error`` ``500 <= status <= 599`` HTTP/1.1 :rfc:`7231`, Section 6 +==================== ======================== =============================== + + Usage:: + + >>> from http import HTTPStatus + >>> HTTPStatus.OK.is_success + True + >>> HTTPStatus.OK.is_client_error + False + .. class:: HTTPMethod .. versionadded:: 3.11 diff --git a/Lib/enum.py b/Lib/enum.py index 8ef69589a146..e7375e1eae69 100644 --- a/Lib/enum.py +++ b/Lib/enum.py @@ -1887,7 +1887,7 @@ def _test_simple_enum(checked_enum, simple_enum): else: checked_value = checked_dict[key] simple_value = simple_dict[key] - if callable(checked_value): + if callable(checked_value) or isinstance(checked_value, bltns.property): continue if key == '__doc__': # remove all spaces/tabs diff --git a/Lib/http/__init__.py b/Lib/http/__init__.py index cd2885dc7757..e093a1fec4df 100644 --- a/Lib/http/__init__.py +++ b/Lib/http/__init__.py @@ -31,6 +31,26 @@ def __new__(cls, value, phrase, description=''): obj.description = description return obj + @property + def is_informational(self): + return 100 <= self <= 199 + + @property + def is_success(self): + return 200 <= self <= 299 + + @property + def is_redirection(self): + return 300 <= self <= 399 + + @property + def is_client_error(self): + return 400 <= self <= 499 + + @property + def is_server_error(self): + return 500 <= self <= 599 + # informational CONTINUE = 100, 'Continue', 'Request received, please continue' SWITCHING_PROTOCOLS = (101, 'Switching Protocols', diff --git a/Lib/test/test_httplib.py b/Lib/test/test_httplib.py index 15dab0356f5e..b3d94e0a21cb 100644 --- a/Lib/test/test_httplib.py +++ b/Lib/test/test_httplib.py @@ -553,6 +553,27 @@ def __new__(cls, value, phrase, description=''): obj.phrase = phrase obj.description = description return obj + + @property + def is_informational(self): + return 100 <= self <= 199 + + @property + def is_success(self): + return 200 <= self <= 299 + + @property + def is_redirection(self): + return 300 <= self <= 399 + + @property + def is_client_error(self): + return 400 <= self <= 499 + + @property + def is_server_error(self): + return 500 <= self <= 599 + # informational CONTINUE = 100, 'Continue', 'Request received, please continue' SWITCHING_PROTOCOLS = (101, 'Switching Protocols', @@ -669,6 +690,30 @@ def __new__(cls, value, phrase, description=''): 'The client needs to authenticate to gain network access') enum._test_simple_enum(CheckedHTTPStatus, HTTPStatus) + def test_httpstatus_range(self): + """Checks that the statuses are in the 100-599 range""" + + for member in HTTPStatus.__members__.values(): + self.assertGreaterEqual(member, 100) + self.assertLessEqual(member, 599) + + def test_httpstatus_category(self): + """Checks that the statuses belong to the standard categories""" + + categories = ( + ((100, 199), "is_informational"), + ((200, 299), "is_success"), + ((300, 399), "is_redirection"), + ((400, 499), "is_client_error"), + ((500, 599), "is_server_error"), + ) + for member in HTTPStatus.__members__.values(): + for (lower, upper), category in categories: + category_indicator = getattr(member, category) + if lower <= member <= upper: + self.assertTrue(category_indicator) + else: + self.assertFalse(category_indicator) def test_status_lines(self): # Test HTTP status lines diff --git a/Misc/NEWS.d/next/Library/2022-08-07-14-56-23.gh-issue-95149.U0c6Ib.rst b/Misc/NEWS.d/next/Library/2022-08-07-14-56-23.gh-issue-95149.U0c6Ib.rst new file mode 100644 index 000000000000..6393444b53fb --- /dev/null +++ b/Misc/NEWS.d/next/Library/2022-08-07-14-56-23.gh-issue-95149.U0c6Ib.rst @@ -0,0 +1,2 @@ +The :class:`HTTPStatus <http.HTTPStatus>` enum offers a couple of properties +to indicate the HTTP status category e.g. ``HTTPStatus.OK.is_success``.

1 0

Fix regeneration of global objects through the Windows build files (GH-96394)
by zooba Aug. 30, 2022

Aug. 30, 2022

https://github.com/python/cpython/commit/13c309f1101dc86ca0138f239d45cb009d… commit: 13c309f1101dc86ca0138f239d45cb009d0e898d branch: main author: Kumar Aditya <59607654+kumaraditya303(a)users.noreply.github.com> committer: zooba <steve.dower(a)microsoft.com> date: 2022-08-30T18:41:27+01:00 summary: Fix regeneration of global objects through the Windows build files (GH-96394) files: M PCbuild/regen.targets diff --git a/PCbuild/regen.targets b/PCbuild/regen.targets index 9073bb6ab2b..3938b66678e 100644 --- a/PCbuild/regen.targets +++ b/PCbuild/regen.targets @@ -82,9 +82,16 @@ WorkingDirectory="$(PySourcePath)" /> </Target> + <Target Name="_RegenGlobalObjects" + DependsOnTargets="FindPythonForBuild"> + <Message Text="Regenerate Global Objects" Importance="high" /> + <Exec Command="$(PythonForBuild) Tools\scripts\generate_global_objects.py" + WorkingDirectory="$(PySourcePath)" /> + </Target> + <Target Name="Regen" Condition="$(Configuration) != 'PGUpdate'" - DependsOnTargets="_TouchRegenSources;_RegenPegen;_RegenAST_H;_RegenOpcodes;_RegenTokens;_RegenKeywords"> + DependsOnTargets="_TouchRegenSources;_RegenPegen;_RegenAST_H;_RegenOpcodes;_RegenTokens;_RegenKeywords;_RegenGlobalObjects"> <Message Text="Generated sources are up to date" Importance="high" /> </Target>

1 0

gh-95987: Fix `repr` of `Any` type subclasses (#96412)
by gvanrossum Aug. 30, 2022

Aug. 30, 2022

https://github.com/python/cpython/commit/4217393aeed42d67dd4b16a128528f5ca8… commit: 4217393aeed42d67dd4b16a128528f5ca8d939c4 branch: main author: Nikita Sobolev <mail(a)sobolevn.me> committer: gvanrossum <gvanrossum(a)gmail.com> date: 2022-08-30T10:36:16-07:00 summary: gh-95987: Fix `repr` of `Any` type subclasses (#96412) files: A Misc/NEWS.d/next/Library/2022-08-30-11-46-36.gh-issue-95987.CV7_u4.rst M Lib/test/test_typing.py M Lib/typing.py diff --git a/Lib/test/test_typing.py b/Lib/test/test_typing.py index 7eea01909ec..9239673c248 100644 --- a/Lib/test/test_typing.py +++ b/Lib/test/test_typing.py @@ -113,6 +113,12 @@ def test_any_instance_type_error(self): def test_repr(self): self.assertEqual(repr(Any), 'typing.Any') + class Sub(Any): pass + self.assertEqual( + repr(Sub), + "<class 'test.test_typing.AnyTests.test_repr.<locals>.Sub'>", + ) + def test_errors(self): with self.assertRaises(TypeError): issubclass(42, Any) diff --git a/Lib/typing.py b/Lib/typing.py index 596744ed132..84fe007a9ee 100644 --- a/Lib/typing.py +++ b/Lib/typing.py @@ -493,7 +493,9 @@ def __instancecheck__(self, obj): return super().__instancecheck__(obj) def __repr__(self): - return "typing.Any" + if self is Any: + return "typing.Any" + return super().__repr__() # respect to subclasses class Any(metaclass=_AnyMeta): diff --git a/Misc/NEWS.d/next/Library/2022-08-30-11-46-36.gh-issue-95987.CV7_u4.rst b/Misc/NEWS.d/next/Library/2022-08-30-11-46-36.gh-issue-95987.CV7_u4.rst new file mode 100644 index 00000000000..232bba1b924 --- /dev/null +++ b/Misc/NEWS.d/next/Library/2022-08-30-11-46-36.gh-issue-95987.CV7_u4.rst @@ -0,0 +1 @@ +Fix ``repr`` of ``Any`` subclasses.

1 0

gh-96143: Allow Linux perf profiler to see Python calls (GH-96123)
by miss-islington Aug. 30, 2022

Aug. 30, 2022

https://github.com/python/cpython/commit/6d791a97364b68d5f9c3514a0470aac487… commit: 6d791a97364b68d5f9c3514a0470aac487fc538d branch: main author: Pablo Galindo Salgado <Pablogsal(a)gmail.com> committer: miss-islington <31488909+miss-islington(a)users.noreply.github.com> date: 2022-08-30T10:11:18-07:00 summary: gh-96143: Allow Linux perf profiler to see Python calls (GH-96123) :warning: :warning: Note for reviewers, hackers and fellow systems/low-level/compiler engineers :warning: :warning: If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch** or **reach out by email** or **suggest code changes directly on GitHub**. If you have any **refinements or optimizations** please, wait until the first version is merged before starting hacking or proposing those so we can keep this PR productive. files: A Doc/howto/perf_profiling.rst A Lib/test/test_perf_profiler.py A Misc/NEWS.d/next/Core and Builtins/2022-08-20-18-36-40.gh-issue-96143.nh3GFM.rst A Objects/asm_trampoline.S A Objects/perf_trampoline.c M Doc/c-api/init_config.rst M Doc/howto/index.rst M Doc/using/cmdline.rst M Include/cpython/initconfig.h M Include/internal/pycore_ceval.h M Lib/test/test_embed.py M Makefile.pre.in M Modules/posixmodule.c M PCbuild/_freeze_module.vcxproj M PCbuild/_freeze_module.vcxproj.filters M PCbuild/pythoncore.vcxproj M PCbuild/pythoncore.vcxproj.filters M Python/clinic/sysmodule.c.h M Python/initconfig.c M Python/pylifecycle.c M Python/sysmodule.c M configure M configure.ac M pyconfig.h.in diff --git a/Doc/c-api/init_config.rst b/Doc/c-api/init_config.rst index 2074ec4e0e8e..c4a342ee811c 100644 --- a/Doc/c-api/init_config.rst +++ b/Doc/c-api/init_config.rst @@ -1155,6 +1155,20 @@ PyConfig Default: ``-1`` in Python mode, ``0`` in isolated mode. + .. c:member:: int perf_profiling + + Enable compatibility mode with the perf profiler? + + If non-zero, initialize the perf trampoline. See :ref:`perf_profiling` + for more information. + + Set by :option:`-X perf <-X>` command line option and by the + :envvar:`PYTHONPERFSUPPORT` environment variable. + + Default: ``-1``. + + .. versionadded:: 3.12 + .. c:member:: int use_environment Use :ref:`environment variables <using-on-envvars>`? diff --git a/Doc/howto/index.rst b/Doc/howto/index.rst index 8a378e6659ef..f521276a5a83 100644 --- a/Doc/howto/index.rst +++ b/Doc/howto/index.rst @@ -30,6 +30,7 @@ Currently, the HOWTOs are: ipaddress.rst clinic.rst instrumentation.rst + perf_profiling.rst annotations.rst isolating-extensions.rst diff --git a/Doc/howto/perf_profiling.rst b/Doc/howto/perf_profiling.rst new file mode 100644 index 000000000000..2e1bb48af8c8 --- /dev/null +++ b/Doc/howto/perf_profiling.rst @@ -0,0 +1,200 @@ +.. highlight:: shell-session + +.. _perf_profiling: + +============================================== +Python support for the Linux ``perf`` profiler +============================================== + +:author: Pablo Galindo + +The Linux ``perf`` profiler is a very powerful tool that allows you to profile and +obtain information about the performance of your application. ``perf`` also has +a very vibrant ecosystem of tools that aid with the analysis of the data that it +produces. + +The main problem with using the ``perf`` profiler with Python applications is that +``perf`` only allows to get information about native symbols, this is, the names of +the functions and procedures written in C. This means that the names and file names +of the Python functions in your code will not appear in the output of the ``perf``. + +Since Python 3.12, the interpreter can run in a special mode that allows Python +functions to appear in the output of the ``perf`` profiler. When this mode is +enabled, the interpreter will interpose a small piece of code compiled on the +fly before the execution of every Python function and it will teach ``perf`` the +relationship between this piece of code and the associated Python function using +`perf map files`_. + +.. warning:: + + Support for the ``perf`` profiler is only currently available for Linux on + selected architectures. Check the output of the configure build step or + check the output of ``python -m sysconfig | grep HAVE_PERF_TRAMPOLINE`` + to see if your system is supported. + +For example, consider the following script: + +.. code-block:: python + + def foo(n): + result = 0 + for _ in range(n): + result += 1 + return result + + def bar(n): + foo(n) + + def baz(n): + bar(n) + + if __name__ == "__main__": + baz(1000000) + +We can run perf to sample CPU stack traces at 9999 Hertz: + + $ perf record -F 9999 -g -o perf.data python my_script.py + +Then we can use perf report to analyze the data: + +.. code-block:: shell-session + + $ perf report --stdio -n -g + + # Children Self Samples Command Shared Object Symbol + # ........ ........ ............ .......... .................. .......................................... + # + 91.08% 0.00% 0 python.exe python.exe [.] _start + | + ---_start + | + --90.71%--__libc_start_main + Py_BytesMain + | + |--56.88%--pymain_run_python.constprop.0 + | | + | |--56.13%--_PyRun_AnyFileObject + | | _PyRun_SimpleFileObject + | | | + | | |--55.02%--run_mod + | | | | + | | | --54.65%--PyEval_EvalCode + | | | _PyEval_EvalFrameDefault + | | | PyObject_Vectorcall + | | | _PyEval_Vector + | | | _PyEval_EvalFrameDefault + | | | PyObject_Vectorcall + | | | _PyEval_Vector + | | | _PyEval_EvalFrameDefault + | | | PyObject_Vectorcall + | | | _PyEval_Vector + | | | | + | | | |--51.67%--_PyEval_EvalFrameDefault + | | | | | + | | | | |--11.52%--_PyLong_Add + | | | | | | + | | | | | |--2.97%--_PyObject_Malloc + ... + +As you can see here, the Python functions are not shown in the output, only ``_Py_Eval_EvalFrameDefault`` appears +(the function that evaluates the Python bytecode) shows up. Unfortunately that's not very useful because all Python +functions use the same C function to evaluate bytecode so we cannot know which Python function corresponds to which +bytecode-evaluating function. + +Instead, if we run the same experiment with perf support activated we get: + +.. code-block:: shell-session + + $ perf report --stdio -n -g + + # Children Self Samples Command Shared Object Symbol + # ........ ........ ............ .......... .................. ..................................................................... + # + 90.58% 0.36% 1 python.exe python.exe [.] _start + | + ---_start + | + --89.86%--__libc_start_main + Py_BytesMain + | + |--55.43%--pymain_run_python.constprop.0 + | | + | |--54.71%--_PyRun_AnyFileObject + | | _PyRun_SimpleFileObject + | | | + | | |--53.62%--run_mod + | | | | + | | | --53.26%--PyEval_EvalCode + | | | py::<module>:/src/script.py + | | | _PyEval_EvalFrameDefault + | | | PyObject_Vectorcall + | | | _PyEval_Vector + | | | py::baz:/src/script.py + | | | _PyEval_EvalFrameDefault + | | | PyObject_Vectorcall + | | | _PyEval_Vector + | | | py::bar:/src/script.py + | | | _PyEval_EvalFrameDefault + | | | PyObject_Vectorcall + | | | _PyEval_Vector + | | | py::foo:/src/script.py + | | | | + | | | |--51.81%--_PyEval_EvalFrameDefault + | | | | | + | | | | |--13.77%--_PyLong_Add + | | | | | | + | | | | | |--3.26%--_PyObject_Malloc + + + +Enabling perf profiling mode +---------------------------- + +There are two main ways to activate the perf profiling mode. If you want it to be +active since the start of the Python interpreter, you can use the `-Xperf` option: + + $ python -Xperf my_script.py + +There is also support for dynamically activating and deactivating the perf +profiling mode by using the APIs in the :mod:`sys` module: + +.. code-block:: python + + import sys + sys.activate_stack_trampoline("perf") + + # Run some code with Perf profiling active + + sys.deactivate_stack_trampoline() + + # Perf profiling is not active anymore + +These APIs can be handy if you want to activate/deactivate profiling mode in +response to a signal or other communication mechanism with your process. + + + +Now we can analyze the data with ``perf report``: + + $ perf report -g -i perf.data + + +How to obtain the best results +------------------------------- + +For the best results, Python should be compiled with +``CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"`` as this allows +profilers to unwind using only the frame pointer and not on DWARF debug +information. This is because as the code that is interposed to allow perf +support is dynamically generated it doesn't have any DWARF debugging information +available. + +You can check if you system has been compiled with this flag by running: + + $ python -m sysconfig | grep 'no-omit-frame-pointer' + +If you don't see any output it means that your interpreter has not been compiled with +frame pointers and therefore it may not be able to show Python functions in the output +of ``perf``. + +.. _perf map files: https://github.com/torvalds/linux/blob/0513e464f9007b70b96740271a948ca5ab6e… diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst index 6678d476fa83..5ecc882d818f 100644 --- a/Doc/using/cmdline.rst +++ b/Doc/using/cmdline.rst @@ -535,6 +535,12 @@ Miscellaneous options development (running from the source tree) then the default is "off". Note that the "importlib_bootstrap" and "importlib_bootstrap_external" frozen modules are always used, even if this flag is set to "off". + * ``-X perf`` to activate compatibility mode with the ``perf`` profiler. + When this option is activated, the Linux ``perf`` profiler will be able to + report Python calls. This option is only available on some platforms and + will do nothing if is not supported on the current system. The default value + is "off". See also :envvar:`PYTHONPERFSUPPORT` and :ref:`perf_profiling` + for more information. It also allows passing arbitrary values and retrieving them through the :data:`sys._xoptions` dictionary. @@ -1025,6 +1031,13 @@ conflict. .. versionadded:: 3.11 +.. envvar:: PYTHONPERFSUPPORT + + If this variable is set to a nonzero value, it activates compatibility mode + with the ``perf`` profiler so Python calls can be detected by it. See the + :ref:`perf_profiling` section for more information. + + .. versionadded:: 3.12 Debug-mode variables diff --git a/Include/cpython/initconfig.h b/Include/cpython/initconfig.h index 3b6d59389f26..c6057a4c3ed9 100644 --- a/Include/cpython/initconfig.h +++ b/Include/cpython/initconfig.h @@ -142,6 +142,7 @@ typedef struct PyConfig { unsigned long hash_seed; int faulthandler; int tracemalloc; + int perf_profiling; int import_time; int code_debug_ranges; int show_ref_count; diff --git a/Include/internal/pycore_ceval.h b/Include/internal/pycore_ceval.h index 2fcdaad358b0..4914948c6ca7 100644 --- a/Include/internal/pycore_ceval.h +++ b/Include/internal/pycore_ceval.h @@ -65,6 +65,27 @@ extern PyObject* _PyEval_BuiltinsFromGlobals( PyThreadState *tstate, PyObject *globals); +// Trampoline API + +typedef struct { + // Callback to initialize the trampoline state + void* (*init_state)(void); + // Callback to register every trampoline being created + void (*write_state)(void* state, const void *code_addr, + unsigned int code_size, PyCodeObject* code); + // Callback to free the trampoline state + int (*free_state)(void* state); +} _PyPerf_Callbacks; + +extern int _PyPerfTrampoline_SetCallbacks(_PyPerf_Callbacks *); +extern void _PyPerfTrampoline_GetCallbacks(_PyPerf_Callbacks *); +extern int _PyPerfTrampoline_Init(int activate); +extern int _PyPerfTrampoline_Fini(void); +extern int _PyIsPerfTrampolineActive(void); +extern PyStatus _PyPerfTrampoline_AfterFork_Child(void); +#ifdef PY_HAVE_PERF_TRAMPOLINE +extern _PyPerf_Callbacks _Py_perfmap_callbacks; +#endif static inline PyObject* _PyEval_EvalFrame(PyThreadState *tstate, struct _PyInterpreterFrame *frame, int throwflag) diff --git a/Lib/test/test_embed.py b/Lib/test/test_embed.py index c546bb08e297..70d7367ea9e6 100644 --- a/Lib/test/test_embed.py +++ b/Lib/test/test_embed.py @@ -436,6 +436,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase): 'hash_seed': 0, 'faulthandler': 0, 'tracemalloc': 0, + 'perf_profiling': 0, 'import_time': 0, 'code_debug_ranges': 1, 'show_ref_count': 0, @@ -520,6 +521,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase): use_hash_seed=0, faulthandler=0, tracemalloc=0, + perf_profiling=0, pathconfig_warnings=0, ) if MS_WINDOWS: @@ -828,6 +830,7 @@ def test_init_from_config(self): 'use_hash_seed': 1, 'hash_seed': 123, 'tracemalloc': 2, + 'perf_profiling': 0, 'import_time': 1, 'code_debug_ranges': 0, 'show_ref_count': 1, @@ -890,6 +893,7 @@ def test_init_compat_env(self): 'use_hash_seed': 1, 'hash_seed': 42, 'tracemalloc': 2, + 'perf_profiling': 0, 'import_time': 1, 'code_debug_ranges': 0, 'malloc_stats': 1, @@ -921,6 +925,7 @@ def test_init_python_env(self): 'use_hash_seed': 1, 'hash_seed': 42, 'tracemalloc': 2, + 'perf_profiling': 0, 'import_time': 1, 'code_debug_ranges': 0, 'malloc_stats': 1, diff --git a/Lib/test/test_perf_profiler.py b/Lib/test/test_perf_profiler.py new file mode 100644 index 000000000000..c2aad85b652e --- /dev/null +++ b/Lib/test/test_perf_profiler.py @@ -0,0 +1,348 @@ +import unittest +import subprocess +import sys +import sysconfig +import os +import pathlib +from test import support +from test.support.script_helper import ( + make_script, + assert_python_failure, + assert_python_ok, +) +from test.support.os_helper import temp_dir + + +if not support.has_subprocess_support: + raise unittest.SkipTest("test module requires subprocess") + + +def supports_trampoline_profiling(): + perf_trampoline = sysconfig.get_config_var("PY_HAVE_PERF_TRAMPOLINE") + if not perf_trampoline: + return False + return int(perf_trampoline) == 1 + + +if not supports_trampoline_profiling(): + raise unittest.SkipTest("perf trampoline profiling not supported") + + +class TestPerfTrampoline(unittest.TestCase): + def setUp(self): + super().setUp() + self.perf_files = set(pathlib.Path("/tmp/").glob("perf-*.map")) + + def tearDown(self) -> None: + super().tearDown() + files_to_delete = ( + set(pathlib.Path("/tmp/").glob("perf-*.map")) - self.perf_files + ) + for file in files_to_delete: + file.unlink() + + def test_trampoline_works(self): + code = """if 1: + def foo(): + pass + + def bar(): + foo() + + def baz(): + bar() + + baz() + """ + with temp_dir() as script_dir: + script = make_script(script_dir, "perftest", code) + with subprocess.Popen( + [sys.executable, "-Xperf", script], + universal_newlines=True, + stderr=subprocess.PIPE, + stdout=subprocess.PIPE, + ) as process: + stdout, stderr = process.communicate() + + self.assertEqual(stderr, "") + self.assertEqual(stdout, "") + + perf_file = pathlib.Path(f"/tmp/perf-{process.pid}.map") + self.assertTrue(perf_file.exists()) + perf_file_contents = perf_file.read_text() + self.assertIn(f"py::foo:{script}", perf_file_contents) + self.assertIn(f"py::bar:{script}", perf_file_contents) + self.assertIn(f"py::baz:{script}", perf_file_contents) + + def test_trampoline_works_with_forks(self): + code = """if 1: + import os, sys + + def foo_fork(): + pass + + def bar_fork(): + foo_fork() + + def baz_fork(): + bar_fork() + + def foo(): + pid = os.fork() + if pid == 0: + print(os.getpid()) + baz_fork() + else: + _, status = os.waitpid(-1, 0) + sys.exit(status) + + def bar(): + foo() + + def baz(): + bar() + + baz() + """ + with temp_dir() as script_dir: + script = make_script(script_dir, "perftest", code) + with subprocess.Popen( + [sys.executable, "-Xperf", script], + universal_newlines=True, + stderr=subprocess.PIPE, + stdout=subprocess.PIPE, + ) as process: + stdout, stderr = process.communicate() + + self.assertEqual(process.returncode, 0) + self.assertEqual(stderr, "") + child_pid = int(stdout.strip()) + perf_file = pathlib.Path(f"/tmp/perf-{process.pid}.map") + perf_child_file = pathlib.Path(f"/tmp/perf-{child_pid}.map") + self.assertTrue(perf_file.exists()) + self.assertTrue(perf_child_file.exists()) + + perf_file_contents = perf_file.read_text() + self.assertIn(f"py::foo:{script}", perf_file_contents) + self.assertIn(f"py::bar:{script}", perf_file_contents) + self.assertIn(f"py::baz:{script}", perf_file_contents) + + child_perf_file_contents = perf_child_file.read_text() + self.assertIn(f"py::foo_fork:{script}", child_perf_file_contents) + self.assertIn(f"py::bar_fork:{script}", child_perf_file_contents) + self.assertIn(f"py::baz_fork:{script}", child_perf_file_contents) + + def test_sys_api(self): + code = """if 1: + import sys + def foo(): + pass + + def spam(): + pass + + def bar(): + sys.deactivate_stack_trampoline() + foo() + sys.activate_stack_trampoline("perf") + spam() + + def baz(): + bar() + + sys.activate_stack_trampoline("perf") + baz() + """ + with temp_dir() as script_dir: + script = make_script(script_dir, "perftest", code) + with subprocess.Popen( + [sys.executable, script], + universal_newlines=True, + stderr=subprocess.PIPE, + stdout=subprocess.PIPE, + ) as process: + stdout, stderr = process.communicate() + + self.assertEqual(stderr, "") + self.assertEqual(stdout, "") + + perf_file = pathlib.Path(f"/tmp/perf-{process.pid}.map") + self.assertTrue(perf_file.exists()) + perf_file_contents = perf_file.read_text() + self.assertNotIn(f"py::foo:{script}", perf_file_contents) + self.assertIn(f"py::spam:{script}", perf_file_contents) + self.assertIn(f"py::bar:{script}", perf_file_contents) + self.assertIn(f"py::baz:{script}", perf_file_contents) + + def test_sys_api_with_existing_trampoline(self): + code = """if 1: + import sys + sys.activate_stack_trampoline("perf") + sys.activate_stack_trampoline("perf") + """ + assert_python_ok("-c", code) + + def test_sys_api_with_invalid_trampoline(self): + code = """if 1: + import sys + sys.activate_stack_trampoline("invalid") + """ + rc, out, err = assert_python_failure("-c", code) + self.assertIn("invalid backend: invalid", err.decode()) + + def test_sys_api_get_status(self): + code = """if 1: + import sys + sys.activate_stack_trampoline("perf") + assert sys.is_stack_trampoline_active() is True + sys.deactivate_stack_trampoline() + assert sys.is_stack_trampoline_active() is False + """ + assert_python_ok("-c", code) + + +def is_unwinding_reliable(): + cflags = sysconfig.get_config_var("PY_CORE_CFLAGS") + if not cflags: + return False + return "no-omit-frame-pointer" in cflags + + +def perf_command_works(): + try: + cmd = ["perf", "--help"] + stdout = subprocess.check_output(cmd, universal_newlines=True) + except (subprocess.SubprocessError, OSError): + return False + + # perf version does not return a version number on Fedora. Use presence + # of "perf.data" in help as indicator that it's perf from Linux tools. + if "perf.data" not in stdout: + return False + + # Check that we can run a simple perf run + with temp_dir() as script_dir: + try: + output_file = script_dir + "/perf_output.perf" + cmd = ( + "perf", + "record", + "-g", + "--call-graph=fp", + "-o", + output_file, + "--", + sys.executable, + "-c", + 'print("hello")', + ) + stdout = subprocess.check_output( + cmd, cwd=script_dir, universal_newlines=True, stderr=subprocess.STDOUT + ) + except (subprocess.SubprocessError, OSError): + return False + + if "hello" not in stdout: + return False + + return True + + +def run_perf(cwd, *args, **env_vars): + if env_vars: + env = os.environ.copy() + env.update(env_vars) + else: + env = None + output_file = cwd + "/perf_output.perf" + base_cmd = ("perf", "record", "-g", "--call-graph=fp", "-o", output_file, "--") + proc = subprocess.run( + base_cmd + args, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + env=env, + ) + if proc.returncode: + print(proc.stderr) + raise ValueError(f"Perf failed with return code {proc.returncode}") + + base_cmd = ("perf", "script") + proc = subprocess.run( + ("perf", "script", "-i", output_file), + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + env=env, + check=True, + ) + return proc.stdout.decode("utf-8", "replace"), proc.stderr.decode( + "utf-8", "replace" + ) + + +(a)unittest.skipUnless(perf_command_works(), "perf command doesn't work") +(a)unittest.skipUnless(is_unwinding_reliable(), "Unwinding is unreliable") +(a)support.skip_if_sanitizer(address=True, memory=True, ub=True) +class TestPerfProfiler(unittest.TestCase): + def setUp(self): + super().setUp() + self.perf_files = set(pathlib.Path("/tmp/").glob("perf-*.map")) + + def tearDown(self) -> None: + super().tearDown() + files_to_delete = ( + set(pathlib.Path("/tmp/").glob("perf-*.map")) - self.perf_files + ) + for file in files_to_delete: + file.unlink() + + def test_python_calls_appear_in_the_stack_if_perf_activated(self): + with temp_dir() as script_dir: + code = """if 1: + def foo(n): + x = 0 + for i in range(n): + x += i + + def bar(n): + foo(n) + + def baz(n): + bar(n) + + baz(10000000) + """ + script = make_script(script_dir, "perftest", code) + stdout, stderr = run_perf(script_dir, sys.executable, "-Xperf", script) + self.assertEqual(stderr, "") + + self.assertIn(f"py::foo:{script}", stdout) + self.assertIn(f"py::bar:{script}", stdout) + self.assertIn(f"py::baz:{script}", stdout) + + def test_python_calls_do_not_appear_in_the_stack_if_perf_activated(self): + with temp_dir() as script_dir: + code = """if 1: + def foo(n): + x = 0 + for i in range(n): + x += i + + def bar(n): + foo(n) + + def baz(n): + bar(n) + + baz(10000000) + """ + script = make_script(script_dir, "perftest", code) + stdout, stderr = run_perf(script_dir, sys.executable, script) + self.assertEqual(stderr, "") + + self.assertNotIn(f"py::foo:{script}", stdout) + self.assertNotIn(f"py::bar:{script}", stdout) + self.assertNotIn(f"py::baz:{script}", stdout) + + +if __name__ == "__main__": + unittest.main() diff --git a/Makefile.pre.in b/Makefile.pre.in index 94ddfa4b1bed..107a7075ebf6 100644 --- a/Makefile.pre.in +++ b/Makefile.pre.in @@ -478,7 +478,9 @@ OBJECT_OBJS= \ Objects/unicodeobject.o \ Objects/unicodectype.o \ Objects/unionobject.o \ - Objects/weakrefobject.o + Objects/weakrefobject.o \ + Objects/perf_trampoline.o \ + @PERF_TRAMPOLINE_OBJ@ DEEPFREEZE_OBJS = Python/deepfreeze/deepfreeze.o @@ -2358,6 +2360,9 @@ config.status: $(srcdir)/configure .PRECIOUS: config.status $(BUILDPYTHON) Makefile Makefile.pre +Objects/asm_trampoline.o: $(srcdir)/Objects/asm_trampoline.S + $(CC) -c $(PY_CORE_CFLAGS) -o $@ $< + # Some make's put the object file in the current directory .c.o: $(CC) -c $(PY_CORE_CFLAGS) -o $@ $< diff --git a/Misc/NEWS.d/next/Core and Builtins/2022-08-20-18-36-40.gh-issue-96143.nh3GFM.rst b/Misc/NEWS.d/next/Core and Builtins/2022-08-20-18-36-40.gh-issue-96143.nh3GFM.rst new file mode 100644 index 000000000000..30f44fd453a5 --- /dev/null +++ b/Misc/NEWS.d/next/Core and Builtins/2022-08-20-18-36-40.gh-issue-96143.nh3GFM.rst @@ -0,0 +1,7 @@ +Add a new ``-X perf`` Python command line option as well as +:func:`sys.activate_stack_trampoline` and :func:`sys.deactivate_stack_trampoline` +function in the :mod:`sys` module that allows to set/unset the interpreter in a +way that the Linux ``perf`` profiler can detect Python calls. The new +:func:`sys.is_stack_trampoline_active` function allows to query the state of the +perf trampoline. Design by Pablo Galindo. Patch by Pablo Galindo and Christian Heimes +with contributions from Gregory P. Smith [Google] and Mark Shannon. diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c index d45fa231ae5e..3810bc87c1fb 100644 --- a/Modules/posixmodule.c +++ b/Modules/posixmodule.c @@ -606,6 +606,11 @@ PyOS_AfterFork_Child(void) } assert(_PyThreadState_GET() == tstate); + status = _PyPerfTrampoline_AfterFork_Child(); + if (_PyStatus_EXCEPTION(status)) { + goto fatal_error; + } + run_at_forkers(tstate->interp->after_forkers_child, 0); return; diff --git a/Objects/asm_trampoline.S b/Objects/asm_trampoline.S new file mode 100644 index 000000000000..460707717df0 --- /dev/null +++ b/Objects/asm_trampoline.S @@ -0,0 +1,28 @@ + .text + .globl _Py_trampoline_func_start +# The following assembly is equivalent to: +# PyObject * +# trampoline(PyThreadState *ts, _PyInterpreterFrame *f, +# int throwflag, py_evaluator evaluator) +# { +# return evaluator(ts, f, throwflag); +# } +_Py_trampoline_func_start: +#ifdef __x86_64__ + sub $8, %rsp + call *%rcx + add $8, %rsp + ret +#endif // __x86_64__ +#if defined(__aarch64__) && defined(__AARCH64EL__) && !defined(__ILP32__) + // ARM64 little endian, 64bit ABI + // generate with aarch64-linux-gnu-gcc 12.1 + stp x29, x30, [sp, -16]! + mov x29, sp + blr x3 + ldp x29, x30, [sp], 16 + ret +#endif + .globl _Py_trampoline_func_end +_Py_trampoline_func_end: + .section .note.GNU-stack,"",@progbits diff --git a/Objects/perf_trampoline.c b/Objects/perf_trampoline.c new file mode 100644 index 000000000000..02206b2786c8 --- /dev/null +++ b/Objects/perf_trampoline.c @@ -0,0 +1,501 @@ +/* + +Perf trampoline instrumentation +=============================== + +This file contains instrumentation to allow to associate +calls to the CPython eval loop back to the names of the Python +functions and filename being executed. + +Many native performance profilers like the Linux perf tools are +only available to 'see' the C stack when sampling from the profiled +process. This means that if we have the following python code: + + import time + def foo(n): + # Some CPU intensive code + + def bar(n): + foo(n) + + def baz(n): + bar(n) + + baz(10000000) + +A performance profiler that is only able to see native frames will +produce the following backtrace when sampling from foo(): + + _PyEval_EvalFrameDefault -----> Evaluation frame of foo() + _PyEval_Vector + _PyFunction_Vectorcall + PyObject_Vectorcall + call_function + + _PyEval_EvalFrameDefault ------> Evaluation frame of bar() + _PyEval_EvalFrame + _PyEval_Vector + _PyFunction_Vectorcall + PyObject_Vectorcall + call_function + + _PyEval_EvalFrameDefault -------> Evaluation frame of baz() + _PyEval_EvalFrame + _PyEval_Vector + _PyFunction_Vectorcall + PyObject_Vectorcall + call_function + + ... + + Py_RunMain + +Because the profiler is only able to see the native frames and the native +function that runs the evaluation loop is the same (_PyEval_EvalFrameDefault) +then the profiler and any reporter generated by it will not be able to +associate the names of the Python functions and the filenames associated with +those calls, rendering the results useless in the Python world. + +To fix this problem, we introduce the concept of a trampoline frame. A +trampoline frame is a piece of code that is unique per Python code object that +is executed before entering the CPython eval loop. This piece of code just +calls the original Python evaluation function (_PyEval_EvalFrameDefault) and +forwards all the arguments received. In this way, when a profiler samples +frames from the previous example it will see; + + _PyEval_EvalFrameDefault -----> Evaluation frame of foo() + [Jit compiled code 3] + _PyEval_Vector + _PyFunction_Vectorcall + PyObject_Vectorcall + call_function + + _PyEval_EvalFrameDefault ------> Evaluation frame of bar() + [Jit compiled code 2] + _PyEval_EvalFrame + _PyEval_Vector + _PyFunction_Vectorcall + PyObject_Vectorcall + call_function + + _PyEval_EvalFrameDefault -------> Evaluation frame of baz() + [Jit compiled code 1] + _PyEval_EvalFrame + _PyEval_Vector + _PyFunction_Vectorcall + PyObject_Vectorcall + call_function + + ... + + Py_RunMain + +When we generate every unique copy of the trampoline (what here we called "[Jit +compiled code N]") we write the relationship between the compiled code and the +Python function that is associated with it. Every profiler requires this +information in a different format. For example, the Linux "perf" profiler +requires a file in "/tmp/perf-PID.map" (name and location not configurable) +with the following format: + + <compiled code address> <compiled code size> <name of the compiled code> + +If this file is available when "perf" generates reports, it will automatically +associate every trampoline with the Python function that it is associated with +allowing it to generate reports that include Python information. These reports +then can also be filtered in a way that *only* Python information appears. + +Notice that for this to work, there must be a unique copied of the trampoline +per Python code object even if the code in the trampoline is the same. To +achieve this we have a assembly template in Objects/asm_trampiline.S that is +compiled into the Python executable/shared library. This template generates a +symbol that maps the start of the assembly code and another that marks the end +of the assembly code for the trampoline. Then, every time we need a unique +trampoline for a Python code object, we copy the assembly code into a mmaped +area that has executable permissions and we return the start of that area as +our trampoline function. + +Asking for a mmap-ed memory area for trampoline is very wasteful so we +allocate big arenas of memory in a single mmap call, we populate the entire +arena with copies of the trampoline (this allows us to now have to invalidate +the icache for the instructions in the page) and then we return the next +available chunk every time someone asks for a new trampoline. We keep a linked +list of arenas in case the current memory arena is exhausted and another one is +needed. + +For the best results, Python should be compiled with +CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer" as this allows +profilers to unwind using only the frame pointer and not on DWARF debug +information (note that as trampilines are dynamically generated there won't be +any DWARF information available for them). +*/ + +#include "Python.h" +#include "pycore_ceval.h" +#include "pycore_frame.h" +#include "pycore_interp.h" + +typedef enum { + PERF_STATUS_FAILED = -1, // Perf trampoline is in an invalid state + PERF_STATUS_NO_INIT = 0, // Perf trampoline is not initialized + PERF_STATUS_OK = 1, // Perf trampoline is ready to be executed +} perf_status_t; + +#ifdef PY_HAVE_PERF_TRAMPOLINE + +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> +#include <sys/mman.h> +#include <sys/types.h> +#include <unistd.h> + +/* The function pointer is passed as last argument. The other three arguments + * are passed in the same order as the function requires. This results in + * shorter, more efficient ASM code for trampoline. + */ +typedef PyObject *(*py_evaluator)(PyThreadState *, _PyInterpreterFrame *, + int throwflag); +typedef PyObject *(*py_trampoline)(PyThreadState *, _PyInterpreterFrame *, int, + py_evaluator); + +extern void *_Py_trampoline_func_start; // Start of the template of the + // assembly trampoline +extern void * + _Py_trampoline_func_end; // End of the template of the assembly trampoline + +struct code_arena_st { + char *start_addr; // Start of the memory arena + char *current_addr; // Address of the current trampoline within the arena + size_t size; // Size of the memory arena + size_t size_left; // Remaining size of the memory arena + size_t code_size; // Size of the code of every trampoline in the arena + struct code_arena_st + *prev; // Pointer to the arena or NULL if this is the first arena. +}; + +typedef struct code_arena_st code_arena_t; + +struct trampoline_api_st { + void* (*init_state)(void); + void (*write_state)(void* state, const void *code_addr, + unsigned int code_size, PyCodeObject* code); + int (*free_state)(void* state); + void *state; +}; + +typedef struct trampoline_api_st trampoline_api_t; + +static perf_status_t perf_status = PERF_STATUS_NO_INIT; +static Py_ssize_t extra_code_index = -1; +static code_arena_t *code_arena; +static trampoline_api_t trampoline_api; + +static FILE *perf_map_file; + +static void * +perf_map_get_file(void) +{ + if (perf_map_file) { + return perf_map_file; + } + char filename[100]; + pid_t pid = getpid(); + // Location and file name of perf map is hard-coded in perf tool. + // Use exclusive create flag wit nofollow to prevent symlink attacks. + int flags = O_WRONLY | O_CREAT | O_EXCL | O_NOFOLLOW | O_CLOEXEC; + snprintf(filename, sizeof(filename) - 1, "/tmp/perf-%jd.map", + (intmax_t)pid); + int fd = open(filename, flags, 0600); + if (fd == -1) { + perf_status = PERF_STATUS_FAILED; + PyErr_SetFromErrnoWithFilename(PyExc_OSError, filename); + return NULL; + } + perf_map_file = fdopen(fd, "w"); + if (!perf_map_file) { + perf_status = PERF_STATUS_FAILED; + PyErr_SetFromErrnoWithFilename(PyExc_OSError, filename); + close(fd); + return NULL; + } + return perf_map_file; +} + +static int +perf_map_close(void *state) +{ + FILE *fp = (FILE *)state; + int ret = 0; + if (fp) { + ret = fclose(fp); + } + perf_map_file = NULL; + perf_status = PERF_STATUS_NO_INIT; + return ret; +} + +static void +perf_map_write_entry(void *state, const void *code_addr, + unsigned int code_size, PyCodeObject *co) +{ + assert(state != NULL); + FILE *method_file = (FILE *)state; + const char *entry = PyUnicode_AsUTF8(co->co_qualname); + if (entry == NULL) { + _PyErr_WriteUnraisableMsg("Failed to get qualname from code object", + NULL); + return; + } + const char *filename = PyUnicode_AsUTF8(co->co_filename); + if (filename == NULL) { + _PyErr_WriteUnraisableMsg("Failed to get filename from code object", + NULL); + return; + } + fprintf(method_file, "%p %x py::%s:%s\n", code_addr, code_size, entry, + filename); + fflush(method_file); +} + +_PyPerf_Callbacks _Py_perfmap_callbacks = { + &perf_map_get_file, + &perf_map_write_entry, + &perf_map_close +}; + +static int +new_code_arena(void) +{ + // non-trivial programs typically need 64 to 256 kiB. + size_t mem_size = 4096 * 16; + assert(mem_size % sysconf(_SC_PAGESIZE) == 0); + char *memory = + mmap(NULL, // address + mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, + -1, // fd (not used here) + 0); // offset (not used here) + if (!memory) { + PyErr_SetFromErrno(PyExc_OSError); + _PyErr_WriteUnraisableMsg( + "Failed to create new mmap for perf trampoline", NULL); + perf_status = PERF_STATUS_FAILED; + return -1; + } + void *start = &_Py_trampoline_func_start; + void *end = &_Py_trampoline_func_end; + size_t code_size = end - start; + + size_t n_copies = mem_size / code_size; + for (size_t i = 0; i < n_copies; i++) { + memcpy(memory + i * code_size, start, code_size * sizeof(char)); + } + // Some systems may prevent us from creating executable code on the fly. + int res = mprotect(memory, mem_size, PROT_READ | PROT_EXEC); + if (res == -1) { + PyErr_SetFromErrno(PyExc_OSError); + munmap(memory, mem_size); + _PyErr_WriteUnraisableMsg( + "Failed to set mmap for perf trampoline to PROT_READ | PROT_EXEC", + NULL); + return -1; + } + + code_arena_t *new_arena = PyMem_RawCalloc(1, sizeof(code_arena_t)); + if (new_arena == NULL) { + PyErr_NoMemory(); + munmap(memory, mem_size); + _PyErr_WriteUnraisableMsg("Failed to allocate new code arena struct", + NULL); + return -1; + } + + new_arena->start_addr = memory; + new_arena->current_addr = memory; + new_arena->size = mem_size; + new_arena->size_left = mem_size; + new_arena->code_size = code_size; + new_arena->prev = code_arena; + code_arena = new_arena; + return 0; +} + +static void +free_code_arenas(void) +{ + code_arena_t *cur = code_arena; + code_arena_t *prev; + code_arena = NULL; // invalid static pointer + while (cur) { + munmap(cur->start_addr, cur->size); + prev = cur->prev; + PyMem_RawFree(cur); + cur = prev; + } +} + +static inline py_trampoline +code_arena_new_code(code_arena_t *code_arena) +{ + py_trampoline trampoline = (py_trampoline)code_arena->current_addr; + code_arena->size_left -= code_arena->code_size; + code_arena->current_addr += code_arena->code_size; + return trampoline; +} + +static inline py_trampoline +compile_trampoline(void) +{ + if ((code_arena == NULL) || + (code_arena->size_left <= code_arena->code_size)) { + if (new_code_arena() < 0) { + return NULL; + } + } + assert(code_arena->size_left <= code_arena->size); + return code_arena_new_code(code_arena); +} + +static PyObject * +py_trampoline_evaluator(PyThreadState *ts, _PyInterpreterFrame *frame, + int throw) +{ + if (perf_status == PERF_STATUS_FAILED || + perf_status == PERF_STATUS_NO_INIT) { + goto default_eval; + } + PyCodeObject *co = frame->f_code; + py_trampoline f = NULL; + assert(extra_code_index != -1); + int ret = _PyCode_GetExtra((PyObject *)co, extra_code_index, (void **)&f); + if (ret != 0 || f == NULL) { + // This is the first time we see this code object so we need + // to compile a trampoline for it. + py_trampoline new_trampoline = compile_trampoline(); + if (new_trampoline == NULL) { + goto default_eval; + } + trampoline_api.write_state(trampoline_api.state, new_trampoline, + code_arena->code_size, co); + _PyCode_SetExtra((PyObject *)co, extra_code_index, + (void *)new_trampoline); + f = new_trampoline; + } + assert(f != NULL); + return f(ts, frame, throw, _PyEval_EvalFrameDefault); +default_eval: + // Something failed, fall back to the default evaluator. + return _PyEval_EvalFrameDefault(ts, frame, throw); +} +#endif // PY_HAVE_PERF_TRAMPOLINE + +int +_PyIsPerfTrampolineActive(void) +{ +#ifdef PY_HAVE_PERF_TRAMPOLINE + PyThreadState *tstate = _PyThreadState_GET(); + return tstate->interp->eval_frame == py_trampoline_evaluator; +#endif + return 0; +} + +void +_PyPerfTrampoline_GetCallbacks(_PyPerf_Callbacks *callbacks) +{ + if (callbacks == NULL) { + return; + } +#ifdef PY_HAVE_PERF_TRAMPOLINE + callbacks->init_state = trampoline_api.init_state; + callbacks->write_state = trampoline_api.write_state; + callbacks->free_state = trampoline_api.free_state; +#endif + return; +} + +int +_PyPerfTrampoline_SetCallbacks(_PyPerf_Callbacks *callbacks) +{ + if (callbacks == NULL) { + return -1; + } +#ifdef PY_HAVE_PERF_TRAMPOLINE + if (trampoline_api.state) { + _PyPerfTrampoline_Fini(); + } + trampoline_api.init_state = callbacks->init_state; + trampoline_api.write_state = callbacks->write_state; + trampoline_api.free_state = callbacks->free_state; + trampoline_api.state = NULL; + perf_status = PERF_STATUS_OK; +#endif + return 0; +} + +int +_PyPerfTrampoline_Init(int activate) +{ +#ifdef PY_HAVE_PERF_TRAMPOLINE + PyThreadState *tstate = _PyThreadState_GET(); + if (tstate->interp->eval_frame && + tstate->interp->eval_frame != py_trampoline_evaluator) { + PyErr_SetString(PyExc_RuntimeError, + "Trampoline cannot be initialized as a custom eval " + "frame is already present"); + return -1; + } + if (!activate) { + tstate->interp->eval_frame = NULL; + } + else { + tstate->interp->eval_frame = py_trampoline_evaluator; + if (new_code_arena() < 0) { + return -1; + } + if (trampoline_api.state == NULL) { + void *state = trampoline_api.init_state(); + if (state == NULL) { + return -1; + } + trampoline_api.state = state; + } + extra_code_index = _PyEval_RequestCodeExtraIndex(NULL); + if (extra_code_index == -1) { + return -1; + } + perf_status = PERF_STATUS_OK; + } +#endif + return 0; +} + +int +_PyPerfTrampoline_Fini(void) +{ +#ifdef PY_HAVE_PERF_TRAMPOLINE + PyThreadState *tstate = _PyThreadState_GET(); + if (tstate->interp->eval_frame == py_trampoline_evaluator) { + tstate->interp->eval_frame = NULL; + } + free_code_arenas(); + if (trampoline_api.state != NULL) { + trampoline_api.free_state(trampoline_api.state); + trampoline_api.state = NULL; + } + extra_code_index = -1; +#endif + return 0; +} + +PyStatus +_PyPerfTrampoline_AfterFork_Child(void) +{ +#ifdef PY_HAVE_PERF_TRAMPOLINE + // Restart trampoline in file in child. + int was_active = _PyIsPerfTrampolineActive(); + _PyPerfTrampoline_Fini(); + if (was_active) { + _PyPerfTrampoline_Init(1); + } +#endif + return PyStatus_Ok(); +} diff --git a/PCbuild/_freeze_module.vcxproj b/PCbuild/_freeze_module.vcxproj index 4c0072971123..0e446fe46eb0 100644 --- a/PCbuild/_freeze_module.vcxproj +++ b/PCbuild/_freeze_module.vcxproj @@ -129,6 +129,7 @@ <ClCompile Include="..\Objects\cellobject.c" /> <ClCompile Include="..\Objects\classobject.c" /> <ClCompile Include="..\Objects\codeobject.c" /> + <ClCompile Include="..\Objects\perf_trampoline.c" /> <ClCompile Include="..\Objects\complexobject.c" /> <ClCompile Include="..\Objects\descrobject.c" /> <ClCompile Include="..\Objects\dictobject.c" /> diff --git a/PCbuild/_freeze_module.vcxproj.filters b/PCbuild/_freeze_module.vcxproj.filters index 5c984999c0cd..96ab2f2a4aac 100644 --- a/PCbuild/_freeze_module.vcxproj.filters +++ b/PCbuild/_freeze_module.vcxproj.filters @@ -85,6 +85,9 @@ <ClCompile Include="..\Objects\codeobject.c"> <Filter>Source Files</Filter> </ClCompile> + <ClCompile Include="..\Objects\perf_trampoline.c"> + <Filter>Source Files</Filter> + </ClCompile> <ClCompile Include="..\Python\compile.c"> <Filter>Source Files</Filter> </ClCompile> diff --git a/PCbuild/pythoncore.vcxproj b/PCbuild/pythoncore.vcxproj index 45e5013e61f6..ff17304032cd 100644 --- a/PCbuild/pythoncore.vcxproj +++ b/PCbuild/pythoncore.vcxproj @@ -429,6 +429,7 @@ <ClCompile Include="..\Objects\cellobject.c" /> <ClCompile Include="..\Objects\classobject.c" /> <ClCompile Include="..\Objects\codeobject.c" /> + <ClCompile Include="..\Objects\perf_trampoline.c" /> <ClCompile Include="..\Objects\complexobject.c" /> <ClCompile Include="..\Objects\descrobject.c" /> <ClCompile Include="..\Objects\dictobject.c" /> diff --git a/PCbuild/pythoncore.vcxproj.filters b/PCbuild/pythoncore.vcxproj.filters index 581ea6e3c58e..7d7fe7267c8f 100644 --- a/PCbuild/pythoncore.vcxproj.filters +++ b/PCbuild/pythoncore.vcxproj.filters @@ -923,6 +923,9 @@ <ClCompile Include="..\Objects\codeobject.c"> <Filter>Objects</Filter> </ClCompile> + <ClCompile Include="..\Objects\perf_trampoline.c"> + <Filter>Objects</Filter> + </ClCompile> <ClCompile Include="..\Objects\complexobject.c"> <Filter>Objects</Filter> </ClCompile> diff --git a/Python/clinic/sysmodule.c.h b/Python/clinic/sysmodule.c.h index 0f9636690a40..ddf01a7ccdda 100644 --- a/Python/clinic/sysmodule.c.h +++ b/Python/clinic/sysmodule.c.h @@ -1151,6 +1151,79 @@ sys_getandroidapilevel(PyObject *module, PyObject *Py_UNUSED(ignored)) #endif /* defined(ANDROID_API_LEVEL) */ +PyDoc_STRVAR(sys_activate_stack_trampoline__doc__, +"activate_stack_trampoline($module, backend, /)\n" +"--\n" +"\n" +"Activate the perf profiler trampoline."); + +#define SYS_ACTIVATE_STACK_TRAMPOLINE_METHODDEF \ + {"activate_stack_trampoline", (PyCFunction)sys_activate_stack_trampoline, METH_O, sys_activate_stack_trampoline__doc__}, + +static PyObject * +sys_activate_stack_trampoline_impl(PyObject *module, const char *backend); + +static PyObject * +sys_activate_stack_trampoline(PyObject *module, PyObject *arg) +{ + PyObject *return_value = NULL; + const char *backend; + + if (!PyUnicode_Check(arg)) { + _PyArg_BadArgument("activate_stack_trampoline", "argument", "str", arg); + goto exit; + } + Py_ssize_t backend_length; + backend = PyUnicode_AsUTF8AndSize(arg, &backend_length); + if (backend == NULL) { + goto exit; + } + if (strlen(backend) != (size_t)backend_length) { + PyErr_SetString(PyExc_ValueError, "embedded null character"); + goto exit; + } + return_value = sys_activate_stack_trampoline_impl(module, backend); + +exit: + return return_value; +} + +PyDoc_STRVAR(sys_deactivate_stack_trampoline__doc__, +"deactivate_stack_trampoline($module, /)\n" +"--\n" +"\n" +"Dectivate the perf profiler trampoline."); + +#define SYS_DEACTIVATE_STACK_TRAMPOLINE_METHODDEF \ + {"deactivate_stack_trampoline", (PyCFunction)sys_deactivate_stack_trampoline, METH_NOARGS, sys_deactivate_stack_trampoline__doc__}, + +static PyObject * +sys_deactivate_stack_trampoline_impl(PyObject *module); + +static PyObject * +sys_deactivate_stack_trampoline(PyObject *module, PyObject *Py_UNUSED(ignored)) +{ + return sys_deactivate_stack_trampoline_impl(module); +} + +PyDoc_STRVAR(sys_is_stack_trampoline_active__doc__, +"is_stack_trampoline_active($module, /)\n" +"--\n" +"\n" +"Returns *True* if the perf profiler trampoline is active."); + +#define SYS_IS_STACK_TRAMPOLINE_ACTIVE_METHODDEF \ + {"is_stack_trampoline_active", (PyCFunction)sys_is_stack_trampoline_active, METH_NOARGS, sys_is_stack_trampoline_active__doc__}, + +static PyObject * +sys_is_stack_trampoline_active_impl(PyObject *module); + +static PyObject * +sys_is_stack_trampoline_active(PyObject *module, PyObject *Py_UNUSED(ignored)) +{ + return sys_is_stack_trampoline_active_impl(module); +} + #ifndef SYS_GETWINDOWSVERSION_METHODDEF #define SYS_GETWINDOWSVERSION_METHODDEF #endif /* !defined(SYS_GETWINDOWSVERSION_METHODDEF) */ @@ -1194,4 +1267,4 @@ sys_getandroidapilevel(PyObject *module, PyObject *Py_UNUSED(ignored)) #ifndef SYS_GETANDROIDAPILEVEL_METHODDEF #define SYS_GETANDROIDAPILEVEL_METHODDEF #endif /* !defined(SYS_GETANDROIDAPILEVEL_METHODDEF) */ -/*[clinic end generated code: output=322fb0409e376ad4 input=a9049054013a1b77]*/ +/*[clinic end generated code: output=43b44240211afe95 input=a9049054013a1b77]*/ diff --git a/Python/initconfig.c b/Python/initconfig.c index 70f0363297f3..33a8f276b19c 100644 --- a/Python/initconfig.c +++ b/Python/initconfig.c @@ -118,6 +118,11 @@ The following implementation-specific options are available:\n\ files are desired as well as suppressing the extra visual location indicators \n\ when the interpreter displays tracebacks.\n\ \n\ +-X perf: activate support for the Linux \"perf\" profiler by activating the \"perf\"\n\ + trampoline. When this option is activated, the Linux \"perf\" profiler will be \n\ + able to report Python calls. This option is only available on some platforms and will \n\ + do nothing if is not supported on the current system. The default value is \"off\".\n\ +\n\ -X frozen_modules=[on|off]: whether or not frozen modules should be used.\n\ The default is \"on\" (or \"off\" if you are running a local build)."; @@ -745,6 +750,7 @@ _PyConfig_InitCompatConfig(PyConfig *config) config->use_hash_seed = -1; config->faulthandler = -1; config->tracemalloc = -1; + config->perf_profiling = -1; config->module_search_paths_set = 0; config->parse_argv = 0; config->site_import = -1; @@ -829,6 +835,7 @@ PyConfig_InitIsolatedConfig(PyConfig *config) config->use_hash_seed = 0; config->faulthandler = 0; config->tracemalloc = 0; + config->perf_profiling = 0; config->safe_path = 1; config->pathconfig_warnings = 0; #ifdef MS_WINDOWS @@ -940,6 +947,7 @@ _PyConfig_Copy(PyConfig *config, const PyConfig *config2) COPY_ATTR(_install_importlib); COPY_ATTR(faulthandler); COPY_ATTR(tracemalloc); + COPY_ATTR(perf_profiling); COPY_ATTR(import_time); COPY_ATTR(code_debug_ranges); COPY_ATTR(show_ref_count); @@ -1050,6 +1058,7 @@ _PyConfig_AsDict(const PyConfig *config) SET_ITEM_UINT(hash_seed); SET_ITEM_INT(faulthandler); SET_ITEM_INT(tracemalloc); + SET_ITEM_INT(perf_profiling); SET_ITEM_INT(import_time); SET_ITEM_INT(code_debug_ranges); SET_ITEM_INT(show_ref_count); @@ -1331,6 +1340,7 @@ _PyConfig_FromDict(PyConfig *config, PyObject *dict) CHECK_VALUE("hash_seed", config->hash_seed <= MAX_HASH_SEED); GET_UINT(faulthandler); GET_UINT(tracemalloc); + GET_UINT(perf_profiling); GET_UINT(import_time); GET_UINT(code_debug_ranges); GET_UINT(show_ref_count); @@ -1687,6 +1697,26 @@ config_read_env_vars(PyConfig *config) return _PyStatus_OK(); } +static PyStatus +config_init_perf_profiling(PyConfig *config) +{ + int active = 0; + const char *env = config_get_env(config, "PYTHONPERFSUPPORT"); + if (env) { + if (_Py_str_to_int(env, &active) != 0) { + active = 0; + } + if (active) { + config->perf_profiling = 1; + } + } + const wchar_t *xoption = config_get_xoption(config, L"perf"); + if (xoption) { + config->perf_profiling = 1; + } + return _PyStatus_OK(); + +} static PyStatus config_init_tracemalloc(PyConfig *config) @@ -1788,6 +1818,12 @@ config_read_complex_options(PyConfig *config) return status; } } + if (config->perf_profiling < 0) { + status = config_init_perf_profiling(config); + if (_PyStatus_EXCEPTION(status)) { + return status; + } + } if (config->pycache_prefix == NULL) { status = config_init_pycache_prefix(config); @@ -2104,6 +2140,9 @@ config_read(PyConfig *config, int compute_path_config) if (config->tracemalloc < 0) { config->tracemalloc = 0; } + if (config->perf_profiling < 0) { + config->perf_profiling = 0; + } if (config->use_hash_seed < 0) { config->use_hash_seed = 0; config->hash_seed = 0; diff --git a/Python/pylifecycle.c b/Python/pylifecycle.c index bb646f1a0ee2..8ce6d71651c1 100644 --- a/Python/pylifecycle.c +++ b/Python/pylifecycle.c @@ -1149,6 +1149,16 @@ init_interp_main(PyThreadState *tstate) if (_PyTraceMalloc_Init(config->tracemalloc) < 0) { return _PyStatus_ERR("can't initialize tracemalloc"); } + + +#ifdef PY_HAVE_PERF_TRAMPOLINE + if (config->perf_profiling) { + if (_PyPerfTrampoline_SetCallbacks(&_Py_perfmap_callbacks) < 0 || + _PyPerfTrampoline_Init(config->perf_profiling) < 0) { + return _PyStatus_ERR("can't initialize the perf trampoline"); + } + } +#endif } status = init_sys_streams(tstate); @@ -1723,6 +1733,7 @@ finalize_interp_clear(PyThreadState *tstate) _PyArg_Fini(); _Py_ClearFileSystemEncoding(); _Py_Deepfreeze_Fini(); + _PyPerfTrampoline_Fini(); } finalize_interp_types(tstate->interp); diff --git a/Python/sysmodule.c b/Python/sysmodule.c index c28643879416..75e64553d88c 100644 --- a/Python/sysmodule.c +++ b/Python/sysmodule.c @@ -2053,6 +2053,80 @@ sys_getandroidapilevel_impl(PyObject *module) } #endif /* ANDROID_API_LEVEL */ +/*[clinic input] +sys.activate_stack_trampoline + + backend: str + / + +Activate the perf profiler trampoline. +[clinic start generated code]*/ + +static PyObject * +sys_activate_stack_trampoline_impl(PyObject *module, const char *backend) +/*[clinic end generated code: output=5783cdeb51874b43 input=b09020e3a17c78c5]*/ +{ +#ifdef PY_HAVE_PERF_TRAMPOLINE + if (strcmp(backend, "perf") == 0) { + _PyPerf_Callbacks cur_cb; + _PyPerfTrampoline_GetCallbacks(&cur_cb); + if (cur_cb.init_state != _Py_perfmap_callbacks.init_state) { + if (_PyPerfTrampoline_SetCallbacks(&_Py_perfmap_callbacks) < 0 ) { + PyErr_SetString(PyExc_ValueError, "can't activate perf trampoline"); + return NULL; + } + } + } + else { + PyErr_Format(PyExc_ValueError, "invalid backend: %s", backend); + return NULL; + } + if (_PyPerfTrampoline_Init(1) < 0) { + return NULL; + } + Py_RETURN_NONE; +#else + PyErr_SetString(PyExc_ValueError, "perf trampoline not available"); + return NULL; +#endif +} + + +/*[clinic input] +sys.deactivate_stack_trampoline + +Dectivate the perf profiler trampoline. +[clinic start generated code]*/ + +static PyObject * +sys_deactivate_stack_trampoline_impl(PyObject *module) +/*[clinic end generated code: output=b50da25465df0ef1 input=491f4fc1ed615736]*/ +{ + if (_PyPerfTrampoline_Init(0) < 0) { + return NULL; + } + Py_RETURN_NONE; +} + +/*[clinic input] +sys.is_stack_trampoline_active + +Returns *True* if the perf profiler trampoline is active. +[clinic start generated code]*/ + +static PyObject * +sys_is_stack_trampoline_active_impl(PyObject *module) +/*[clinic end generated code: output=ab2746de0ad9d293 input=061fa5776ac9dd59]*/ +{ +#ifdef PY_HAVE_PERF_TRAMPOLINE + if (_PyIsPerfTrampolineActive()) { + Py_RETURN_TRUE; + } +#endif + Py_RETURN_FALSE; +} + + static PyMethodDef sys_methods[] = { /* Might as well keep this in alphabetic order */ SYS_ADDAUDITHOOK_METHODDEF @@ -2108,6 +2182,9 @@ static PyMethodDef sys_methods[] = { METH_VARARGS | METH_KEYWORDS, set_asyncgen_hooks_doc}, SYS_GET_ASYNCGEN_HOOKS_METHODDEF SYS_GETANDROIDAPILEVEL_METHODDEF + SYS_ACTIVATE_STACK_TRAMPOLINE_METHODDEF + SYS_DEACTIVATE_STACK_TRAMPOLINE_METHODDEF + SYS_IS_STACK_TRAMPOLINE_ACTIVE_METHODDEF SYS_UNRAISABLEHOOK_METHODDEF #ifdef Py_STATS SYS__STATS_ON_METHODDEF diff --git a/configure b/configure index fc7f7fadedb8..9522977c8c70 100755 --- a/configure +++ b/configure @@ -861,6 +861,7 @@ LIBEXPAT_CFLAGS TZPATH LIBUUID_LIBS LIBUUID_CFLAGS +PERF_TRAMPOLINE_OBJ SHLIBS CFLAGSFORSHARED LINKFORSHARED @@ -11498,6 +11499,35 @@ esac { $as_echo "$as_me:${as_lineno-$LINENO}: result: $SHLIBS" >&5 $as_echo "$SHLIBS" >&6; } +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking perf trampoline" >&5 +$as_echo_n "checking perf trampoline... " >&6; } +case $PLATFORM_TRIPLET in #( + x86_64-linux-gnu) : + perf_trampoline=yes ;; #( + aarch64-linux-gnu) : + perf_trampoline=yes ;; #( + *) : + perf_trampoline=no + ;; +esac +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $perf_trampoline" >&5 +$as_echo "$perf_trampoline" >&6; } + +if test "x$perf_trampoline" = xyes; then : + + +$as_echo "#define PY_HAVE_PERF_TRAMPOLINE 1" >>confdefs.h + + PERF_TRAMPOLINE_OBJ=Objects/asm_trampoline.o + + if test "x$Py_DEBUG" = xtrue; then : + + as_fn_append BASECFLAGS " -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer" + +fi + +fi + # checks for libraries { $as_echo "$as_me:${as_lineno-$LINENO}: checking for sendfile in -lsendfile" >&5 diff --git a/configure.ac b/configure.ac index 2b927cd961fa..3a009bb5042a 100644 --- a/configure.ac +++ b/configure.ac @@ -3452,6 +3452,26 @@ case "$ac_sys_system" in esac AC_MSG_RESULT($SHLIBS) +dnl perf trampoline is Linux specific and requires an arch-specific +dnl trampoline in asssembly. +AC_MSG_CHECKING([perf trampoline]) +AS_CASE([$PLATFORM_TRIPLET], + [x86_64-linux-gnu], [perf_trampoline=yes], + [aarch64-linux-gnu], [perf_trampoline=yes], + [perf_trampoline=no] +) +AC_MSG_RESULT([$perf_trampoline]) + +AS_VAR_IF([perf_trampoline], [yes], [ + AC_DEFINE([PY_HAVE_PERF_TRAMPOLINE], [1], [Define to 1 if you have the perf trampoline.]) + PERF_TRAMPOLINE_OBJ=Objects/asm_trampoline.o + + dnl perf needs frame pointers for unwinding, include compiler option in debug builds + AS_VAR_IF([Py_DEBUG], [true], [ + AS_VAR_APPEND([BASECFLAGS], [" -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"]) + ]) +]) +AC_SUBST([PERF_TRAMPOLINE_OBJ]) # checks for libraries AC_CHECK_LIB(sendfile, sendfile) diff --git a/pyconfig.h.in b/pyconfig.h.in index 10e7ad12fa98..1ce09855f555 100644 --- a/pyconfig.h.in +++ b/pyconfig.h.in @@ -1568,6 +1568,9 @@ /* Define if you want to coerce the C locale to a UTF-8 based locale */ #undef PY_COERCE_C_LOCALE +/* Define to 1 if you have the perf trampoline. */ +#undef PY_HAVE_PERF_TRAMPOLINE + /* Define to 1 to build the sqlite module with loadable extensions support. */ #undef PY_SQLITE_ENABLE_LOAD_EXTENSION

1 0

GH-95245: Document use of `MANAGED` flags instead of offsets. (GH-96044)
by markshannon Aug. 30, 2022

Aug. 30, 2022

https://github.com/python/cpython/commit/0f733fffe8f4caaac3ce1b5306af86b42f… commit: 0f733fffe8f4caaac3ce1b5306af86b42fb0c7fa branch: main author: Mark Shannon <mark(a)hotpy.org> committer: markshannon <mark(a)hotpy.org> date: 2022-08-30T16:26:08+01:00 summary: GH-95245: Document use of `MANAGED` flags instead of offsets. (GH-96044) files: M Doc/c-api/structures.rst M Doc/c-api/type.rst M Doc/c-api/typeobj.rst M Doc/extending/newtypes.rst diff --git a/Doc/c-api/structures.rst b/Doc/c-api/structures.rst index 86d4536f8d28..f1eb09bb5691 100644 --- a/Doc/c-api/structures.rst +++ b/Doc/c-api/structures.rst @@ -469,19 +469,21 @@ Accessing attributes of extension types .. _pymemberdef-offsets: Heap allocated types (created using :c:func:`PyType_FromSpec` or similar), - ``PyMemberDef`` may contain definitions for the special members - ``__dictoffset__``, ``__weaklistoffset__`` and ``__vectorcalloffset__``, - corresponding to - :c:member:`~PyTypeObject.tp_dictoffset`, - :c:member:`~PyTypeObject.tp_weaklistoffset` and + ``PyMemberDef`` may contain definitions for the special member + ``__vectorcalloffset__``, corresponding to :c:member:`~PyTypeObject.tp_vectorcall_offset` in type objects. These must be defined with ``T_PYSSIZET`` and ``READONLY``, for example:: static PyMemberDef spam_type_members[] = { - {"__dictoffset__", T_PYSSIZET, offsetof(Spam_object, dict), READONLY}, + {"__vectorcalloffset__", T_PYSSIZET, offsetof(Spam_object, vectorcall), READONLY}, {NULL} /* Sentinel */ }; + The legacy offsets :c:member:`~PyTypeObject.tp_dictoffset` and + :c:member:`~PyTypeObject.tp_weaklistoffset` are still supported, but extensions are + strongly encouraged to use ``Py_TPFLAGS_MANAGED_DICT`` and + ``Py_TPFLAGS_MANAGED_WEAKREF`` instead. + .. c:function:: PyObject* PyMember_GetOne(const char *obj_addr, struct PyMemberDef *m) diff --git a/Doc/c-api/type.rst b/Doc/c-api/type.rst index aa77c285e3b8..deb5502a4dff 100644 --- a/Doc/c-api/type.rst +++ b/Doc/c-api/type.rst @@ -327,9 +327,9 @@ The following functions and structs are used to create * :c:member:`~PyTypeObject.tp_weaklist` * :c:member:`~PyTypeObject.tp_vectorcall` * :c:member:`~PyTypeObject.tp_weaklistoffset` - (see :ref:`PyMemberDef <pymemberdef-offsets>`) + (use :const:`Py_TPFLAGS_MANAGED_WEAKREF` instead) * :c:member:`~PyTypeObject.tp_dictoffset` - (see :ref:`PyMemberDef <pymemberdef-offsets>`) + (use :const:`Py_TPFLAGS_MANAGED_DICT` instead) * :c:member:`~PyTypeObject.tp_vectorcall_offset` (see :ref:`PyMemberDef <pymemberdef-offsets>`) diff --git a/Doc/c-api/typeobj.rst b/Doc/c-api/typeobj.rst index 20cdffed110c..2439f7c41b56 100644 --- a/Doc/c-api/typeobj.rst +++ b/Doc/c-api/typeobj.rst @@ -96,7 +96,7 @@ Quick Reference | | | __gt__, | | | | | | | | __ge__ | | | | | +------------------------------------------------+-----------------------------------+-------------------+---+---+---+---+ - | :c:member:`~PyTypeObject.tp_weaklistoffset` | :c:type:`Py_ssize_t` | | | X | | ? | + | (:c:member:`~PyTypeObject.tp_weaklistoffset`) | :c:type:`Py_ssize_t` | | | X | | ? | +------------------------------------------------+-----------------------------------+-------------------+---+---+---+---+ | :c:member:`~PyTypeObject.tp_iter` | :c:type:`getiterfunc` | __iter__ | | | | X | +------------------------------------------------+-----------------------------------+-------------------+---+---+---+---+ @@ -117,7 +117,7 @@ Quick Reference | :c:member:`~PyTypeObject.tp_descr_set` | :c:type:`descrsetfunc` | __set__, | | | | X | | | | __delete__ | | | | | +------------------------------------------------+-----------------------------------+-------------------+---+---+---+---+ - | :c:member:`~PyTypeObject.tp_dictoffset` | :c:type:`Py_ssize_t` | | | X | | ? | + | (:c:member:`~PyTypeObject.tp_dictoffset`) | :c:type:`Py_ssize_t` | | | X | | ? | +------------------------------------------------+-----------------------------------+-------------------+---+---+---+---+ | :c:member:`~PyTypeObject.tp_init` | :c:type:`initproc` | __init__ | X | X | | X | +------------------------------------------------+-----------------------------------+-------------------+---+---+---+---+ @@ -1018,7 +1018,6 @@ and :c:type:`PyType_Type` effectively act as defaults.) :const:`Py_TPFLAGS_HAVE_GC` flag bit is clear in the subtype and the :c:member:`~PyTypeObject.tp_traverse` and :c:member:`~PyTypeObject.tp_clear` fields in the subtype exist and have ``NULL`` values. - .. XXX are most flag bits *really* inherited individually? **Default:** @@ -1135,6 +1134,33 @@ and :c:type:`PyType_Type` effectively act as defaults.) :const:`Py_TPFLAGS_IMMUTABLETYPE` flag set. For extension types, it is inherited whenever :c:member:`~PyTypeObject.tp_descr_get` is inherited. + .. data:: Py_TPFLAGS_MANAGED_DICT + + This bit indicates that instances of the class have a ``__dict___`` + attribute, and that the space for the dictionary is managed by the VM. + + If this flag is set, :const:`Py_TPFLAGS_HAVE_GC` should also be set. + + .. versionadded:: 3.12 + + **Inheritance:** + + This flag is inherited unless the + :c:member:`~PyTypeObject.tp_dictoffset` field is set in a superclass. + + + .. data:: Py_TPFLAGS_MANAGED_WEAKREF + + This bit indicates that instances of the class should be weakly + referenceable. + + .. versionadded:: 3.12 + + **Inheritance:** + + This flag is inherited unless the + :c:member:`~PyTypeObject.tp_weaklistoffset` field is set in a superclass. + .. XXX Document more flags here? @@ -1487,6 +1513,9 @@ and :c:type:`PyType_Type` effectively act as defaults.) .. c:member:: Py_ssize_t PyTypeObject.tp_weaklistoffset + While this field is still supported, :const:`Py_TPFLAGS_MANAGED_WEAKREF` + should be used instead, if at all possible. + If the instances of this type are weakly referenceable, this field is greater than zero and contains the offset in the instance structure of the weak reference list head (ignoring the GC header, if present); this offset is used by @@ -1497,6 +1526,9 @@ and :c:type:`PyType_Type` effectively act as defaults.) Do not confuse this field with :c:member:`~PyTypeObject.tp_weaklist`; that is the list head for weak references to the type object itself. + It is an error to set both the :const:`Py_TPFLAGS_MANAGED_WEAKREF` bit and + :c:member:`~PyTypeObject.tp_weaklist`. + **Inheritance:** This field is inherited by subtypes, but see the rules listed below. A subtype @@ -1504,19 +1536,12 @@ and :c:type:`PyType_Type` effectively act as defaults.) reference list head than the base type. Since the list head is always found via :c:member:`~PyTypeObject.tp_weaklistoffset`, this should not be a problem. - When a type defined by a class statement has no :attr:`~object.__slots__` declaration, - and none of its base types are weakly referenceable, the type is made weakly - referenceable by adding a weak reference list head slot to the instance layout - and setting the :c:member:`~PyTypeObject.tp_weaklistoffset` of that slot's offset. - - When a type's :attr:`__slots__` declaration contains a slot named - :attr:`__weakref__`, that slot becomes the weak reference list head for - instances of the type, and the slot's offset is stored in the type's - :c:member:`~PyTypeObject.tp_weaklistoffset`. + **Default:** - When a type's :attr:`__slots__` declaration does not contain a slot named - :attr:`__weakref__`, the type inherits its :c:member:`~PyTypeObject.tp_weaklistoffset` from its - base type. + If the :const:`Py_TPFLAGS_MANAGED_WEAKREF` bit is set in the + :c:member:`~PyTypeObject.tp_dict` field, then + :c:member:`~PyTypeObject.tp_weaklistoffset` will be set to a negative value, + to indicate that it is unsafe to use this field. .. c:member:: getiterfunc PyTypeObject.tp_iter @@ -1695,6 +1720,9 @@ and :c:type:`PyType_Type` effectively act as defaults.) .. c:member:: Py_ssize_t PyTypeObject.tp_dictoffset + While this field is still supported, :const:`Py_TPFLAGS_MANAGED_DICT` should be + used instead, if at all possible. + If the instances of this type have a dictionary containing instance variables, this field is non-zero and contains the offset in the instances of the type of the instance variable dictionary; this offset is used by @@ -1703,17 +1731,7 @@ and :c:type:`PyType_Type` effectively act as defaults.) Do not confuse this field with :c:member:`~PyTypeObject.tp_dict`; that is the dictionary for attributes of the type object itself. - If the value of this field is greater than zero, it specifies the offset from - the start of the instance structure. If the value is less than zero, it - specifies the offset from the *end* of the instance structure. A negative - offset is more expensive to use, and should only be used when the instance - structure contains a variable-length part. This is used for example to add an - instance variable dictionary to subtypes of :class:`str` or :class:`tuple`. Note - that the :c:member:`~PyTypeObject.tp_basicsize` field should account for the dictionary added to - the end in that case, even though the dictionary is not included in the basic - object layout. On a system with a pointer size of 4 bytes, - :c:member:`~PyTypeObject.tp_dictoffset` should be set to ``-4`` to indicate that the dictionary is - at the very end of the structure. + The value specifies the offset of the dictionary from the start of the instance structure. The :c:member:`~PyTypeObject.tp_dictoffset` should be regarded as write-only. To get the pointer to the dictionary call :c:func:`PyObject_GenericGetDict`. @@ -1721,30 +1739,26 @@ and :c:type:`PyType_Type` effectively act as defaults.) dictionary, so it is may be more efficient to call :c:func:`PyObject_GetAttr` when accessing an attribute on the object. - **Inheritance:** - - This field is inherited by subtypes, but see the rules listed below. A subtype - may override this offset; this means that the subtype instances store the - dictionary at a difference offset than the base type. Since the dictionary is - always found via :c:member:`~PyTypeObject.tp_dictoffset`, this should not be a problem. + It is an error to set both the :const:`Py_TPFLAGS_MANAGED_WEAKREF` bit and + :c:member:`~PyTypeObject.tp_dictoffset`. - When a type defined by a class statement has no :attr:`~object.__slots__` declaration, - and none of its base types has an instance variable dictionary, a dictionary - slot is added to the instance layout and the :c:member:`~PyTypeObject.tp_dictoffset` is set to - that slot's offset. - - When a type defined by a class statement has a :attr:`__slots__` declaration, - the type inherits its :c:member:`~PyTypeObject.tp_dictoffset` from its base type. + **Inheritance:** - (Adding a slot named :attr:`~object.__dict__` to the :attr:`__slots__` declaration does - not have the expected effect, it just causes confusion. Maybe this should be - added as a feature just like :attr:`__weakref__` though.) + This field is inherited by subtypes. A subtype should not override this offset; + doing so could be unsafe, if C code tries to access the dictionary at the + previous offset. + To properly support inheritance, use :const:`Py_TPFLAGS_MANAGED_DICT`. **Default:** This slot has no default. For :ref:`static types <static-types>`, if the field is ``NULL`` then no :attr:`__dict__` gets created for instances. + If the :const:`Py_TPFLAGS_MANAGED_DICT` bit is set in the + :c:member:`~PyTypeObject.tp_dict` field, then + :c:member:`~PyTypeObject.tp_dictoffset` will be set to ``-1``, to indicate + that it is unsafe to use this field. + .. c:member:: initproc PyTypeObject.tp_init @@ -2663,8 +2677,6 @@ A type that supports weakrefs, instance dicts, and hashing:: typedef struct { PyObject_HEAD const char *data; - PyObject *inst_dict; - PyObject *weakreflist; } MyObject; static PyTypeObject MyObject_Type = { @@ -2672,9 +2684,9 @@ A type that supports weakrefs, instance dicts, and hashing:: .tp_name = "mymod.MyObject", .tp_basicsize = sizeof(MyObject), .tp_doc = PyDoc_STR("My objects"), - .tp_weaklistoffset = offsetof(MyObject, weakreflist), - .tp_dictoffset = offsetof(MyObject, inst_dict), - .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC, + .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | + Py_TPFLAGS_HAVE_GC | Py_TPFLAGS_MANAGED_DICT | + Py_TPFLAGS_MANAGED_WEAKREF, .tp_new = myobj_new, .tp_traverse = (traverseproc)myobj_traverse, .tp_clear = (inquiry)myobj_clear, diff --git a/Doc/extending/newtypes.rst b/Doc/extending/newtypes.rst index c7c434e58bf0..b797dc2817c8 100644 --- a/Doc/extending/newtypes.rst +++ b/Doc/extending/newtypes.rst @@ -570,43 +570,28 @@ performance-critical objects (such as numbers). .. seealso:: Documentation for the :mod:`weakref` module. -For an object to be weakly referencable, the extension type must do two things: +For an object to be weakly referencable, the extension type must set the +``Py_TPFLAGS_MANAGED_WEAKREF`` bit of the :c:member:`~PyTypeObject.tp_flags` +field. The legacy :c:member:`~PyTypeObject.tp_weaklistoffset` field should +be left as zero. -#. Include a :c:type:`PyObject\*` field in the C object structure dedicated to - the weak reference mechanism. The object's constructor should leave it - ``NULL`` (which is automatic when using the default - :c:member:`~PyTypeObject.tp_alloc`). - -#. Set the :c:member:`~PyTypeObject.tp_weaklistoffset` type member - to the offset of the aforementioned field in the C object structure, - so that the interpreter knows how to access and modify that field. - -Concretely, here is how a trivial object structure would be augmented -with the required field:: - - typedef struct { - PyObject_HEAD - PyObject *weakreflist; /* List of weak references */ - } TrivialObject; - -And the corresponding member in the statically declared type object:: +Concretely, here is how the statically declared type object would look:: static PyTypeObject TrivialType = { PyVarObject_HEAD_INIT(NULL, 0) /* ... other members omitted for brevity ... */ - .tp_weaklistoffset = offsetof(TrivialObject, weakreflist), + .tp_flags = Py_TPFLAGS_MANAGED_WEAKREF | ..., }; + The only further addition is that ``tp_dealloc`` needs to clear any weak -references (by calling :c:func:`PyObject_ClearWeakRefs`) if the field is -non-``NULL``:: +references (by calling :c:func:`PyObject_ClearWeakRefs`):: static void Trivial_dealloc(TrivialObject *self) { /* Clear weakrefs first before calling any destructors */ - if (self->weakreflist != NULL) - PyObject_ClearWeakRefs((PyObject *) self); + PyObject_ClearWeakRefs((PyObject *) self); /* ... remainder of destruction code omitted for brevity ... */ Py_TYPE(self)->tp_free((PyObject *) self); }

1 0