Mailman 3 May 2020 - Python-Dev

Detect memory leaks in unit tests
by Giampaolo Rodola' May 13, 2020

Hello there,

I would like to discuss a proposal regarding one aspect which AFAIK is currently missing from CPython's test suite: the ability to detect memory leaks in functions implemented in C extension modules. In psutil I use a test class/framework which calls a function many times, and fails if the process memory has increased after doing so. I do this in order to quickly detect missing free() or Py_DECREF calls in the C code, but I suppose there may be other use cases. Here's the class:
https://github.com/giampaolo/psutil/blob/913d4b1d6dcce88dea6ef9382b93883a04…

Detecting a memory leak is no easy task, because the process memory fluctuates. Sometimes it may increase (or even decrease!) even if there's no leak, I suppose because of how the OS handles memory, Python's garbage collector, the fact that RSS is an approximation, and who knows what else. In order to compensate for fluctuations I did the following: in case of failure (mem > 0 after calling fun() N times) I retry the test up to 5 times, increasing N (repetitions) each time, so I consider the test a failure only if the memory keeps increasing across all runs. So for instance, here's a legitimate failure:

    psutil.tests.test_memory_leaks.TestModuleFunctionsLeaks.test_disk_partitions ...
    Run #1: extra-mem=696.0K, per-call=3.5K, calls=200
    Run #2: extra-mem=1.4M, per-call=3.5K, calls=400
    Run #3: extra-mem=2.1M, per-call=3.5K, calls=600
    Run #4: extra-mem=2.7M, per-call=3.5K, calls=800
    Run #5: extra-mem=3.4M, per-call=3.5K, calls=1000
    FAIL

If, on the other hand, the memory increased on one run (say 200 calls) but decreased on the next run (say 400 calls), then it is clearly a false positive: memory consumption may be > 0 on the second run, but if it's lower than on the previous run with fewer repetitions, then it cannot possibly represent a leak (just a fluctuation):

    psutil.tests.test_memory_leaks.TestModuleFunctionsLeaks.test_net_connections ...
    Run #1: extra-mem=568.0K, per-call=2.8K, calls=200
    Run #2: extra-mem=24.0K, per-call=61.4B, calls=400
    OK

This is the best I could come up with as a simple leak detection mechanism to integrate with CI services, keeping more advanced tools like Valgrind out of the picture (I just wanted to know if there's a leak, not to debug the leak itself). In addition, since psutil is able to get the number of fds (UNIX) and handles (Windows) opened by a process, I also run a separate set of tests to make sure I didn't forget to call close(2) or CloseHandle() in C.

Would something like this make sense to have in CPython? Here's a quick PoC I put together just to show what this would look like in practice:
https://github.com/giampaolo/cpython/pull/2/files

Proper API coverage would be quite a huge amount of work (testing all C modules), and ideally should also include cases where functions raise an exception when fed improper input. The biggest blocker here is, of course, psutil, since it's a third-party dependency, but before getting to that I wanted to see how this idea is perceived in general.

Cheers,

--
Giampaolo - http://grodola.blogspot.com
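
The retry scheme described above can be sketched in a few lines. This is only an illustration, not the psutil class linked in the message: the name `keeps_growing`, its parameters, and the fake memory counter are all hypothetical, and the memory reading is passed in as a callable so the logic can be exercised without real process memory.

```python
import gc


def keeps_growing(fun, measure, calls=200, retries=5):
    """Return True only if memory grew on every one of `retries` runs.

    `measure` is any callable returning current memory usage in bytes
    (assumption: in practice something like
    lambda: psutil.Process().memory_info().rss).
    """
    prev_extra = 0
    for run in range(1, retries + 1):
        n = calls * run  # 200, 400, 600, ... repetitions
        gc.collect()
        before = measure()
        for _ in range(n):
            fun()
        gc.collect()
        extra = measure() - before
        # No growth at all, or less growth than the previous (shorter)
        # run: treat it as a fluctuation, not a leak.
        if extra <= prev_extra:
            return False
        prev_extra = extra
    return True


# Demo with a fake memory counter instead of real process memory.
fake_mem = [0]
spike_pending = [True]


def leaky():
    fake_mem[0] += 3500  # pretend each call leaks about 3.5K


def spiky():
    # Memory jumps once, on the very first call, and never again.
    if spike_pending[0]:
        fake_mem[0] += 568_000
        spike_pending[0] = False


leak_found = keeps_growing(leaky, lambda: fake_mem[0])
spike_found = keeps_growing(spiky, lambda: fake_mem[0])
print("leaky:", leak_found, "spiky:", spike_found)
```

Comparing each run against the previous, shorter run is what lets a one-off jump be dismissed while steady per-call growth is reported.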

inspect.getdoc and (Not) returning type/superclass docstrings in 3.9
by Matthias Bussonnier May 12, 2020

Hi All,

# Too long didn't read:

In 3.9 the behavior of inspect.getdoc(instance) was changed: it no longer returns the documentation of type(instance) or its superclass(es). I think this is a problematic change for some projects, and for interactive use to get info on objects that are rarely constructed directly by users, for example a pandas dataframe obtained via `pandas.read_csv(filepath)`. I'd like to ask for reconsideration: such a change of behavior is better suited to a new function, potentially deprecating the old one.

# Longer version

In https://bugs.python.org/issue40257 attempts are made to improve the output of `pydoc`; in particular it is difficult to have fine-grained logic depending on where the documentation comes from (instance, class, superclass, etc.), which can sometimes lead to nonsensical help. The following are given as examples:

> inspect.getdoc(1) returns the same as inspect.getdoc(int)

or

    >>> import wave
    >>> help(wave.Error)
    Help on class Error in module wave:

    class Error(builtins.Exception)
     |  Common base class for all non-exit exceptions.
     |
     |  Method resolution order:
    ...

In 3.9 the behavior of `inspect.getdoc()` has been changed to be far more restrictive in what it returns, often returning None where it used to return docstrings.

I agree with the end goal of having a more controllable way of finding where the documentation/docstrings come from and avoiding incorrect docs in pydoc and help, though I find that changing the behavior of `getdoc()` might not be the right approach.

I'm quite worried that many projects rely on the current behavior of `getdoc()`; at least IPython/Jupyter does, to provide users with help/superhelp accessible via obj? and obj??.

I would also argue that inaccurate help is often better than no help.

With the current state of Python 3.9, a few things like asking for help on a pandas dataframe instance will lose information:

    >>> import pandas as pd
    >>> from inspect import getdoc
    >>> df = pd.read_csv('mydata.csv')
    >>> print(getdoc(df))
    None

I'm taking pandas as the example because this is typically the kind of object you don't construct directly, but get via for example `read_csv()`, or that another API/package returns to you.

I haven't yet been able to confirm exactly how this affects sphinx rendering of docs, how other IDEs provide help (Spyder, PyCharm...), or other projects that use `getdoc()`. I've found mentions of `getdoc()` in numpy, scipy, jedi, matplotlib ... as well (sphinx extensions and various dynamic docs), and I'm working on building them on 3.9 to check the effect.

In general, though, my feeling is that the effects of `getdoc()` are rarely tested, as they are directly user facing. I was lucky to catch it in IPython/Jupyter, as the failing test was unrelated and indirectly relied on the exact output of a subprocess.

From the IPython/Jupyter perspective I would prefer to keep the current behavior of `inspect.getdoc()`, potentially deprecating it if you wish to, and provide an alternative that has a behavior of your choosing. Dealing with functions whose behavior changes slightly across Python versions is not the best experience, and this would give the ecosystem some chance to adapt. Updated projects rarely get released in sync with new Python versions.

Your thoughts on this issue are welcome. Thanks for all your work on core Python, and I'll support any decision that gets made.

--
Matthias
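
A tool that wants the same result whichever way `inspect.getdoc()` treats instances could apply the fallback itself. The sketch below is only an illustration under that assumption: `getdoc_with_fallback` and the `Frame` class are hypothetical names, and nothing here is taken from IPython or the tracker issue.

```python
import inspect


def getdoc_with_fallback(obj):
    """Return obj's docstring, falling back to the docstring of its type."""
    doc = inspect.getdoc(obj)
    if doc is None and not inspect.isclass(obj):
        # Assumption: for an instance without a docstring of its own, the
        # documentation of its class is the most useful thing to show.
        doc = inspect.getdoc(type(obj))
    return doc


class Frame:
    """Tabular data (stand-in for an object users rarely build directly)."""


doc = getdoc_with_fallback(Frame())
print(doc)
```

The helper only calls `inspect.getdoc()` a second time when the first call returns None, so it gives the same answer whether or not the interpreter already falls back to the class.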

Status of PEP 543 in 2020
by Nimish Telang May 10, 2020

Hi,

PEP 543, the new TLS API for Python, was published several years ago as a way toward a new library unencumbered by the legacy issues around the current ssl library. In the meantime, no actual implementation has appeared. The closest appear to be https://github.com/Synss/python-mbedtls/tree/0.13.0 and https://github.com/python-hyper/pep543, neither of which has seen much development in years, not to mention that neither has been accepted into the stdlib. The current ssl library could use …

PEP 615 (zoneinfo) implementation ready for review
by Paul Ganssle May 8, 2020

Hey all,

The feature freeze is coming up on us fast, and the PEP 615 implementation is more or less ready to be integrated into the standard library (it may need one or two little tweaks, but it's well past the "minimum viable product" stage). Normally I'd wait longer for someone to volunteer for the task of reviewing, but given the somewhat tight timeline and the fact that the code and tests alone (not including the documentation) are 6000 lines, I figured it's better to give people a heads-up …

Summary of Python tracker Issues
by Python tracker May 8, 2020

ACTIVITY SUMMARY (2020-05-01 - 2020-05-08)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    7459 (+32)
  closed 44857 (+64)
  total  52316 (+96)

Open issues with patches: 2996

Issues opened (66)
==================

#36543: Remove old-deprecated ElementTree features (part 2)
https://bugs.python.org/issue36543  reopened by scoder

#40028: Math module method …

PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround
by Victor Stinner May 8, 2020

Hi,

I wrote a "per-interpreter GIL" proof-of-concept: each interpreter gets its own GIL. I chose to benchmark a factorial function in pure Python to simulate a CPU-bound workload. I wrote the simplest possible function just to be able to run a benchmark, to check whether PEP 554 would be relevant.

The proof-of-concept proves that subinterpreters can make a CPU-bound workload faster than sequential execution or threads, and that they have the same speed as multiprocessing. The performance scales well with the number of CPUs.

Performance
===========

Factorial:

    n = 50_000
    fact = 1
    for i in range(1, n + 1):
        fact = fact * i

2 CPUs:

    Sequential: 1.00 sec +- 0.01 sec
    Threads: 1.08 sec +- 0.01 sec
    Multiprocessing: 529 ms +- 6 ms
    Subinterpreters: 553 ms +- 6 ms

4 CPUs:

    Sequential: 1.99 sec +- 0.01 sec
    Threads: 3.15 sec +- 0.97 sec
    Multiprocessing: 560 ms +- 12 ms
    Subinterpreters: 583 ms +- 7 ms

8 CPUs:

    Sequential: 4.01 sec +- 0.02 sec
    Threads: 9.91 sec +- 0.54 sec
    Multiprocessing: 1.02 sec +- 0.01 sec
    Subinterpreters: 1.10 sec +- 0.00 sec

Benchmarks were run on my laptop, which has 8 logical CPUs (4 physical CPU cores with Hyper-Threading).

Threads are between 1.1x (2 CPUs) and 2.5x (8 CPUs) SLOWER than sequential execution.

Subinterpreters are between 1.8x (2 CPUs) and 3.6x (8 CPUs) FASTER than sequential execution.

Subinterpreters and multiprocessing have basically the same speed on this benchmark.

See demo-pyperf.py attached to https://bugs.python.org/issue40512 for the code of the benchmark.

Implementation
==============

See https://bugs.python.org/issue40512 and related issues for the implementation. I have already merged changes, but most of the code is disabled by default: a new special undocumented --with-experimental-isolated-subinterpreters build mode is required to test it.

To reproduce the benchmark, use::

    # up to date checkout of Python master branch
    ./configure \
        --with-experimental-isolated-subinterpreters \
        --enable-optimizations \
        --with-lto
    make
    ./python demo-pyperf.py

Limits of subinterpreters design
================================

Subinterpreters have a few design limits:

* A Python object must not be shared between two interpreters.
* Each interpreter has a minimum memory footprint, since Python internal states and modules are duplicated.
* Others that I forgot :-)

Incomplete implementation
=========================

My proof-of-concept is just good enough to compute factorial with the code that I wrote above :-) Any other code is very likely to crash in various funny ways.

I added a few "#ifdef EXPERIMENTAL_ISOLATED_SUBINTERPRETERS" for the proof-of-concept. Most are temporary workarounds until some parts of the code are modified to become compatible with subinterpreters, like tuple free lists or Unicode interned strings.

Right now, some state is still shared between subinterpreters, such as the None and True singletons, but also statically allocated types. Avoiding shared state should improve performance.

See https://bugs.python.org/issue40512 for the current status and a list of tasks.

Most of these tasks are already tracked in Eric Snow's "Multi Core Python" project:
https://github.com/ericsnowcurrently/multi-core-python/issues

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
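
The "SLOWER" and "FASTER" factors quoted in the message follow directly from the reported timings. The snippet below just redoes that arithmetic; it is not part of demo-pyperf.py, and the `timings` table and `speedup` helper are names introduced here for illustration.

```python
# Mean timings in seconds, copied from the results in the message above.
timings = {
    2: {"sequential": 1.00, "threads": 1.08,
        "multiprocessing": 0.529, "subinterpreters": 0.553},
    4: {"sequential": 1.99, "threads": 3.15,
        "multiprocessing": 0.560, "subinterpreters": 0.583},
    8: {"sequential": 4.01, "threads": 9.91,
        "multiprocessing": 1.02, "subinterpreters": 1.10},
}


def speedup(cpus, mode):
    """How many times faster `mode` is than sequential execution.

    A value below 1 means `mode` is slower than sequential.
    """
    row = timings[cpus]
    return row["sequential"] / row[mode]


for cpus in timings:
    ratios = {mode: round(speedup(cpus, mode), 2)
              for mode in ("threads", "multiprocessing", "subinterpreters")}
    print(f"{cpus} CPUs: {ratios}")
```

Taking the inverse of the threads ratio gives the slowdown factor, which is how the 1.1x and 2.5x figures relate to the 1.8x and 3.6x speedups for subinterpreters.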

Issues with import_fresh_module
by Paul Ganssle May 7, 2020

As part of PEP 399 <https://www.python.org/dev/peps/pep-0399/>, an idiom for testing both C and pure Python versions of a library is suggested, making use of import_fresh_module. Unfortunately, I'm finding that this is not amazingly robust. We have this issue: https://bugs.python.org/issue40058, where the tester for datetime needs to do some funky manipulations <https://github.com/python/cpython/blob/302e5a8f79514fd84bafbc44b7c97ec63630…> to the state of sys.modules for reasons that …

Deprecate os.removedirs() and os.renames()
by Serhiy Storchaka May 7, 2020

It seems to me that os.removedirs() and os.renames() were added just for symmetry with os.makedirs(). All three functions have a similar structure and were added in the same commit. It seems they were initially code examples of using some os.path and os functions. Unlike the quite popular os.makedirs(), os.removedirs() and os.renames() are not used in the stdlib and are rarely used in third-party code. os.removedirs() is considered the opposite of os.makedirs(), and os.renames() is a combination …
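
For readers who have not used it, here is a small sketch of how os.removedirs() pairs with os.makedirs(). It only demonstrates documented stdlib behavior (empty parent directories are pruned until one cannot be removed); the directory names and the sentinel file are made up for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # A sentinel file keeps `root` non-empty, so pruning stops there
    # instead of continuing up into the system temp directory.
    open(os.path.join(root, "keep.txt"), "w").close()

    leaf = os.path.join(root, "a", "b", "c")
    os.makedirs(leaf)  # creates a/, a/b/ and a/b/c/
    created = os.path.isdir(leaf)

    os.removedirs(leaf)  # removes c/, then the now-empty b/ and a/
    a_removed = not os.path.exists(os.path.join(root, "a"))
    root_kept = os.path.isdir(root)

print(created, a_removed, root_kept)
```

The pruning of parents is the part that makes the function easy to misuse: without the sentinel file, the temporary root itself would have been removed as well.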

Latest PEP 554 updates.
by Eric Snow May 7, 2020

Hi all,

Thanks for the great feedback. I've updated PEP 554 (Multiple Interpreters in the Stdlib) following feedback.

https://www.python.org/dev/peps/pep-0554/

Here's a summary of the main changes:

* [API] dropped/deferred the "release" and "close" methods from RecvChannel and SendChannel (they were unnecessary and the "association" stuff was too confusing)
* [API] dropped RecvChannel/SendChannel.interpreters
* [API] dropped/deferred SendChannel.send_buffer()
* [API] renamed …

Re: Improvement to SimpleNamespace
by Raymond Hettinger May 7, 2020

[GvR]
> We should not try to import JavaScript's object model into Python.

Yes, I get that. I just want to point out that working with heavily nested dictionaries (typical for JSON) is no fun with square brackets and quotation marks.

Raymond
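
To make the ergonomics complaint concrete, the sketch below reads the same nested JSON value once with dictionary indexing and once with attribute access. It uses only the existing `json.loads(..., object_hook=...)` hook and `types.SimpleNamespace`; it is not the improvement discussed in the thread, and the sample payload is invented.

```python
import json
from types import SimpleNamespace

payload = '{"user": {"address": {"city": "Oslo"}}}'

# Plain dictionaries: square brackets and quotation marks at every level.
as_dict = json.loads(payload)
city_from_dict = as_dict["user"]["address"]["city"]

# Every JSON object becomes a SimpleNamespace, so nesting reads as attributes.
as_ns = json.loads(payload, object_hook=lambda d: SimpleNamespace(**d))
city_from_ns = as_ns.user.address.city

print(city_from_dict, city_from_ns)
```

The attribute form only works when every key is a valid Python identifier, which is one reason it cannot simply replace dictionaries for arbitrary JSON.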
