[New-bugs-announce] [issue44497] distutil findall can choke with recursive symlinks (performance)
Sorin Sbarnea
report at bugs.python.org
Wed Jun 23 05:20:31 EDT 2021
New submission from Sorin Sbarnea <sorin.sbarnea at gmail.com>:
While investigating very poor pip performance when installing some code, I found that the root cause is the current implementation of distutils.filelist.findall, or more precisely the _find_all_simple function, which follows symlinks but takes no measures to prevent recursion or duplicate results.
To give an idea: in my case it took 5-10 minutes to run with the CPU at 100%, for a repository with 95k files (most of them temporary files inside .tox folders). Removing the symlinks made it run in ~5s.
IMHO, _find_all_simple should normalize paths and avoid returning any duplicates.
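A minimal sketch of the kind of change meant here, not a patch against distutils: it walks the tree with symlinks followed, resolves each directory and file to its real path, and skips anything already seen. The function name find_all_dedup is hypothetical, and the sketch assumes a platform where os.path.realpath resolves symlinks.

```python
import os


def find_all_dedup(path=os.curdir):
    """Yield each regular file under 'path' once, following symlinks safely.

    Hypothetical helper for illustration; not part of distutils.
    """
    seen_dirs = set()   # real paths of directories already walked
    seen_files = set()  # real paths of files already yielded
    for base, dirs, files in os.walk(path, followlinks=True):
        real_base = os.path.realpath(base)
        if real_base in seen_dirs:
            # Already visited through another path (for example a
            # recursive symlink): prune the walk here and skip its files.
            dirs[:] = []
            continue
        seen_dirs.add(real_base)
        for name in files:
            full = os.path.join(base, name)
            real = os.path.realpath(full)
            if real in seen_files or not os.path.isfile(full):
                continue
            seen_files.add(real)
            yield full
```

Clearing dirs in place works because os.walk runs top-down by default and consults that list before descending. With this approach a symlink pointing back at a parent directory is visited once instead of being expanded repeatedly.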
Related: https://bugs.launchpad.net/pbr/+bug/1933311
----------
components: Distutils
messages: 396394
nosy: dstufft, eric.araujo, ssbarnea
priority: normal
severity: normal
status: open
title: distutil findall can choke with recursive symlinks (performance)
versions: Python 3.10, Python 3.11, Python 3.6, Python 3.7, Python 3.8, Python 3.9
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue44497>
_______________________________________