[Distutils] For maximum performance, Python packages are best installed as zip files.

Antoine Pitrou solipsis at pitrou.net
Mon Apr 11 07:36:20 EDT 2016


On Mon, 11 Apr 2016 07:33:05 -0400
Donald Stufft <donald at stufft.io> wrote:
> 
> > On Apr 11, 2016, at 7:23 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> > 
> > On Mon, 11 Apr 2016 07:08:19 -0400
> > Donald Stufft <donald at stufft.io> wrote:
> >> 
> >> I’m not sure if that is still the case with modern SSDs, but I think the idea is that by putting everything inside of zip files you reduce the number of stat calls that Python needs to do (the flip side of this is that pkg_resources is incredibly slow because it needs to issue a ton of stat calls on import).
> > 
> > I don't think SSDs have anything to do in it since the kernel should
> > cache directory contents, rather it's the number of system calls issued.
> > But as Paul says, in Python 3 importlib got a lot of optimization work
> > on this front, so this advice probably doesn't apply anymore.
> > 
> 
> Surely SSDs are faster at returning the data (metadata or actual data) than spinning rust and thus would speed up importing on their own. Sure, you might have that data cached in memory if you’ve recently accessed it, but you just as easily might not, and the OS might need to hit the disk to find out.

That doesn't really follow. A certain number of disk accesses is
necessary to fetch the required metadata correctly, and that number
hasn't changed between Python 2 and Python 3. What has changed is the
number of spurious stat() calls asking for information that can be
cached. In Python 2, the OS did that caching (sparing disk accesses but
not system calls). In Python 3, the caching is done inside importlib,
which spares system calls as well as disk accesses.
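A rough sketch of the difference, assuming a simplified finder that only tries three hypothetical suffixes (.so, .py, .pyc) per search directory. `naive_find` probes every candidate file with its own stat() call, the way a finder without a cache would. `CachedFinder` (a hypothetical name, loosely modelled on the idea behind importlib's FileFinder, not its actual code) lists each directory once, answers lookups from an in-memory set, and only re-stats the directory itself to notice changes via its mtime. Both go through a hypothetical `SyscallCounter` wrapper around os.stat/os.listdir so the number of system calls can be compared.

```python
import os
import tempfile
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical candidate suffixes, tried in order for each module name.
SUFFIXES = (".so", ".py", ".pyc")


class SyscallCounter:
    """Wraps os.stat and os.listdir, counting every call made through it."""

    def __init__(self) -> None:
        self.calls = 0

    def stat(self, path: str) -> Optional[os.stat_result]:
        self.calls += 1
        try:
            return os.stat(path)
        except FileNotFoundError:
            return None

    def listdir(self, path: str) -> List[str]:
        self.calls += 1
        return os.listdir(path)


def naive_find(name: str, path: List[str], counter: SyscallCounter) -> Optional[str]:
    """One stat() per candidate file: the 'spurious stat() calls' case."""
    for directory in path:
        for suffix in SUFFIXES:
            candidate = os.path.join(directory, name + suffix)
            if counter.stat(candidate) is not None:
                return candidate
    return None


class CachedFinder:
    """Lists each directory once; lookups then hit an in-memory set.

    Each lookup still costs one stat() per directory, used to detect a
    changed mtime and refresh the cached listing.
    """

    def __init__(self, counter: SyscallCounter) -> None:
        self.counter = counter
        self.cache: Dict[str, Tuple[float, Set[str]]] = {}

    def _entries(self, directory: str) -> Set[str]:
        st = self.counter.stat(directory)
        if st is None:
            return set()
        cached = self.cache.get(directory)
        if cached is None or cached[0] != st.st_mtime:
            cached = (st.st_mtime, set(self.counter.listdir(directory)))
            self.cache[directory] = cached
        return cached[1]

    def find(self, name: str, path: List[str]) -> Optional[str]:
        for directory in path:
            entries = self._entries(directory)
            for suffix in SUFFIXES:
                if name + suffix in entries:
                    return os.path.join(directory, name + suffix)
        return None


def compare(num_dirs: int = 5) -> Tuple[int, int]:
    """Return (naive_calls, cached_calls) for one hit and one miss."""
    with tempfile.TemporaryDirectory() as root:
        dirs = []
        for i in range(num_dirs):
            d = os.path.join(root, f"dir{i}")
            os.mkdir(d)
            dirs.append(d)
        # Only the last directory on the search path holds the module.
        open(os.path.join(dirs[-1], "target.py"), "w").close()

        naive = SyscallCounter()
        naive_find("target", dirs, naive)
        naive_find("missing", dirs, naive)

        cached = SyscallCounter()
        finder = CachedFinder(cached)
        finder.find("target", dirs)
        finder.find("missing", dirs)
        return naive.calls, cached.calls


print(compare())  # (29, 15)
```

With five directories, the naive lookups cost 14 calls for the hit and 15 for the miss, while the cached finder pays 10 calls once to stat and list every directory and then 5 for the miss. After that, each further lookup costs one stat() per directory regardless of how many suffixes are tried, which is where the savings accumulate across a program's many imports.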

(yes, this is a bit simplified because of NFS and friends ;-))

Regards

Antoine.


More information about the Distutils-SIG mailing list