[Python-Dev] Empty directory is a namespace?

Sun Jun 24 19:44:52 CEST 2012

On Sun, Jun 24, 2012 at 3:51 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> On 23.06.2012 17:58, Antoine Pitrou wrote:
> > On Sat, 23 Jun 2012 17:55:24 +0200
> > martin at v.loewis.de wrote:
> >>> That's true. I would have hoped for it to be recognized only when
> >>> there's at least one module or package inside, but it doesn't sound
> >>> easy to check for (especially in the recursive namespace packages case
> >>> - is that possible?).
> >>
> >> Yes - a directory becomes a namespace package by not having an
> >> __init__.py, so the "namespace package" case will likely become the
> >> default, and people will start removing the empty __init__.pys when
> >> they don't need to support 3.2- anymore.
> >
> > Have you tested the performance of namespace packages compared to
> > normal packages?
>
> No, I haven't.
>

It's probably not worthwhile; any added cost from looking at more sys.path
entries should be offset by the speedup of any subsequent imports from later
sys.path entries.

Or, to put it another way, almost all the extra I/O cost of namespace
packages is paid only once, for the *first* namespace package imported.  In
effect, this means that the amortized cost of using namespace packages
actually *decreases* as namespace packages become more popular.  Also, the
total extra overhead equals the cost of a listdir() for each directory on
sys.path that would otherwise not have been checked for an import.  (So,
for example, if even one import fails over the life of a program's
execution, or it performs even one import from the last directory on
sys.path, then there is no actual extra overhead.)

Of course, there are still cache validation stat() calls, and they add N
extra stat() calls to the initial import of a namespace package (vs. a
self-contained package with __init__.py), where N is the number of sys.path
entries that appear *after* the sys.path directory where the package is
found.  (This cost must of course still be compared against the costs of
finding, opening, and running an empty __init__.py[co] file, so it may
actually still be quite competitive in many cases.)
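
To make N concrete, here is a rough sketch that counts the sys.path entries
after the first one holding a directory with the package's name.  The helper
name count_later_entries is made up for illustration, and checking only for a
directory is a simplification of what the import system really does.

```python
import os
import sys


def count_later_entries(search_path, package_name):
    """Return N: how many entries follow the first entry of search_path
    that contains a directory named package_name.

    Returns None if no entry contains such a directory.
    (Hypothetical helper; it only checks for a directory and does not
    reproduce the real finder logic.)
    """
    for index, entry in enumerate(search_path):
        if os.path.isdir(os.path.join(entry, package_name)):
            return len(search_path) - index - 1
    return None


if __name__ == "__main__":
    # "email" is just a convenient stdlib package to look up.
    print(count_later_entries(sys.path, "email"))
```

A package found in the last sys.path entry gives N == 0, which is the case
where no extra stat() calls are needed.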

For imports *within* a namespace package, similar considerations apply,
except that N is smaller, and in the simple case of replacing a
self-contained package with a namespace (but not adding any additional path
locations), N will be zero, making imports from inside the namespace run
exactly as quickly as normal imports.
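
As an illustration of that simple case, the sketch below builds a directory
with no __init__.py and imports a module from inside it.  It assumes Python
3.3 or later; the names demo_ns_pkg, mod, and import_namespace_demo are
invented for the example.

```python
import importlib
import os
import sys
import tempfile


def import_namespace_demo():
    """Create a package directory without __init__.py and import from it.

    Returns (package __file__ or None, package __path__ as a list, VALUE).
    """
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_ns_pkg")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "mod.py"), "w") as f:
        f.write("VALUE = 42\n")

    sys.path.insert(0, root)
    # The directory was created after startup, so drop any cached listings.
    importlib.invalidate_caches()
    try:
        pkg = importlib.import_module("demo_ns_pkg")
        mod = importlib.import_module("demo_ns_pkg.mod")
        # A namespace package has no __init__.py, so it has no usable __file__.
        return getattr(pkg, "__file__", None), list(pkg.__path__), mod.VALUE
    finally:
        sys.path.remove(root)


if __name__ == "__main__":
    print(import_namespace_demo())
```

Since only one directory contributes to the namespace here, __path__ has a
single entry, and the submodule lookup scans just that one location.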

In short, it's not worth worrying about, and definitely nothing that should
lead people to spread the idea that __init__.py somehow speeds things up.
If there's a difference, it'll likely be lost in measurement noise, due to
importlib's new directory caching mechanism.
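
For anyone who wants to check on their own machine, here is a crude timing
sketch, not a real benchmark.  It assumes Python 3.3 or later, and the
function and package names (time_first_import, timing_demo_regular,
timing_demo_namespace) are invented for the example.  A single run is noisy,
so repeat it many times before drawing any conclusion.

```python
import importlib
import os
import sys
import tempfile
import time


def time_first_import(with_init):
    """Time one first import of an empty package, in seconds.

    with_init=True creates an empty __init__.py (regular package);
    with_init=False leaves it out (namespace package).
    """
    root = tempfile.mkdtemp()
    name = "timing_demo_regular" if with_init else "timing_demo_namespace"
    pkg_dir = os.path.join(root, name)
    os.mkdir(pkg_dir)
    if with_init:
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    sys.modules.pop(name, None)  # make sure this really is a first import
    try:
        start = time.perf_counter()
        importlib.import_module(name)
        return time.perf_counter() - start
    finally:
        sys.path.remove(root)
        sys.modules.pop(name, None)


if __name__ == "__main__":
    print("regular:  ", time_first_import(True))
    print("namespace:", time_first_import(False))
```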