[Python-Dev] Writing importers and path hooks

Thu Mar 28 18:39:03 CET 2013

On Thu, Mar 28, 2013 at 12:33 PM, Paul Moore <p.f.moore at gmail.com> wrote:

> On 28 March 2013 16:08, Brett Cannon <brett at python.org> wrote:
> > You only need SourceLoader since you are dealing with Python source. You
> > don't need FileLoader since you are not reading from disk but an
> > in-memory zipfile.
> >
> > You should be implementing get_data, get_filename, and path_stats for
> > SourceLoader.
>
> OK, cool. That helps a lot.
>
> The biggest gap here is that I don't think that anywhere has a good
> explanation of the required semantics of get_filename - particularly
> where we're not actually dealing with real filenames.

It's because there aren't any. =) This is the first time alternative
storage mechanisms are really easily viable without massive amounts of
work, so no one has figured this out. The real question is how code out in
the wild would react if you did something like /path/to/sqlite3:pkg.mod
which is very much not a file path.

> My initial stab
> at this would be:
>
> A module name is a dot-separated list of parts.
> A filename is an arbitrary token that can be used with get_data to get
> the module content. However, the following rules should be followed:
> - Filenames should be made up of parts separated by the OS path separator.
>

And why is that? A database doesn't need those separators as the module
name would just be the primary key.

> - For packages, the final section of the filename *must* be
> __init__.py if the standard package detection is being used.
>

Once again, why? A column in a database that is nothing more than a package
flag would solve this as well, negating the need for this. The whole point
of is_package() on loaders is to get away from this reliance on __file__
having any meaning beyond "this is the string that represents where this
module's code was loaded from".

> - The initial part of the filename needs to match your path entry if
> submodule lookups are going to work sanely
>

When applicable that's fine.

>
> In practice, you need to implement filenames as if your finder is
> managing a virtual filesystem mounted at your sys.path entry, with
> module->filename semantics being the usual subdirectory layout. And
> packages have a basename of __init__.py.
>

That's one way of doing it, but it does very much tie imports to files and
it doesn't generalize the concept to places where file paths simply do not
need to apply.

>
> I'd like to know how to implement packages without the artificial
> __init__.py (something like a sqlite database can attach content and
> an "is_package" flag to the same entry). But that's advanced usage,
> and I can probably hack around until I work out how to do that now.
>

Define is_package(). I personally want to change the API somehow so you ask
for what __path__ should be set to. Unfortunately, going down the "False
means not a package, everything else means it is and what is returned
should be set on __path__" route is a bit hairy and not
backwards-compatible unless you require a list that always evaluates to
True for packages.
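
A minimal sketch of a package with no __init__.py, assuming a plain dict
stands in for the database. MODULES, FlagLoader, the "memdb:" prefix and
load() are all made up for illustration, and the importlib.util helpers
are from current importlib rather than the 3.3 API discussed here. The
"filename" is only an opaque token handed back to get_data(); whether
something is a package comes from the stored flag.

```python
import importlib.abc
import importlib.util

# Hypothetical stand-in for a database table: name -> (is_package, source).
MODULES = {
    "demo_pkg": (True, "VALUE = 1\n"),
    "demo_pkg.mod": (False, "VALUE = 2\n"),
}


class FlagLoader(importlib.abc.SourceLoader):
    """Loads source from MODULES; package-ness comes from a stored flag."""

    PREFIX = "memdb:"  # made-up marker, not a real file path

    def get_filename(self, fullname):
        if fullname not in MODULES:
            raise ImportError("no such module: " + fullname, name=fullname)
        return self.PREFIX + fullname

    def get_data(self, path):
        # SourceLoader passes back whatever get_filename() returned.
        try:
            return MODULES[path[len(self.PREFIX):]][1].encode("utf-8")
        except KeyError:
            raise OSError("no data for " + path)

    def is_package(self, fullname):
        # Overrides the default, which inspects the filename for __init__.
        if fullname not in MODULES:
            raise ImportError("no such module: " + fullname, name=fullname)
        return MODULES[fullname][0]


def load(fullname):
    """Build and execute a module directly, without touching sys.modules."""
    loader = FlagLoader()
    spec = importlib.util.spec_from_loader(
        fullname, loader, is_package=loader.is_package(fullname))
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

With this, load("demo_pkg") gets a __path__ attribute even though no
filename ends in __init__.py, while load("demo_pkg.mod") does not.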

>
> >> The documentation on what I
> >> need to return from there is very sparse... In the end I worked out
> >> that for a package, I need to return (MyLoader(modulename,
> >> 'foo/__init__.py'), ['foo']) (here, "foo" is my dummy marker for my
> >> example).
> >
> > The second argument should just be None: "An empty list can be used for
> > portion to signify the loader is not part of a [namespace] package".
> > Unfortunately a key word is missing in that sentence.
> > http://bugs.python.org/issue17567
>
> Ha. Yes, that makes a lot of difference :-) Did you mean None or [], by
> the way?
>

Empty list. You can check the code to see if it would work with None, but a
list is expected to be used so an empty list is more consistent and still
false.
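
To make the return shape concrete, here is a small sketch of a path entry
finder following the Python 3.3 find_loader() protocol. DictPathEntryFinder
and its dict of loaders are hypothetical, and the class deliberately does
not subclass anything: find_loader() was later deprecated and then removed
from the import system, so this only illustrates the (loader, portions)
tuple and will not be called by a recent Python.

```python
class DictPathEntryFinder:
    """Hypothetical finder using the Python 3.3 find_loader() protocol."""

    def __init__(self, loaders):
        # Maps a module name to the loader object that can load it.
        self.loaders = loaders

    def find_loader(self, fullname):
        loader = self.loaders.get(fullname)
        # The second item lists namespace-package portions. It is empty
        # here because this finder never contributes to a namespace
        # package; (None, []) therefore means "nothing found".
        return loader, []
```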

>
> >> In essence, PathEntryFinder really has to implement some
> >> form of virtual filesystem mount point, and preserve the standard
> >> filesystem semantics of modules having a filename of .../__init__.py.
> >
> > Well, if your zip file decided to create itself with a different file
> > extension then it wouldn't be required, but then other people's code
> > might break if they don't respect module abstractions (i.e. looking at
> > __package__/__name__ or __path__ to see if something is a package).
>
> I'm not quite sure what you mean by this, but I take your point about
> making sure to break people's expectations as little as possible...
>

To tell if a module is a package, you should do either ``if mod.__name__ ==
mod.__package__`` or ``if hasattr(mod, '__path__')``.
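
As a short illustration of the __path__ check, using the stdlib email
package only as a convenient example (the helper name is_package is made
up here):

```python
import email
import email.message


def is_package(module):
    """Return True if *module* is a package, judged by __path__ alone.

    This never looks at module.__file__, so it keeps working for modules
    whose "filename" is not a real path.
    """
    return hasattr(module, "__path__")
```

Here is_package(email) is true and is_package(email.message) is false.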

>
> >> So I managed to work out what was needed in the end, but it was a lot
> >> harder than I'd expected. On reflection, getting the finder semantics
> >> right (and in particular the path entry finder semantics) was the hard
> >> bit.
> >
> > Yep, that bit has had the least API tweaks as most people don't muck with
> > finders but with loaders.
>
> Hmm. I'm not sure how you can ever write a loader without needing to
> write an associated finder. The existing finders wouldn't return your
> loader, surely?
>

If you are not changing the storage mechanism you don't need a new finder;
what importlib provides works fine. So if you are, for instance, only
providing a loader which does an AST optimization pass you only need a new
loader. Or if you use a DSL that you compile into Python code then you only
need a new loader.
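
A rough sketch of the loader-only case, assuming current importlib where
SourceFileLoader exposes source_to_code() as the compile hook. FoldAdd,
OptimizingLoader, load_optimized() and demo() are hypothetical names, and
the "optimization" is a toy pass that folds additions of two numeric
literals. Storage is untouched: the source still comes from an ordinary
file on disk.

```python
import ast
import importlib.machinery
import importlib.util
import os
import tempfile


class FoldAdd(ast.NodeTransformer):
    """Toy pass: replace `<number> + <number>` with its value."""

    def __init__(self):
        self.folds = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)  # handle nested additions first
        left, right = node.left, node.right
        if (isinstance(node.op, ast.Add)
                and isinstance(left, ast.Constant)
                and isinstance(right, ast.Constant)
                and isinstance(left.value, (int, float))
                and isinstance(right.value, (int, float))):
            self.folds += 1
            folded = ast.Constant(left.value + right.value)
            return ast.copy_location(folded, node)
        return node


class OptimizingLoader(importlib.machinery.SourceFileLoader):
    """Stock file loader with an extra AST pass before compilation."""

    folds = 0

    def source_to_code(self, data, path, *, _optimize=-1):
        # Note: if cached bytecode already exists for the file, the stock
        # machinery uses it and this method is not called.
        tree = ast.parse(data, filename=path)
        fold = FoldAdd()
        tree = ast.fix_missing_locations(fold.visit(tree))
        self.folds = fold.folds
        return compile(tree, path, "exec", dont_inherit=True,
                       optimize=_optimize)


def load_optimized(name, path):
    loader = OptimizingLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


def demo():
    """Write a throwaway module to a temp dir and load it through the pass."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "folddemo.py")
        with open(path, "w") as f:
            f.write("TOTAL = 2 + 3\n")
        module = load_optimized("folddemo", path)
        return module.TOTAL, module.__spec__.loader.folds
```

No finder is written here; to use such a loader for regular imports it
would be handed to the existing file finder, for instance through
importlib.machinery.FileFinder.path_hook().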

>
> >> I'm now 100% sure that some cookbook examples would help a lot. I'll
> >> see what I can do.
> >
> > I plan on writing a pure Python zip importer for Python 3.4 which should
> > be fairly minimal and work out as a good example chunk of code.  And no
> > one need bother writing it as I'm going to do it myself regardless to
> > make sure I plug any missing holes in the API. If you really want
> > something to try for fun go for a sqlite3-backed setup (don't see it
> > going in the stdlib but it would be a project to have).
>
> I'm pretty sure I'll write a zip importer first - it feels like one of
> those essential but largely useless exercises that people have to
> start with - a bit like scales on the piano :-) But I'd be interested
> in trying a sqlite importer as well. I might well see how I go with
> that.
>

The sqlite3 one is interesting as it does not whatsoever require file paths
to operate; you can easily define a schema specific to source code and
bytecode and really go db-specific and have the loader work from that
(would also make finder lookups dead-simple). Otherwise you will end up
writing a schema for a virtual filesystem which would also work but would
show that people are not respecting abstractions on modules (or that the
API has gaps which need filling in).
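
One way the db-specific approach could look, as a sketch only. The schema,
the SqliteImporter class, the "sqlite3:" origin prefix and
demo_connection() are all assumptions for illustration, and a bytecode
column is left out to keep it short. It uses find_spec() on a meta path
finder, which is the current importlib protocol and postdates this thread.
The module name is the primary key, so the finder lookup is a single
SELECT, and no path-like string is ever parsed.

```python
import importlib.abc
import importlib.machinery
import sqlite3
import sys

# Hypothetical schema: one row per module, keyed by dotted name.
SCHEMA = """
CREATE TABLE modules (
    name       TEXT PRIMARY KEY,
    is_package INTEGER NOT NULL,
    source     TEXT NOT NULL
)
"""


class SqliteImporter(importlib.abc.MetaPathFinder,
                     importlib.abc.SourceLoader):
    """Finder and loader in one object, backed by a sqlite3 connection."""

    def __init__(self, conn, label="sqlite3"):
        self.conn = conn
        self.prefix = label + ":"  # made-up origin marker, not a file path

    def _row(self, fullname):
        cur = self.conn.execute(
            "SELECT is_package, source FROM modules WHERE name = ?",
            (fullname,))
        return cur.fetchone()

    # Finder side: the lookup is just a primary-key query.
    def find_spec(self, fullname, path=None, target=None):
        row = self._row(fullname)
        if row is None:
            return None
        return importlib.machinery.ModuleSpec(
            fullname, self, origin=self.prefix + fullname,
            is_package=bool(row[0]))

    # Loader side: SourceLoader compiles whatever get_data() returns.
    def get_filename(self, fullname):
        if self._row(fullname) is None:
            raise ImportError("not in database: " + fullname, name=fullname)
        return self.prefix + fullname

    def get_data(self, path):
        row = self._row(path[len(self.prefix):])
        if row is None:
            raise OSError("no data for " + path)
        return row[1].encode("utf-8")

    def is_package(self, fullname):
        row = self._row(fullname)
        if row is None:
            raise ImportError("not in database: " + fullname, name=fullname)
        return bool(row[0])


def demo_connection():
    """Build an in-memory database holding a package and one submodule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO modules VALUES (?, ?, ?)",
        [("sqlitedemo_pkg", 1, "VALUE = 1\n"),
         ("sqlitedemo_pkg.mod", 0, "VALUE = 2\n")])
    return conn
```

After sys.meta_path.append(SqliteImporter(demo_connection())), a plain
"import sqlitedemo_pkg.mod" is served from the database.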