<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Mar 28, 2013 at 12:33 PM, Paul Moore <span dir="ltr"><<a href="mailto:p.f.moore@gmail.com" target="_blank">p.f.moore@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 28 March 2013 16:08, Brett Cannon <<a href="mailto:brett@python.org">brett@python.org</a>> wrote:<br>
> You only need SourceLoader since you are dealing with Python source. You<br>
> don't need FileLoader since you are not reading from disk but an in-memory<br>
> zipfile.<br>
><br>
</div><div class="im">> You should be implementing get_data, get_filename, and path_stats for<br>
> SourceLoader.<br>
<br>
</div>OK, cool. That helps a lot.<br>
<br>
The biggest gap here is that I don't think that anywhere has a good<br>
explanation of the required semantics of get_filename - particularly<br>
where we're not actually dealing with real filenames.</blockquote><div><br></div><div style>It's because there aren't any. =) This is the first time alternative storage mechanisms are viable without massive amounts of work, so no one has figured this out. The real question is how code out in the wild would react if you did something like /path/to/sqlite3:pkg.mod, which is very much not a file path.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> My initial stab<br>
at this would be:<br>
<br>
A module name is a dot-separated list of parts.<br>
A filename is an arbitrary token that can be used with get_data to get<br>
the module content. However, the following rules should be followed:<br>
- Filenames should be made up of parts separated by the OS path separator.<br></blockquote><div><br></div><div style>And why is that? A database doesn't need those separators as the module name would just be the primary key.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
- For packages, the final section of the filename *must* be<br>
__init__.py if the standard package detection is being used.<br></blockquote><div><br></div><div style>Once again, why? A column in a database that is nothing more than a package flag would solve this as well, negating the need for this. The whole point of is_package() on loaders is to get away from this reliance on __file__ having any meaning beyond "this is the string that represents where this module's code was loaded from".</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
- The initial part of the filename needs to match your path entry if<br>
submodule lookups are going to work sanely<br></blockquote><div><br></div><div style>When applicable that's fine.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
In practice, you need to implement filenames as if your finder is<br>
managing a virtual filesystem mounted at your sys.path entry, with<br>
module->filename semantics being the usual subdirectory layout. And<br>
packages have a basename of __init__.py.<br></blockquote><div><br></div><div style>That's one way of doing it, but it does very much tie imports to files and it doesn't generalize the concept to places where file paths simply do not need to apply.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
I'd like to know how to implement packages without the artificial<br>
__init__.py (something like a sqlite database can attach content and<br>
an "is_package" flag to the same entry). But that's advanced usage,<br>
and I can probably hack around until I work out how to do that now.<br></blockquote><div><br></div><div style>Define is_package(). I personally want to change the API somehow so you ask for what __path__ should be set to. Unfortunately, going down the "False means not a package, everything else means it is and what is returned should be set on __path__" route is a bit hairy and not backwards-compatible unless you require a list that always evaluates to True for packages.</div>
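<div><br></div><div>As a rough sketch of what "define is_package()" can look like, here is a SourceLoader whose package status comes from a stored flag rather than from an __init__.py filename. STORE, FlagLoader, load() and the "store:" prefix are made-up names for illustration, and the sketch assumes a Python version whose importlib provides ModuleSpec and module_from_spec().</div>
<pre>
```python
import importlib.abc
import importlib.machinery
import importlib.util

# Hypothetical in-memory "storage": module name -> (is_package flag, source).
STORE = {
    "demo_pkg": (True, "GREETING = 'hello from a package'\n"),
    "demo_pkg.leaf": (False, "ANSWER = 42\n"),
}


class FlagLoader(importlib.abc.SourceLoader):
    """Loads source from STORE; package status comes from a flag, not a name."""

    PREFIX = "store:"  # made-up scheme; the "filename" is only an opaque token

    def get_filename(self, fullname):
        if fullname not in STORE:
            raise ImportError("no such module in store", name=fullname)
        return self.PREFIX + fullname

    def get_data(self, path):
        # get_data() receives whatever get_filename() returned.
        name = path[len(self.PREFIX):]
        return STORE[name][1].encode("utf-8")

    def is_package(self, fullname):
        # The stored flag decides; no "__init__.py" basename is consulted.
        return STORE[fullname][0]


def load(fullname):
    """Build a spec by hand and execute the module without touching sys.modules."""
    loader = FlagLoader()
    spec = importlib.machinery.ModuleSpec(
        fullname,
        loader,
        origin=loader.get_filename(fullname),
        is_package=loader.is_package(fullname),
    )
    spec.has_location = True  # ask importlib to copy the origin into __file__
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```
</pre>
<div>With this, load("demo_pkg") gets a __path__ attribute (an empty list) and load("demo_pkg.leaf") does not, while __file__ in both cases is just the "store:..." token. path_stats() is left out here, so no bytecode caching takes place.</div>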
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
>> The documentation on what I<br>
>> need to return from there is very sparse... In the end I worked out<br>
>> that for a package, I need to return (MyLoader(modulename,<br>
>> 'foo/__init__.py'), ['foo']) (here, "foo" is my dummy marker for my<br>
>> example).<br>
><br>
> The second argument should just be None: "An empty list can be used for<br>
> portion to signify the loader is not part of a [namespace] package".<br>
> Unfortunately a key word is missing in that sentence.<br>
> <a href="http://bugs.python.org/issue17567" target="_blank">http://bugs.python.org/issue17567</a><br>
<br>
</div>Ha. Yes, that makes a lot of difference :-) Did you mean None or [], by the way?<br></blockquote><div><br></div><div style>Empty list. You can check the code to see if it would work with None, but a list is expected to be used so an empty list is more consistent and still false.</div>
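<div><br></div><div>To make the return value concrete, here is a minimal sketch of the (loader, portions) pair from find_loader() as discussed above. DemoLoader and DemoPathEntryFinder are made-up stand-ins, and the classes are deliberately not wired into importlib; they only show the shape of what gets returned.</div>
<pre>
```python
class DemoLoader:
    """Stand-in loader; only records what it was asked to load."""

    def __init__(self, fullname, filename):
        self.fullname = fullname
        self.filename = filename


class DemoPathEntryFinder:
    """Illustrates the (loader, portions) pair returned by find_loader()."""

    def __init__(self, known):
        # known: hypothetical mapping of module name -> filename token
        self.known = known

    def find_loader(self, fullname):
        filename = self.known.get(fullname)
        if filename is None:
            # Nothing found, and no namespace-package portions to offer.
            return None, []
        # Found a regular module or package: the portions list stays empty.
        return DemoLoader(fullname, filename), []
```
</pre>
<div>The empty list is false in a boolean context, which is all the caller needs in order to treat the result as "not part of a namespace package".</div>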
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
>> In essence, PathEntryFinder really has to implement some<br>
>> form of virtual filesystem mount point, and preserve the standard<br>
>> filesystem semantics of modules having a filename of .../__init__.py.<br>
><br>
> Well, if your zip file decided to create itself with a different file<br>
> extension then it wouldn't be required, but then other people's code might<br>
> break if they don't respect module abstractions (i.e. looking at<br>
> __package__/__name__ or __path__ to see if something is a package).<br>
<br>
</div>I'm not quite sure what you mean by this, but I take your point about<br>
making sure to break people's expectations as little as possible...<br></blockquote><div><br></div><div style>To tell if a module is a package, you should do either ``if mod.__name__ == mod.__package__`` or ``if hasattr(mod, '__path__')``.</div>
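<div><br></div><div>Spelled out as helper functions, the two checks look roughly like this. The function names are made up for illustration; json and os are used only as convenient examples of a package and a plain module from the standard library.</div>
<pre>
```python
import json  # a package in the standard library
import os    # a plain module in the standard library


def is_package(mod):
    """Return True if mod looks like a package, judged by __path__ alone."""
    return hasattr(mod, "__path__")


def is_package_by_name(mod):
    """Alternative check: a package is its own __package__."""
    return mod.__name__ == getattr(mod, "__package__", None)
```
</pre>
<div>Neither check looks at __file__, so both keep working for modules whose "filename" is not a real path.</div>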
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
>> So I managed to work out what was needed in the end, but it was a lot<br>
>> harder than I'd expected. On reflection, getting the finder semantics<br>
>> right (and in particular the path entry finder semantics) was the hard<br>
>> bit.<br>
><br>
> Yep, that bit has had the least API tweaks as most people don't muck with<br>
> finders but with loaders.<br>
<br>
</div>Hmm. I'm not sure how you can ever write a loader without needing to<br>
write an associated finder. The existing finders wouldn't return your<br>
loader, surely?<br></blockquote><div><br></div><div style>If you are not changing the storage mechanism you don't need a new finder; what importlib provides works fine. So if you are, for instance, only providing a loader which does an AST optimization pass you only need a new loader. Or if you use a DSL that you compile into Python code then you only need a new loader.</div>
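<div><br></div><div>A loader-only customization might look something like the sketch below: it reuses the stock file-based storage and only changes the compile step. FoldIntAdditions, OptimizingLoader and load_with_optimizer() are made-up names, the "optimization" is a toy, and the sketch assumes a Python version in which source loaders expose a source_to_code() hook.</div>
<pre>
```python
import ast
import importlib.machinery
import importlib.util


class FoldIntAdditions(ast.NodeTransformer):
    """Toy optimization pass: rewrite `int + int` literals to their sum."""

    def __init__(self):
        self.folds = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold nested additions first
        left, right = node.left, node.right
        if (isinstance(node.op, ast.Add)
                and isinstance(left, ast.Constant)
                and isinstance(right, ast.Constant)
                and type(left.value) is int
                and type(right.value) is int):
            self.folds += 1
            folded = ast.Constant(left.value + right.value)
            return ast.copy_location(folded, node)
        return node


class OptimizingLoader(importlib.machinery.SourceFileLoader):
    """Same storage as the stock loader; only the compile step differs."""

    folds = 0

    def source_to_code(self, data, path, *args, **kwargs):
        # Extra arguments differ between Python versions; this sketch ignores them.
        tree = ast.parse(data, filename=path)
        transformer = FoldIntAdditions()
        tree = ast.fix_missing_locations(transformer.visit(tree))
        self.folds = transformer.folds
        return compile(tree, path, "exec", dont_inherit=True)


def load_with_optimizer(name, path):
    """Load the source file at path through OptimizingLoader."""
    loader = OptimizingLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module, loader
```
</pre>
<div>No finder is involved here; the caller hands the path to the loader directly. One caveat: an up-to-date cached bytecode file is used in preference to recompiling, so a real loader of this kind would need to think about how its output interacts with bytecode caching.</div>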
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
>> I'm now 100% sure that some cookbook examples would help a lot. I'll<br>
>> see what I can do.<br>
><br>
> I plan on writing a pure Python zip importer for Python 3.4 which should be<br>
> fairly minimal and work out as a good example chunk of code. And no one<br>
> need bother writing it as I'm going to do it myself regardless to make sure<br>
> I plug any missing holes in the API. If you really want something to try for<br>
> fun go for a sqlite3-backed setup (don't see it going in the stdlib but it<br>
> would be a project to have).<br>
<br>
</div>I'm pretty sure I'll write a zip importer first - it feels like one of<br>
those essential but largely useless exercises that people have to<br>
start with - a bit like scales on the piano :-) But I'd be interested<br>
in trying a sqlite importer as well. I might well see how I go with<br>
that.<br></blockquote><div><br></div><div style>The sqlite3 one is interesting because it does not require file paths at all; you can define a schema specific to source code and bytecode, go fully db-specific, and have the loader work from that (which would also make finder lookups dead simple). Otherwise you will end up writing a schema for a virtual filesystem, which would also work but would show that people are not respecting abstractions on modules (or that the API has gaps which need filling in).</div>
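<div><br></div><div>For what it's worth, a db-specific setup could be sketched along these lines. Everything named here (the modules table, SqliteImporter, install(), the "sqlite3:" filename prefix) is an assumption made for illustration, bytecode storage is left out, and the sketch uses a meta path finder with find_spec() and ModuleSpec, so it assumes a Python version whose importlib provides those.</div>
<pre>
```python
import importlib.abc
import importlib.machinery
import sqlite3
import sys

# Assumed schema: the module name is the primary key, and package status is
# an explicit flag, so no path separators or __init__.py are involved.
SCHEMA = """
CREATE TABLE IF NOT EXISTS modules (
    name TEXT PRIMARY KEY,
    is_package INTEGER NOT NULL DEFAULT 0,
    source TEXT NOT NULL
)
"""


class SqliteImporter(importlib.abc.MetaPathFinder, importlib.abc.SourceLoader):
    """Meta path finder and source loader backed by one sqlite3 connection."""

    def __init__(self, connection, label="sqlite3"):
        self.connection = connection
        self.label = label  # made-up prefix for the opaque "filename"

    def _row(self, fullname):
        cursor = self.connection.execute(
            "SELECT is_package, source FROM modules WHERE name = ?",
            (fullname,),
        )
        return cursor.fetchone()

    # Finder side: a lookup is a single primary-key query.
    def find_spec(self, fullname, path=None, target=None):
        row = self._row(fullname)
        if row is None:
            return None
        spec = importlib.machinery.ModuleSpec(
            fullname,
            self,
            origin=self.get_filename(fullname),
            is_package=bool(row[0]),
        )
        spec.has_location = True  # so __file__ is set to the origin token
        return spec

    # Loader side: the hooks SourceLoader needs for this storage.
    def get_filename(self, fullname):
        return "{}:{}".format(self.label, fullname)

    def get_data(self, path):
        row = self._row(path.partition(":")[2])
        if row is None:
            raise OSError("no such module stored: {}".format(path))
        return row[1].encode("utf-8")

    def is_package(self, fullname):
        row = self._row(fullname)
        if row is None:
            raise ImportError("not in database", name=fullname)
        return bool(row[0])


def install(connection):
    """Create the table if needed and register an importer on sys.meta_path."""
    connection.execute(SCHEMA)
    importer = SqliteImporter(connection)
    sys.meta_path.append(importer)
    return importer
```
</pre>
<div>After install(sqlite3.connect(":memory:")) and a few INSERTs into modules, a regular import statement resolves names from the table. Submodule lookups ignore __path__ entirely because the finder keys on the full dotted name, which is exactly the kind of thing that will expose code relying on __file__ or __path__ being real paths.</div>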
</div><br></div></div>