[Python-3000] Import system questions to be considered for Py3k

Nick Coghlan ncoghlan at gmail.com
Sun Jul 16 06:43:35 CEST 2006


Taking the "import system" to be the overall interaction between the Python 
module namespace and the file system of the underlying computer, I thought I'd 
start compiling a list of the questions we'll want to consider for Py3k. The 
answers to some of them may be "the status quo is fine" but we should still 
ask the questions.

I'll eventually capture the discussion in a Py3k PEP (although I believe many 
of the questions could actually be addressed for 2.6).

The list I've got so far (including some thoughts about possible solutions):

Change to hybrid implementation
-------------------------------
This idea would try to reduce the amount of code in import.c, pushing more of 
the logic into Python code. An advantage of this is that much of the PEP 302 
structure for the standard import mechanisms already exists in pkgutil (since 
PJE consolidated the various emulations that had been added to the standard 
library). Additionally, import logic written in Python would automatically 
benefit from the full Unicode filename support of the builtin open() function.

The various string manipulation operations involved would also be 
significantly easier to handle.

There would be some bootstrapping issues, but I think it would be better to 
try to solve them than to keep maintaining the partial reimplementation of the 
file system access API that import.c currently uses (which, for example, 
doesn't provide full Unicode filename support on Windows).

Even if most of the logic stays in C code, it would be good to find a way to 
use the full filesystem API, rather than the current import-only subset.

Use smarter data structures
---------------------------
Currently, the individual handlers to load a fully identified module are 
exposed to Python code in a way that reflects the C-style data structures used 
in the current implementation.

Simply switching to more powerful data structures for the file type handlers 
(i.e. use a PyTuple for filedescr values, a PyList for _PyImport_FileTab, and 
a PyDict instead of a switch statement to go from filedescr values to module 
loading/initialisation functions) and manipulating them all as normal Python 
objects could make the code in import.c much easier to follow.
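
As a rough illustration of the shape this could take, here is the same idea 
written in Python rather than C. The names FILE_TAB, LOADERS, find_handler and 
the load_* functions are made up for this sketch and do not correspond to 
anything in import.c; the loaders are stubs that only report what they would do.

```python
# Hypothetical stand-ins for the C-level filedescr kinds.
PY_SOURCE, PY_COMPILED, C_EXTENSION = "source", "compiled", "extension"

# A plain list of (suffix, mode, kind) tuples instead of a C array of structs.
FILE_TAB = [
    (".py", "r", PY_SOURCE),
    (".pyc", "rb", PY_COMPILED),
    (".so", "rb", C_EXTENSION),
]


def load_source(name, path):
    # Stub: a real loader would compile and execute the source file.
    return "source module %s from %s" % (name, path)


def load_compiled(name, path):
    # Stub: a real loader would unmarshal and execute the bytecode.
    return "compiled module %s from %s" % (name, path)


def load_extension(name, path):
    # Stub: a real loader would load and initialise the shared library.
    return "extension module %s from %s" % (name, path)


# A dict replaces the switch statement that maps kinds to loading functions.
LOADERS = {
    PY_SOURCE: load_source,
    PY_COMPILED: load_compiled,
    C_EXTENSION: load_extension,
}


def find_handler(filename):
    """Return (mode, loader) for the first matching suffix, or None."""
    for suffix, mode, kind in FILE_TAB:
        if filename.endswith(suffix):
            return mode, LOADERS[kind]
    return None
```

With this layout, adding or inspecting a file type is ordinary list and dict 
manipulation rather than a change to a C switch statement.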

Extensible file type handling
-----------------------------
If the file type handlers are stored in normal Python data structures as 
described above, it becomes feasible to make the import system extensible to 
different file types as well as to different file locations.

This could be handled on a per-package basis, e.g. via a __file_types__ 
special attribute in packages.
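
One possible reading of the per-package idea, purely as a sketch: the import 
system starts from a default suffix table and lets a package override or extend 
it through __file_types__. The mapping format, DEFAULT_FILE_TYPES and 
file_types_for() are assumptions made for this example, not a proposed API.

```python
import types

# Assumed default table: suffix -> kind of handler.
DEFAULT_FILE_TYPES = {".py": "source", ".pyc": "compiled"}


def file_types_for(package):
    """Merge a package's (hypothetical) __file_types__ over the defaults."""
    merged = dict(DEFAULT_FILE_TYPES)
    merged.update(getattr(package, "__file_types__", {}))
    return merged


# A fake package that registers an extra file type for its own submodules.
templates = types.ModuleType("templates")
templates.__path__ = []
templates.__file_types__ = {".tmpl": "template"}

# A package that says nothing simply gets the defaults.
plain = types.ModuleType("plain")
plain.__path__ = []
```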

Locating support files
----------------------
Currently, locating support files is difficult because __loader__ isn't 
defined for standard modules, and __file__ may not be defined properly if the 
module isn't being executed via load_module(). This needs to be changed so 
that there is an obvious way to find support files that live in the same 
directory as the current module.
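
For reference, the usual workaround today looks something like the sketch 
below. The helper name support_file() is made up for this example. It only 
works when __file__ is set and points at a real file system path, which is 
exactly the limitation described above.

```python
import os


def support_file(module_globals, filename):
    """Return the path of `filename` next to the module owning the globals."""
    module_file = module_globals.get("__file__")
    if module_file is None:
        # No __file__ means there is no directory to look in.
        raise LookupError("no __file__; cannot locate %r" % (filename,))
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, filename)
```

A module would call it as support_file(globals(), "data.txt").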

Determining the value of __file__
---------------------------------
In PEP 302, the logic to determine the value of __file__ is internal to the 
load_module() method. Should this be exposed so that, e.g., runpy.run_module 
can use it?

Handling sys.argv[0]
--------------------
Should new attributes be added to sys to separate out argv[0] from the command 
line arguments? For example, sys.mainfile (== sys.argv[0]) and sys.args (== 
sys.argv[1:]).

This has compatibility implications for code that _sets_ sys.argv, and expects 
other code to see the changes.
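
A small sketch of the split and of the compatibility wrinkle. split_argv() is a 
made-up helper, and a local list stands in for sys.argv so the example does not 
touch the real one.

```python
def split_argv(argv):
    """Split an argv-style list into (mainfile, args)."""
    if not argv:
        return "", []
    return argv[0], list(argv[1:])


argv = ["script.py", "-v", "input.txt"]   # stand-in for sys.argv
mainfile, args = split_argv(argv)         # would become sys.mainfile, sys.args

# Existing code that rewrites the argument list in place...
argv[1:] = ["--quiet"]

# ...is not seen through `args`, because `args` is a separate list.
args_is_stale = args != argv[1:]
```

Unless the new attributes are kept in sync with sys.argv somehow, code that 
reads one and code that writes the other will disagree.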


Determining the value of sys.path[0]
------------------------------------
sys.path[0] is set by the interpreter, depending on how the interpreter was 
started.

If the main module is executed by filename, then sys.path[0] is set to the 
directory containing that file. If the main module is inside a package, all of 
the modules in that package have an aliasing problem (reachable as both 
top-level modules and by their full name).
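
The aliasing problem can be reproduced with a throwaway package. The sketch 
below uses made-up names (aliasdemo_pkg, aliasdemo_mod) and puts the package 
directory on sys.path by hand to mimic what happens when a file inside a 
package is run by filename.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Build root/aliasdemo_pkg/{__init__.py, aliasdemo_mod.py} in a temp directory.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "aliasdemo_pkg")
os.mkdir(pkg_dir)
for filename in ("__init__.py", "aliasdemo_mod.py"):
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write("")

# pkg_dir plays the role of sys.path[0]; root is an ordinary sys.path entry.
sys.path[:0] = [pkg_dir, root]
importlib.invalidate_caches()
try:
    top_level = importlib.import_module("aliasdemo_mod")
    qualified = importlib.import_module("aliasdemo_pkg.aliasdemo_mod")
finally:
    del sys.path[:2]

# One file on disk, two separate module objects in sys.modules.
same_file = os.path.samefile(top_level.__file__, qualified.__file__)
distinct_objects = top_level is not qualified

shutil.rmtree(root, ignore_errors=True)
```

Any module-level state (caches, registries, class identities) is duplicated 
between the two copies.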

All other means of invocation leave sys.path[0] set to '', to indicate 
"current working directory". Should the interpreter instead read the name of 
the current working directory from the OS at startup, or should sys.path[0] 
continue to reflect changes to the working directory over the course of 
execution?
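
The difference between the two behaviours, sketched with a made-up helper 
effective_dir() that treats '' the way the description above says it is 
treated (as "whatever the working directory is right now"):

```python
import os
import tempfile


def effective_dir(entry):
    """Directory that a sys.path-style entry refers to at this moment."""
    return os.getcwd() if entry == "" else entry


startup_dir = os.getcwd()          # what a snapshot at startup would record
scratch = tempfile.mkdtemp()
try:
    os.chdir(scratch)
    tracked = effective_dir("")           # follows the chdir
    frozen = effective_dir(startup_dir)   # still the startup directory
finally:
    os.chdir(startup_dir)
    os.rmdir(scratch)
```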

Should there be a command line switch to set sys.path[0] directly (or avoid 
having it set at all)? This would make it possible to avoid the aliasing 
problem with running modules from inside package directories, as well as 
allowing -m execution to be used for a module or package that is not in the 
current directory, but isn't on PYTHONPATH or in site-packages, either. (The 
latter would be a convenience for testing purposes, rather than something an 
installed Python application should rely on.)

Handling relative imports
-------------------------
Currently the import system has to look at __name__, and then check if 
__path__ is present, in order to decide how to handle a relative import - the 
handling is different depending on whether the current module is a package or not.

Defining a new special variable __pkg_name__ would allow the import system to 
use consistent logic for both packages and normal modules. This would also 
mean that relative imports would work correctly even when __name__ is set to 
something like "__main__".
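
A minimal sketch of the consistent logic this would allow, assuming 
__pkg_name__ holds the dotted name of the containing package (or of the 
package itself, for a package's __init__). resolve_relative() and its 
parameters are invented for this example.

```python
def resolve_relative(target, level, pkg_name):
    """Resolve a relative import to an absolute dotted name.

    level is the number of leading dots: 1 means the current package,
    2 its parent, and so on. pkg_name is the value of the hypothetical
    __pkg_name__ variable.
    """
    if level < 1:
        raise ValueError("level must be >= 1 for a relative import")
    parts = pkg_name.split(".") if pkg_name else []
    if level > len(parts):
        raise ValueError("relative import goes beyond the top-level package")
    base = parts[:len(parts) - (level - 1)]
    if target:
        base.append(target)
    return ".".join(base)
```

Because only pkg_name is consulted, a module run with __name__ set to 
"__main__" would resolve its relative imports the same way as when it is 
imported normally.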

Revisiting PEP 299
------------------
The general consensus recently has been that the "if __name__ == '__main__':" 
idiom for modules that can act as either support modules or main modules is 
ugly and unintuitive.

PEP 299 (__main__ functions) was rejected for the 2.x series due to backward 
compatibility concerns (in particular, with modules that include the line 
"import __main__"). Py3k provides an opportunity to revisit that decision.

If it is taken as a given that the idiom needs to change, then the question is 
whether the major change proposed by PEP 299 is a better option than a simpler 
change such as a new builtin boolean variable that can be tested via something 
like "if is_main:".
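
To make the first option concrete, here is a sketch of what the interpreter 
might do for a module that defines a __main__ function. The run_as_main() 
helper and the convention of passing the argument list to the function are 
assumptions made for this example, not a restatement of the PEP.

```python
import types


def run_as_main(module, argv):
    """Call module.__main__(argv) if the module defines one."""
    entry = getattr(module, "__main__", None)
    if callable(entry):
        return entry(argv)
    return None  # plain support module: nothing to run


def _demo_main(argv):
    # Stand-in for the body that currently lives under the __name__ check.
    return "ran with %d args" % (len(argv) - 1)


# A fake module that opts in by defining __main__.
demo = types.ModuleType("demo")
setattr(demo, "__main__", _demo_main)

# A fake support module that does not.
lib = types.ModuleType("lib")
```

The "if is_main:" alternative would leave module layout as it is today and 
only change the spelling of the test.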

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

