[Python-checkins] r52364 - sandbox/trunk/import_in_py/importer.py sandbox/trunk/import_in_py/test_importer.py

brett.cannon python-checkins at python.org
Tue Oct 17 21:20:51 CEST 2006


Author: brett.cannon
Date: Tue Oct 17 21:20:50 2006
New Revision: 52364

Modified:
   sandbox/trunk/import_in_py/importer.py
   sandbox/trunk/import_in_py/test_importer.py
Log:
Rework the py/pyc handler.  Factor out a superclass that implements the
algorithm for verifying bytecode, using either bytecode or source, and possibly
regenerating bytecode.  The algorithm itself makes no assumptions about storage;
supporting methods supply the specifics of how to get the source or bytecode.
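
As a rough illustration of that split (not the code in this commit), the
sketch below puts the decision logic in one method and the storage details in
overridable hooks.  It is written for Python 3 and is heavily simplified:
BaseHandler, DictHandler, verify() and the integer "stamps" are hypothetical
names and stand-ins, and the magic number check and bytecode rewriting are
left out.

```python
import marshal


class BaseHandler:
    """Simplified stand-in: handle_code() holds the algorithm and the
    other methods are hooks that a storage back-end overrides."""

    def get_bytecode(self, opaque_bytecode):
        """Return (stamp, marshalled bytecode) for the opaque object."""
        raise NotImplementedError

    def find_source_to_read(self, opaque_bytecode):
        """Return the matching opaque source object, or None."""
        return None

    def verify(self, stamp, opaque_source):
        """Return True if the bytecode stamp is current for the source."""
        raise NotImplementedError

    def get_code_from_source(self, opaque_source):
        """Return a code object compiled from the opaque source."""
        raise NotImplementedError

    def handle_code(self, namespace, opaque_bytecode):
        """Run valid bytecode, or fall back to source if it is stale."""
        stamp, bytecode = self.get_bytecode(opaque_bytecode)
        opaque_source = self.find_source_to_read(opaque_bytecode)
        if opaque_source is None or self.verify(stamp, opaque_source):
            code, used = marshal.loads(bytecode), "bytecode"
        else:
            code, used = self.get_code_from_source(opaque_source), "source"
        exec(code, namespace)
        return used


class DictHandler(BaseHandler):
    """Hypothetical back-end whose opaque objects are dictionary keys."""

    def __init__(self, sources, bytecodes):
        self.sources = sources      # key -> (stamp, source text)
        self.bytecodes = bytecodes  # key -> (stamp, marshalled code)

    def get_bytecode(self, key):
        return self.bytecodes[key]

    def find_source_to_read(self, key):
        return key if key in self.sources else None

    def verify(self, stamp, key):
        return self.sources[key][0] <= stamp

    def get_code_from_source(self, key):
        return compile(self.sources[key][1], key, "exec")


old_bytecode = marshal.dumps(compile("x = 'old'", "mod", "exec"))
sources = {"mod": (2, "x = 'new'")}

# Bytecode stamped 1 is older than source stamped 2, so source is used.
stale_ns = {}
stale_result = DictHandler(sources, {"mod": (1, old_bytecode)}).handle_code(
    stale_ns, "mod")

# Bytecode stamped 2 is current, so the bytecode itself is executed.
fresh_ns = {}
fresh_result = DictHandler(sources, {"mod": (2, old_bytecode)}).handle_code(
    fresh_ns, "mod")
```

Because handle_code() never touches the dictionaries directly, a different
back-end only has to replace the four hooks.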

Also added a py/pyc filesystem handler that subclasses the generic superclass.
This class assumes a simple file-like interface on the source and bytecode.
File objects work as-is, and StringIO instances work once a 'name' attribute
is set on them.
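
A minimal sketch of what that file-like requirement looks like for an
in-memory object, assuming Python 3's io.StringIO rather than the StringIO
module used in this commit.  NamedStringIO and compile_from() are hypothetical
helpers for illustration only; get_location() mirrors the handler method of
the same name.

```python
import io


class NamedStringIO(io.StringIO):
    """StringIO with the 'name' attribute that the handler relies on."""

    def __init__(self, text, name):
        super().__init__(text)
        self.name = name


def get_location(file_object):
    """Mirror of the handler's get_location(): just read 'name'."""
    return file_object.name


def compile_from(file_object):
    """Read the source and compile it, using 'name' for error reporting."""
    source_code = file_object.read()
    return compile(source_code, get_location(file_object), "exec")


source = NamedStringIO("answer = 6 * 7\n", "<in-memory module>")
code = compile_from(source)
namespace = {}
exec(code, namespace)
```

The 'name' value ends up as the code object's filename, which is why it only
needs to be a real path when both source and bytecode are in play.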


Modified: sandbox/trunk/import_in_py/importer.py
==============================================================================
--- sandbox/trunk/import_in_py/importer.py	(original)
+++ sandbox/trunk/import_in_py/importer.py	Tue Oct 17 21:20:50 2006
@@ -1,10 +1,5 @@
 """Re-implementation of import machinery in Python source code.
 
-When developing please keep imports down to a minimum.  There is a basic
-bootstrapping problem of importing in import!  It is best to keep imports to
-only built-in modules (see sys.builtin_module_names) as having the built-in
-importer written in C is fairly straight-forward.
-
 References on import:
 * Language reference
     http://docs.python.org/ref/import.html
@@ -24,64 +19,51 @@
 Clarifications for PEP 302:
     * Raise ImportError when load_module() fails to load a module without
       raising an exception.
-      
-Differences from C implementation:
-    * Bytecode handler handles regenerating the source code rather than when
-      the source code is handled.  This puts the burden of regeneration of
-      bytecode where it belongs since the source never requires the bytecode to
-      exist to work properly.
+    * Is module returned by load_module() actually used for anything?
+
+Things to be exposed at the Python level:
+    * Python/marshal.c:r_long()/w_long()
 
 Possible Py3K improvements:
-    * Have __import__ check for sys.modules entry to alleviate need for every
-      loader to do so.
-    * Have __import__ pass into the loader the module to be initialized so as
-      remove that boilerplate (also keeps sys.modules manipulation within
-      __import__ when the previous suggestion is used).
-    * Put importer objects directly into sys.path to remove need for
-      sys.path_importer_cache.  Could leave string entries on sys.path that do
-      not have an importer so they can be re-checked the next time a new import
-      is attempted.
-        + If __import__ handles sys.modules then the impact from having to
-          recheck sys.path entries that lack an importer is minimized as it is
-          only on imports that have not been handled before.
-            - This has the drawback of making it more difficult for
-              non-standard modules to be put into sys.modules.  Could go other
-              way and hyper-generalize by having a meta_path importer that
-              returns entries in sys.path.
+* Have a meta_path entry for checking sys.modules to remove need for
+  loaders to do it.
+* Put importer objects directly into sys.path to remove need for
+  sys.path_importer_cache.  Could leave string entries on sys.path that do
+  not have an importer so they can be re-checked the next time a new import
+  is attempted.
+    + If __import__ handles sys.modules then the impact from having to
+      recheck sys.path entries that lack an importer is minimized as it is
+      only on imports that have not been handled before.
+        - This has the drawback of making it more difficult for
+          non-standard modules to be put into sys.modules.  Could go other
+          way and hyper-generalize by having a meta_path importer that
+          returns entries in sys.path.
+* Loaders don't have to return the loaded module.
+    + Since they have the responsibility of adding it to sys.modules, there
+      is no real need.
+    + Importing the module being imported in a circular import dependency
+      requires that the module added to sys.modules stay consistent from
+      the point it is added through initialization.
+       
+Rejected Py3K improvements:
+* Passing in new module to loaders
+    Creating a new module is minimal and loader might want to use a different
+    type of object.
       
 PTL use-case:
-    * Tweaked source files that need to be pre-processed before they are imported.
-    * Should be able to write out bytecode files easily.
-    * Should let them use as much base infrastructure from the source and bytecode
-      handlers as possible along with the filesystem importer/loader.
-    * Expected implementaiton
-        + Source handler
-            - Set 'handles' to source extension
-                * 'ptl'
-            - Override get_code()
-                * Read PTL source.
-                * Translate as needed.
-                * Generate code object.
-        + Bytecode handler
-            - Set 'handles' to bytecode extension
-                * 'ptlc'
+* XXX
                 
 zipimport use-case:
-    * New path_hooks function.
-        + If sys.path entry is an existing zip file, return an importer for it.
-        + Initialize the path_hooks function with handlers that are to be considered
-          for use.
-    * New importer.
-        + Open zipfile and check if it has a path to the desired module.
-    * New loader.
-        + Return a new file-like object for the zipfile for the contained file.
-            - Implement read().
-                * Should it implement # of bytes to return, or just a complete read?
-            - Have a 'name' attribute.
-    * PySourceHandler() and PyBytecodeHandler() would need to switch to accepting a
-      file-like object that has a read() method along with a name attribute (for use
-      in compile() for error reporting).  Also do not pass in a source provider for
-      the bytecode handler so as to suppress that writing of new bytecode.
+* Simple way
+    + XXX
+* Feature-rich way
+    + XXX
+      
+sqlite3 importer use-case:
+* Simple way
+    + XXX
+* Feature-rich way
+    + XXX
 
 """
 from __future__ import with_statement
@@ -89,12 +71,14 @@
 import imp
 import sys
 import marshal
-# XXX Importing os will not work in the end as it is implemented in Python
-# itself.
+# XXX Following imports will eventually need to be removed since they involve
+# Python source.
 import os
+import contextlib
+import py_compile
 
 
-class BuiltinFrozen_Importer(object):
+class BuiltinFrozenBaseImporter(object):
 
     """Base class for meta_path importers for built-in and frozen modules.
 
@@ -139,7 +123,7 @@
             return mod
 
 
-class BuiltinImporter(BuiltinFrozen_Importer):
+class BuiltinImporter(BuiltinFrozenBaseImporter):
 
     """sys.meta_path class for importing built-in modules."""
 
@@ -147,7 +131,7 @@
     _load = imp.init_builtin
 
 
-class FrozenImporter(BuiltinFrozen_Importer):
+class FrozenImporter(BuiltinFrozenBaseImporter):
 
     """sys.meta_path class for importing frozen modules."""
 
@@ -206,7 +190,7 @@
                     if os.path.isfile(file_path):
                         raise StopIteration("file found")
         except StopIteration:
-            return FileSystemLoader(file_path, handler)
+            return FileSystemLoader(file_path, handler, file_ext)
         else:
             return None
 
@@ -215,14 +199,16 @@
 
     """Loader for the filesystem."""
 
-    def __init__(self, file_path, handler):
+    def __init__(self, file_path, handler, chosen_handle):
+        """Store arguments on to the instance."""
         self.file_path = file_path
         self.handler = handler
+        self.chosen_handle = chosen_handle
 
     def load_module(self, fullname, path=None):
         """Load the module from self.path using self.handler.
         
-        The handler is expected to implement a handle_file method that will
+        The handler is expected to implement a handle_code method that will
         deal with initializing the module passed to it.
         
         """
@@ -232,126 +218,280 @@
         except KeyError:
             module = imp.new_module(fullname)
             sys.modules[fullname] = module
-            # XXX Could have handlers return code objects and do module initialization
-            # ourselves if extension modules and imp were changed to support such a thing.
-            self.handler.handle_file(module, fullname, path, self.file_path)
             module.__loader__ = self
+            module.__file__ = self.file_path
+            module.__name__ = fullname
+            try:
+                with open(self.file_path) as code_file:
+                    self.handler.handle_code(module, code_file,
+                                                self.chosen_handle)
+            except:
+                del sys.modules[fullname]
+                raise
             return module
 
 
-class PySourceHandler(object):
-
-    """Handler for importing Python source modules."""
+class PyPycBaseHandler(object):
     
-    # XXX Need to have a way to generate bytecode when none was found in the
-    # first place.  If there ends up being too much overlap with the bytecode
-    # handler in terms of bytecode generation then consider
-    # following the current convention and have a unified source/bytecode
-    # handler that does it all.
-
-    handles = ('.py',)
+    """Superclass for reading bytecode and/or source code for initializing a
+    module (with optional recreating of bytecode as needed).
+    
+    The entry point of the class is handle_code().  All other methods are
+    provided for overriding to control how the bytecode/source algorithm
+    gets what it needs.  handle_code() treats source code and bytecode as
+    opaque objects that are passed around to the proper methods for handling.
+    This allows storage back-ends to use the algorithm in handle_code() while
+    not having to worry about any assumptions about how source or bytecode are
+    dealt with.  It also allows for possible source or bytecode translations
+    before a code object is expected for initializing the module.
     
-    def get_code(self, file_path):
-        """Return the code object as stored at file_path.
+    To suppress the implicit use of source code when handling bytecode (or to
+    suppress writing new bytecode when handling source code), set the proper
+    argument passed to the object initializer to a False value.  This allows
+    for subclasses to skip having to implement support for handling source code
+    (or bytecode) if it is not needed/desired.
+    
+    """
+    
+    def __init__(self, source_handles, bytecode_handles):
+        """Specify the handles for source and bytecode to be handled by this
+        handler and set 'handles' appropriately.
         
-        Provided for use by PyBytecodeHandler to recreate bytecode for a source
-        file.
+        If either source code or bytecode is not to be used, pass in a false
+        value for the appropriate argument.
         
         """
-        with open(file_path, 'rU') as source_file:
-            source_code = source_file.read()
-        return compile(source_code, file_path, 'exec')      
+        if not source_handles:
+            self.source_handles = tuple()
+        else:
+            self.source_handles = source_handles
+        if not bytecode_handles:
+            self.bytecode_handles = tuple()
+        else:
+            self.bytecode_handles = bytecode_handles
+        self.handles = self.bytecode_handles + self.source_handles
+        
+    def find_source_to_read(self, opaque_bytecode):
+        """Return the opaque source object corresponding to the opaque
+        bytecode object for reading, or None if no corresponding source exists.
+        
+        Meant to be overridden as needed (always returns None otherwise).
+        
+        """
+        return None
+        
+    def find_bytecode_to_write(self, opaque_source):
+        """Return the opaque bytecode object that corresponds to the opaque
+        source object for writing, or None if no corresponding bytecode exists.
+        
+        Meant to be overridden as needed (always returns None otherwise).
+        
+        """
+        return None
+        
+    def get_location(self, opaque_code):
+        """Return the "location" of the opaque object."""
+        raise NotImplementedError
+        
+    def get_bytecode(self, opaque_bytecode):
+        """Return the magic number, timestamp, and bytecode from the opaque
+        bytecode object."""
+        raise NotImplementedError
+        
+    def get_code_from_bytecode(self, bytecode):
+        """Return the code object created from the bytecode."""
+        return marshal.loads(bytecode)
+        
+    def verify_magic(self, magic_number):
+        """Compare the given magic_number against the one the interpreter
+        uses."""
+        return True
+        # XXX Won't work until can unmarshal longs.
+        return True if magic_number == imp.get_magic() else False
+        
+    def verify_timestamp(self, timestamp, opaque_source):
+        """Verify the timestamp against the opaque source object."""
+        raise NotImplementedError
+        
+    def get_code_from_source(self, opaque_source):
+        """Return a code object created from the opaque source object along
+        with the timestamp of the opaque object.
+        
+        The timestamp needs to be returned for possible recreation of the
+        bytecode.  This is to prevent the possible race condition where the
+        bytecode's timestamp is used; if the source is modified between reading
+        it and the writing out of the bytecode then the bytecode would not be
+        recreated.
+        
+        """
+        raise NotImplementedError
+        
+    def write_bytecode(self, code_object, opaque_bytecode, timestamp):
+        """Dump the code object on to the opaque bytecode object using the
+        specified timestamp."""
+        raise NotImplementedError
+        
+    def handle_code(self, module, opaque_code, handle_this):
+        """Initialize the module using the opaque code object (treating it as
+        specified by the passed-in handle).
+        
+        If an opaque bytecode object was passed in, first verify its magic
+        number and timestamp.  If it checks out, it will be used to initialize
+        the module.  If the bytecode is invalid, then try to fetch the
+        corresponding source code.  If it can be found, use it and recreate
+        the bytecode if possible (if bytecode cannot be recreated, assign the
+        proper value to __file__).  If no source can be found, raise
+        ImportError.
+        
+        If an opaque source object was given, use it to initialize the module.
+        Also recreate the bytecode (if desired based on whether any bytecode
+        handles are listed) since either it does not exist yet (based
+        on the fact that bytecode handles are listed first) or it
+        was found to be invalid.
 
-    def handle_file(self, module, fullname, path, file_path):
-        """Import the Python source file at file_path and use it to
-        initialize 'module'."""
-        module.__file__ = file_path
-        module.__name__ = fullname
-        compiled_code =  self.get_code(file_path)
-        exec compiled_code in module.__dict__
+        """
+        if handle_this in self.bytecode_handles:
+            # Asked to handle bytecode.
+            try:
+                magic, timestamp, bytecode = self.get_bytecode(opaque_code)
+                # Need source code no matter what (if source code is supported
+                # by handler); for timestamp check or to see if source can be
+                # used if magic number check failed.
+                if self.source_handles:
+                    opaque_source = self.find_source_to_read(opaque_code)
+                else:
+                    opaque_source = None
+                if not self.verify_magic(magic):
+                    raise ImportError("bad magic number")
+                if (opaque_source and
+                      not self.verify_timestamp(timestamp, opaque_source)):
+                        raise ImportError("outdated timestamp")
+            except ImportError, exc:
+                if not self.source_handles or not opaque_source:
+                    # Bytecode invalid and no source to recreate from, so error
+                    # out.
+                    raise
+                else:
+                    # Bytecode invalid, but source exists to work from.
+                    pass
+            else:
+                # Bytecode is valid.
+                code_object = self.get_code_from_bytecode(bytecode)
+                exec code_object in module.__dict__
+                return module
+        # Either the handler was requested to handle source or the bytecode was
+        # found to be stale.  Regardless, source code was already fetched.
+        if handle_this in self.source_handles:
+            # Requested to handle source.
+            opaque_source = opaque_code
+            code_object, timestamp = self.get_code_from_source(opaque_source)
+        elif self.bytecode_handles:
+            # Bytecode was invalid.
+            code_object, timestamp = self.get_code_from_source(opaque_source)
+        else:
+            # Should never be reached; importer should have stated that module
+            # could not be imported.
+            raise ImportError("no source or bytecode handles specified")
+        exec code_object in module.__dict__
+        if self.bytecode_handles:
+            opaque_bytecode = self.find_bytecode_to_write(opaque_source)
+            if opaque_bytecode:
+                # Recreate bytecode.
+                self.write_bytecode(code_object, opaque_bytecode, timestamp)
+            else:
+                # Need to reassign __file__ since bytecode that was supposed
+                # to be used was bad and we could not update it to make it
+                # as if we had used it.
+                module.__file__ = self.get_location(opaque_source)
         return module
 
 
-class PyBytecodeHandler(object):
-
-    """Handler for importing .pyc/.pyo modules.
+class PyPycFileHandler(PyPycBaseHandler):
+    
+    """Handler for source code and bytecode files.
     
-    Subclasses are expected at least override 'handles'.  If any special
-    bytecode handling is needed in terms of recreating it, then use the
-    appropriate source provider during initialization.
+    All methods that work with opaque objects expect a file-like interface:
+    * read(n=-1)
+        Read n bytes from the file, or all bytes if no argument is given.
+    * close()
+        Close the file.  It is a no-op if called previously.
+    * name
+        Attribute with the location to the file.  If source and bytecode are
+        not both used by an instance of this class then the value does not
+        need to be a valid path, otherwise care needs to be taken to make sure
+        the value is reasonable.
+        
+    The file type by default implements the required interface.  StringIO
+    objects require the 'name' attribute to be set.
     
     """
     
-    def __init__(self, source_provider=None):
-        """Store a source handler in case bytecode is invalid.
+    def __init__(self, source_handles=None, bytecode_handles=None):
+        """Set the file extensions to be handled.
         
-        source_provider must implement get_code(file_path) and return a code
-        object to be used for writing a new bytecode file.  It also needs to
-        have a 'handles' attribute.
+        Not passing an argument (or setting to None) for either type of file
+        extension will lead to default values being used.
         
         """
-        self.source_provider = source_provider
-        self.handles = ('.pyc',) if __debug__ else ('.pyo',)
+        if source_handles is None:
+            source_handles = ('.py',)
+        if bytecode_handles is None:
+            bytecode_handles = ('.pyc',) if __debug__ else ('.pyo',)
+        super(PyPycFileHandler, self).__init__(source_handles,
+                                                        bytecode_handles)
     
-    def find_source(self, bytecode_path):
-        """Return the path to the source file for the bytecode or None if it
-        was not found."""
-        if self.source_provider is None:
-            return None
-        source_ext = self.source_provider.handles
-        bytecode_base, bytecode_ext = os.path.splitext(bytecode_path)
-        source_path = bytecode_base + '.' + source_ext
-        return source_path if os.path.exists(source_path) else None
-    
-    def validate_magic(self, marshalled_magic):
-        """Return a boolean as to whether the marshalled magic number is good
-        or not."""
-        return True
-        # XXX Need Python/marshal.c:r_long() exposed.
-        magic_number = marshal.loads(marshalled_magic)
-        return True if magic_number == imp.get_magic() else False
-
-    def validate_timestamp(self, marshalled_timestamp, source_path):
-        """Return a boolean as to whether the timestamp was valid or not
-        compared to the source file."""
-        return True
-        # XXX Need Python/marshal.c:r_long() exposed. 
-        bytecode_timestamp = marshal.loads(marshalled_timestamp)
-        source_timestampe = os.stat(source_path).st_mtime
-        if source_timestamp >> 32:
-            raise OverflowError("modification time overflows a 4 byte field")
-        return True if source_timestamp <= bytecode_timestamp else False
-        
-    def regenerate_bytecode(self, source_path, bytecode_path):
-        """Regenerate the bytecode_path file from source_path and return the
-        code object created."""
-        raise NotImplementedError("need to be able to marshal longs directly")
-        # XXX Need to be expose Python/marshal.c:w_long()
-        timestamp = os.stat(source_path).st_mtime
-        code_object = self.source_provider.get_code(source_path)
-        with open(bytecode_path, 'wb') as bytecode_file:
-            marshal.dump(imp.get_magic(), bytecode_file)
-            marshal.dump(timestamp, bytecode_file)
-            marshal.dump(code_object, bytecode_file)
-        return code_object
-
-    def handle_file(self, module, fullname, path, file_path):
-        """Import the Python bytecode file at 'path' and use it to initialize
-        'module'."""
-        with open(file_path, 'rb') as bytecode_file:
+    def find_source_to_read(self, bytecode_file):
+        """Return the file object to the corresponding source code."""
+        base, ext = os.path.splitext(self.get_location(bytecode_file))
+        return open(base + self.source_handles[-1], 'U')
+
+    def find_bytecode_to_write(self, source_file):
+        """Return the file object to the corresponding bytecode."""
+        base, ext = os.path.splitext(self.get_location(source_file))
+        return open(base + self.bytecode_handles[-1], 'wb')
+        
+    def get_location(self, file_object):
+        """Return the path to the file object."""
+        return file_object.name
+        
+    def get_bytecode(self, bytecode_file):
+        """Return the magic number, timestamp, and bytecode from the bytecode
+        file."""
+        with contextlib.closing(bytecode_file):
             magic = bytecode_file.read(4)
             timestamp = bytecode_file.read(4)
-            compiled_code = marshal.load(bytecode_file)
-        source_path = self.find_source(file_path)
-        if source_path:
-            if (not self.validate_magic(magic) or
-                not self.validate_timestamp(timestamp, source_path)):
-                compiled_code = self.regenerate_bytecode(source_path,
-                                                            bytecode_path)
-        else:
-            if not self.validate_magic(magic):
-                raise ImportError("bad magic number")
-        exec compiled_code in module.__dict__
-        module.__file__ = file_path
-        module.__name__ = fullname
-        return module
\ No newline at end of file
+            bytecode = bytecode_file.read()
+        # XXX Need Python/marshal.c:r_long() to properly convert magic number
+        # and timestamp.
+        return magic, timestamp, bytecode
+        
+    def verify_timestamp(self, bytecode_timestamp, source_file):
+        """Verify that 'bytecode_timestamp' is at least as new as the
+        modification time of 'source_file'."""
+        return True
+        # XXX Won't work until can unmarshal longs.
+        source_path = self.get_location(source_file)
+        source_timestamp = os.stat(source_path).st_mtime
+        return source_timestamp <= bytecode_timestamp
+        
+    def get_code_from_source(self, source_file):
+        """Return the code object created from the source code file and the
+        timestamp on the source file."""
+        with contextlib.closing(source_file):
+            source_code = source_file.read()
+        source_location = self.get_location(source_file)
+        timestamp = os.stat(source_location).st_mtime
+        code_object = compile(source_code, self.get_location(source_file),
+                                'exec')
+        return code_object, timestamp
+        
+    def write_bytecode(self, code_object, bytecode_file, timestamp):
+        """Write out code_object to bytecode_file with the
+        passed-in timestamp."""
+        # XXX w/o being able to marshal longs, we need to use py_compile.
+        with contextlib.closing(bytecode_file):
+            source_file = self.find_source_to_read(bytecode_file)
+            with contextlib.closing(source_file):
+                source_location = self.get_location(source_file)
+            bytecode_location = self.get_location(bytecode_file)
+        py_compile.compile(source_location, bytecode_location, doraise=True)
\ No newline at end of file

Modified: sandbox/trunk/import_in_py/test_importer.py
==============================================================================
--- sandbox/trunk/import_in_py/test_importer.py	(original)
+++ sandbox/trunk/import_in_py/test_importer.py	Tue Oct 17 21:20:50 2006
@@ -1,13 +1,18 @@
 from __future__ import with_statement
+import importer
+
 import unittest
 from test import test_support
-import importer
-import sys
-import StringIO
-import os
-import tempfile
+
+import contextlib
+import imp
+import marshal
 import new
+import os
 import py_compile
+import StringIO
+import sys
+import tempfile
 
 
 class BuiltinFrozen_Tester(unittest.TestCase):
@@ -121,78 +126,87 @@
             del sys.modules[self.module]
         except KeyError:
             pass
+        self.gen_source_and_bytecode()
+        self.module_object = new.module(self.module)
+
+
+    def tearDown(self):
+        """If the temporary path was used, make sure to clean up."""
+        if os.path.exists(self.source_path):
+            os.remove(self.source_path)
+        if os.path.exists(self.bytecode_path):
+            os.remove(self.bytecode_path)
+            
+    def gen_source(self):
+        """Generate a source code file."""
         self.directory = tempfile.gettempdir()
-        self.source_path = os.path.join(self.directory, self.module+'.py')
         self.attr_name = 'test_attr'
         self.attr_value = None
+        self.source_path = os.path.join(self.directory, self.module+'.py')
+        self.source = '%s = %r' % (self.attr_name, self.attr_value)
         with open(self.source_path, 'w') as py_file:
-            py_file.write('%s = %r' % (self.attr_name, self.attr_value))
-        py_compile.compile(self.source_path, doraise=True)
+            py_file.write(self.source)
+            
+    def gen_source_and_bytecode(self):
+        """Generate a bytecode file (implicitly generating the source file)."""
+        self.gen_source()
         self.bytecode_path = self.source_path + ('c' if __debug__ else 'o')
-
-    def tearDown(self):
-        """If the temporary path was used, make sure to clean up."""
-        os.remove(self.source_path)
-        os.remove(self.bytecode_path)
+        py_compile.compile(self.source_path, doraise=True)
+        self.code_object = compile(self.source, self.bytecode_path, 'exec')
+        self.bytecode = marshal.dumps(self.code_object)
         
-    def verify_module(self, module, file_path):
+    def verify_module(self, module, file_path=None):
         """Verify that the module is the one created during setup and has the
         expected attributes and values."""
-        self.failUnlessEqual(module.__name__, self.module)
-        self.failUnlessEqual(module.__file__, file_path)
+        if file_path:
+            self.failUnlessEqual(module.__name__, self.module)
+            self.failUnlessEqual(module.__file__, file_path)
         self.failUnless(hasattr(module, self.attr_name))
         self.failUnlessEqual(getattr(module, self.attr_name), self.attr_value)
 
 
-class SourceHandlerTests(PyPycFileHelper):
+class FileSystemImporterTests(PyPycFileHelper):
+
+    """Test the filesystem importer."""
 
-    """Test the Python source code handler."""
-    
     def setUp(self):
-        """Create a handler to use for each test."""
-        super(self.__class__, self).setUp()
-        self.handler = importer.PySourceHandler()
-
-    def test_handle(self):
-        # Should claim it handles 'py' data.
-        self.failUnlessEqual(self.handler.handles, ('.py',))
+        """Create a basic importer."""
+        super(FileSystemImporterTests, self).setUp()
+        source_handler = importer.PyPycFileHandler(bytecode_handles=False)
+        self.importer = importer.FileSystemImporter(self.directory,
+                                                    source_handler)
 
-    def test_handle_file_module(self):
-        # Should be able to handle a module that is directly pointed at.
-        new_module = new.module(self.module)
-        self.handler.handle_file(new_module, self.module,
-                                  None, self.source_path)
-        self.verify_module(new_module, self.source_path)
+    def test_find_module_single_handler(self):
+        # Having a single handler should work without issue.
+        loader = self.importer.find_module(self.module)
+        self.failUnless(isinstance(loader, importer.FileSystemLoader))
+        self.failUnlessEqual(loader.file_path, self.source_path)
+        self.failUnless(isinstance(loader.handler, importer.PyPycFileHandler))
 
+    def test_find_module_cannot_find(self):
+        # Should return None if it can't find the module.
+        found = self.importer.find_module('gobbledeegook')
+        self.failUnlessEqual(found, None)
 
-class BytecodeHandlerTests(PyPycFileHelper):
-    
-    """Tests for the bytecode handler."""
-    
-    def setUp(self):
-        """Make sure that the module has its bytecode generated."""
-        super(self.__class__, self).setUp()
-        self.handler = importer.PyBytecodeHandler()
-        
-    def test_handles_attr(self):
-        # 'handles' should return 'pyc' or 'pyo' depending on __debug__.
-        try:
-            importer.__debug__ = True
-            handler = importer.PyBytecodeHandler()
-            self.failUnlessEqual(handler.handles, ('.pyc',))
-            importer.__debug__ = False
-            handler = importer.PyBytecodeHandler()
-            self.failUnlessEqual(handler.handles, ('.pyo',))
-        finally:
-            del importer.__debug__
-        
-    def test_handle_file(self):
-        # Should be able to handle a simple bytecode file that is freshly
-        # generated.
-        new_module = new.module(self.module)
-        self.handler.handle_file(new_module, self.module, None,
-                                    self.bytecode_path)
-        self.verify_module(new_module, self.bytecode_path)
+    def test_find_module_multiple_handlers(self):
+        # Modules should be found based on the order of the handlers.
+        source_handler = importer.PyPycFileHandler(bytecode_handles=False)
+        bytecode_handler = importer.PyPycFileHandler(source_handles=False)
+        fs_importer = importer.FileSystemImporter(self.directory,
+                                                  bytecode_handler, source_handler)
+        loader = fs_importer.find_module(self.module)
+        self.failUnless(isinstance(loader, importer.FileSystemLoader))
+        self.failUnlessEqual(loader.file_path, self.bytecode_path)
+        self.failUnless(isinstance(loader.handler, importer.PyPycFileHandler))
+
+    def test_find_to_load(self):
+        # Make sure that one can go from find_module() to getting a module
+        # imported.
+        loader = self.importer.find_module(self.module)
+        self.failUnless(loader)
+        module = loader.load_module(self.module)
+        self.verify_module(module, self.source_path)
+        self.failUnlessEqual(module, sys.modules[self.module])
 
 
 class FileSystemLoaderTests(PyPycFileHelper):
@@ -201,9 +215,11 @@
 
     def setUp(self):
         """Create a fresh loader per run."""
-        super(self.__class__, self).setUp()
+        super(FileSystemLoaderTests, self).setUp()
+        source_handler = importer.PyPycFileHandler(bytecode_handles=False)
         self.loader = importer.FileSystemLoader(self.source_path,
-                                                importer.PySourceHandler())
+                                                source_handler,
+                                                source_handler.handles[0])
 
     def test_load_module_fresh(self):
         # Test a basic module load where there is no sys.modules entry.
@@ -214,62 +230,367 @@
     def test_load_module_sys_modules(self):
         # Make sure that the loader returns the module from sys.modules if it
         # is there.
-        new_module = new.module(self.module)
-        sys.modules[self.module] = new_module
+        sys.modules[self.module] = self.module_object
         loaded_module = self.loader.load_module(self.module)
-        self.failUnless(loaded_module is new_module)
+        self.failUnless(loaded_module is self.module_object)
         
-        
-class FileSystemImporterTests(PyPycFileHelper):
+    def test_sys_module_cleared_on_error(self):
+        # Any entry made for module into sys.modules should be cleared upon error.
+        class RaiseErrorHandler(object):
+            def handle_code(*args):
+                raise ImportError
+                
+        loader = importer.FileSystemLoader(self.source_path, RaiseErrorHandler(), 'A')
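+        # The load must fail with ImportError, and the failed load must not
+        # leave an entry behind in sys.modules.
+        self.failUnlessRaises(ImportError, loader.load_module, self.module)
+        self.failUnless(self.module not in sys.modules)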
+
+
+class PyPycBaseHandlerTests(PyPycFileHelper):
     
-    """Test the filesystem importer."""
+    """Test py/pyc base handler class."""
     
     def setUp(self):
-        """Create a basic importer."""
-        super(self.__class__, self).setUp()
-        self.importer = importer.FileSystemImporter(self.directory,
-                                                    importer.PySourceHandler())
+        """Create a basic handler instance."""
+        super(PyPycBaseHandlerTests, self).setUp()
+        self.handler = importer.PyPycBaseHandler(False, False)
     
-    def test_find_module_single_handler(self):
-        # Having a single handler should work without issue.
-        loader = self.importer.find_module(self.module)
-        self.failUnless(isinstance(loader, importer.FileSystemLoader))
-        self.failUnlessEqual(loader.file_path, self.source_path)
-        self.failUnless(isinstance(loader.handler, importer.PySourceHandler))
+    def test_init(self):
+        # Make sure initialization does what it needs to do.
+        self.failUnlessEqual(self.handler.handles, tuple())
+        handler = importer.PyPycBaseHandler(('B',), ('A',))
+        self.failUnlessEqual(handler.handles, ('A', 'B'))
+        handler = importer.PyPycBaseHandler(('A',), False)
+        self.failUnlessEqual(handler.handles, ('A',))
+        handler = importer.PyPycBaseHandler(False, ('A',))
+        self.failUnlessEqual(handler.handles, ('A',))
         
-    def test_find_module_cannot_find(self):
-        # Should return None if it can't find the module.
-        found = self.importer.find_module('gobbledeegook')
-        self.failUnlessEqual(found, None)
+    def test_not_implemented(self):
+        # The unimplemented methods should all raise NotImplementedError.
+        one_argument = (self.handler.get_location, self.handler.get_bytecode,
+                        self.handler.get_code_from_source)
+        for method in one_argument:
+            self.failUnlessRaises(NotImplementedError, method, None)
+        self.failUnlessRaises(NotImplementedError,
+                                self.handler.verify_timestamp, None, None)
+        self.failUnlessRaises(NotImplementedError,
+                                self.handler.write_bytecode, None, None, None)
         
-    def test_find_module_multiple_handlers(self):
-        # Modules should be found based on the order of the handlers.
-        fs_importer = importer.FileSystemImporter(self.directory,
-                                                  importer.PyBytecodeHandler(),
-                                                  importer.PySourceHandler())
-        loader = fs_importer.find_module(self.module)
-        self.failUnless(isinstance(loader, importer.FileSystemLoader))
-        self.failUnlessEqual(loader.file_path, self.bytecode_path)
-        self.failUnless(isinstance(loader.handler, importer.PyBytecodeHandler))
+    def test_find_source_to_read(self):
+        # Should return None no matter what the arguments.
+        self.failUnlessEqual(self.handler.find_source_to_read(None), None)
         
-    def test_find_to_load(self):
-        # Make sure that one can go from find_module() to getting a module
-        # imported.
-        loader = self.importer.find_module(self.module)
-        self.failUnless(loader)
-        module = loader.load_module(self.module)
-        self.verify_module(module, self.source_path)
-        self.failUnlessEqual(module, sys.modules[self.module])
- 
+    def test_find_bytecode_to_write(self):
+        # Should return None regardless of its argument.
+        self.failUnlessEqual(self.handler.find_bytecode_to_write(None), None)
+
+    def test_get_code_from_bytecode(self):
+        # Make sure you can parse a bytecode file.
+        handler = importer.PyPycBaseHandler(False, False)
+        code_object = handler.get_code_from_bytecode(self.bytecode)
+        exec code_object in self.module_object.__dict__
+        self.verify_module(self.module_object)
+        
+    def test_verify_magic(self):
+        # Test checking the magic number for bytecode.
+        return # XXX
+        self.failUnless(self.handler.verify_magic(imp.get_magic()))
+        self.failUnless(not self.handler.verify_magic(imp.get_magic()-1))
+        
+    def test_invalid_handler(self):
+        # If handle_code is called with a handler that is not registered, 
+        # ImportError should be raised.
+        self.failUnlessRaises(ImportError, self.handler.handle_code,
+                                None, None, 'A')
+        
+    def test_source_only_API(self):
+        # If only a source handle is registered, make sure bytecode-related
+        # methods are not called.
+        class SourceOnlyAPITester(importer.PyPycBaseHandler):
+            def __init__(self, tester, source_handles, bytecode_handles):
+                super(SourceOnlyAPITester, self).__init__(source_handles,
+                                                        bytecode_handles)
+                self.tester = tester
+            def verify_magic(self, ignore):
+                """Normally implemented, but should not be called."""
+                raise NotImplementedError
+            def find_bytecode_to_write(self, ignore):
+                """Normally returns None, but should not be called."""
+                raise NotImplementedError
+            def get_code_from_bytecode(self, ignore):
+                """Normally implemented, but should not be called."""
+                raise NotImplementedError
+            def get_code_from_source(self, ignore):
+                return self.tester.code_object, 0
+            
+        handler = SourceOnlyAPITester(self, ('A',), False)
+        handler.handle_code(self.module_object, None, 'A')
+        self.verify_module(self.module_object)
+        
+    def test_bytecode_only_API(self):
+        # If only a bytecode handle is registered, then only bytecode-related
+        # methods should be called.
+        class BytecodeOnlyAPITester(importer.PyPycBaseHandler):
+            def verify_timestamp(self, ignore, ignore2):
+                return True
+            def get_bytecode(self, bytecode):
+                return imp.get_magic(), 0, bytecode
+                
+        handler = BytecodeOnlyAPITester(False, ('A',))
+        handler.handle_code(self.module_object, self.bytecode, 'A')
+        self.verify_module(self.module_object)
+        
+    def test_bad_magic(self):
+        # If bytecode returns a bad magic number and there is no corresponding
+        # source, then ImportError should be raised.
+        class Tester(importer.PyPycBaseHandler):
+            def get_bytecode(self, bytecode):
+                return 0, 0, bytecode
+            def verify_magic(self, ignore):
+                return False
+                
+        handler = Tester(False, ('A',))
+        self.failUnlessRaises(ImportError, handler.handle_code,
+                                self.module_object, self.bytecode, 'A')
+                                
+    def test_bad_timestamp(self):
+        # If bytecode fails timestamp, source should be used.
+        class Tester(importer.PyPycBaseHandler):
+            def get_bytecode(self, ignore):
+                return 0, 0, 0
+            def verify_magic(self, ignore):
+                return True
+            def verify_timestamp(self, ignore, ignore2):
+                return False
+            def find_source_to_read(self, ignore):
+                """Assign code_object after instantiation."""
+                return self.code_object
+            def find_bytecode_to_write(self, ignore):
+                return 'bytecode found'
+            def get_code_from_source(self, code_object):
+                return code_object, 0
+            def write_bytecode(self, code_object, bytecode, timestamp):
+                if bytecode != 'bytecode found':
+                    raise ValueError
+                if self.code_object is not code_object:
+                    raise ValueError
+                if timestamp != 0:
+                    raise ValueError
+                    
+        handler = Tester(('B',), ('A',))
+        handler.code_object = self.code_object
+        handler.handle_code(self.module_object, None, 'A')
+        self.verify_module(self.module_object)
+        
+        # Test that if bytecode cannot be found, write_bytecode is not called.
+        # Also verify that get_location is called and result is used for
+        # module.__file__ .
+        Tester.find_bytecode_to_write = lambda ignore, ignore2: None
+        Tester.write_bytecode = lambda ignore, ignore2, ignore3, ignore4: 1 / 0
+        Tester.get_location = lambda ignore, ignore2: 'get_location'
+        new_module = new.module(self.module)
+        handler.handle_code(new_module, None, 'A')
+        self.failUnlessEqual(new_module.__file__, 'get_location')
+        self.verify_module(new_module)
+        
+    def test_write_bytecode(self):
+        # If source is passed in, then bytecode should be written out.
+        # Writing out of bytecode when it was invalid is tested in
+        # test_bad_timestamp.
+        class Tester(importer.PyPycBaseHandler):
+            def find_bytecode_to_write(self, ignore):
+                return 'find_bytecode'
+            def get_code_from_source(self, code_object):
+                """Knowing that this method's return value is passed to
+                write_bytecode, store away for later comparison."""
+                self.code_object = code_object
+                return code_object, 0
+            def write_bytecode(self, code_object, bytecode_loc, timestamp):
+                if bytecode_loc != 'find_bytecode':
+                    raise ValueError
+                if code_object is not self.code_object:
+                    raise ValueError
+                if timestamp != 0:
+                    raise ValueError
+                    
+        handler = Tester(('A',), ('B',))
+        handler.handle_code(self.module_object, self.code_object, 'A')
+        self.verify_module(self.module_object)
+
+
+class PyPycFileHandlerTests(PyPycFileHelper):
+    
+    """Test the py/pyc filesystem handler."""
+    
+    def setUp(self):
+        """Create an instance that can handle source and bytecode."""
+        super(PyPycFileHandlerTests, self).setUp()
+        self.handler = importer.PyPycFileHandler()
+        
+    def test_init(self):
+        # Make sure 'handles' ends up being set properly.
+        expected_bytecode_ext = '.pyc' if __debug__ else '.pyo'
+        self.failUnlessEqual(self.handler.handles,
+                                (expected_bytecode_ext, '.py'))
+        handler = importer.PyPycFileHandler(('A',))
+        self.failUnlessEqual(handler.handles,
+                                (expected_bytecode_ext, 'A'))
+        handler = importer.PyPycFileHandler(bytecode_handles=('A',))
+        self.failUnlessEqual(handler.handles, ('A', '.py'))
+        handler = importer.PyPycFileHandler(('A',), ('B',))
+        self.failUnlessEqual(handler.handles, ('B', 'A'))
+        
+    def test_get_location(self):
+        # Should return the value on the 'name' attribute of its argument.
+        class Tester(object):
+            name = 42
+        self.failUnlessEqual(self.handler.get_location(Tester), Tester.name)
+        
+    def test_get_code_from_source(self):
+        # Should be able to read from a file object and return a code object.
+        with open(self.source_path, 'rU') as source_file:
+            result = self.handler.get_code_from_source(source_file)
+        code_object, timestamp = result
+        exec code_object in self.module_object.__dict__
+        self.verify_module(self.module_object)
+        source_timestamp = os.stat(self.source_path).st_mtime
+        self.failUnlessEqual(timestamp, source_timestamp)
+        
+    def test_find_source_to_read(self):
+        # Should be able to deduce .py file from .pyc file.
+        with open(self.bytecode_path, 'rb') as bytecode_file:
+            source_file = self.handler.find_source_to_read(bytecode_file)
+            with contextlib.closing(source_file):
+                source_file_path = source_file.name
+        self.failUnlessEqual(source_file_path, self.source_path)
+        
+    def test_find_bytecode_to_write(self):
+        # Should be able to deduce .pyc file from .py file.
+        with open(self.source_path, 'U') as source_file:
+            bytecode_file = self.handler.find_bytecode_to_write(source_file)
+            with contextlib.closing(bytecode_file):
+                bytecode_file_path = bytecode_file.name
+        self.failUnlessEqual(bytecode_file_path, self.bytecode_path)
+        
+    def test_get_bytecode(self):
+        # Magic number should be good
+        with open(self.bytecode_path, 'rb') as bytecode_file:
+            result = self.handler.get_bytecode(bytecode_file)
+        magic, timestamp, bytecode = result
+        # XXX self.failUnlessEqual(magic, imp.get_magic())
+        source_timestamp = os.stat(self.source_path).st_mtime
+        # XXX self.failUnlessEqual(timestamp, source_timestamp)
+        code_object = marshal.loads(bytecode)
+        exec code_object in self.module_object.__dict__
+        self.verify_module(self.module_object)
+        
+    def test_verify_timestamp(self):
+        source_timestamp = os.stat(self.source_path).st_mtime
+        with open(self.source_path, 'U') as source_file:
+            result = self.handler.verify_timestamp(source_timestamp,
+                                                    source_file)
+            self.failUnless(result)
+        
+    def test_get_code_from_source(self):
+        with open(self.source_path, 'U') as source_file:
+            result = self.handler.get_code_from_source(source_file)
+        code_object, timestamp = result
+        exec code_object in self.module_object.__dict__
+        self.verify_module(self.module_object)
+        source_timestamp = os.stat(self.source_path).st_mtime
+        self.failUnlessEqual(timestamp, source_timestamp)
+        
+    def test_write_bytecode(self):
+        # Writing out the bytecode file should have the current magic number,
+        # a timestamp of the source file, and correct bytecode.
+        os.remove(self.bytecode_path)
+        timestamp = os.stat(self.source_path).st_mtime
+        with open(self.bytecode_path, 'wb') as bytecode_file:
+            self.handler.write_bytecode(self.code_object, bytecode_file,
+                                        timestamp)
+        # Verify bytecode file was created.
+        self.failUnless(os.path.exists(self.bytecode_path))
+        with open(self.bytecode_path, 'rb') as bytecode_file:
+            result = self.handler.get_bytecode(bytecode_file)
+        magic, timestamp, bytecode = result
+        # Verify magic number.
+        self.failUnless(self.handler.verify_magic(magic))
+        # Verify timestamp.
+        with open(self.source_path, 'U') as source_file:
+            self.failUnless(self.handler.verify_timestamp(timestamp,
+                                                            source_file))
+        # Verify bytecode.
+        code_object = self.handler.get_code_from_bytecode(bytecode)
+        exec code_object in self.module_object.__dict__
+        self.verify_module(self.module_object)
+        
+    def test_handle_code_source(self):
+        # Should be able to initialize the module from just using the source.
+        os.remove(self.bytecode_path)
+        handler = importer.PyPycFileHandler(bytecode_handles=False)
+        with open(self.source_path, 'U') as source_file:
+            handler.handle_code(self.module_object, source_file, '.py')
+        self.verify_module(self.module_object)
+        self.failUnless(not os.path.exists(self.bytecode_path))
+        
+    def test_handle_code_bytecode(self):
+        # Should be able to initialize the module with just the bytecode.
+        os.remove(self.source_path)
+        bytecode_extension = os.path.splitext(self.bytecode_path)[1]
+        handler = importer.PyPycFileHandler(source_handles=False)
+        with open(self.bytecode_path, 'rb') as bytecode_file:
+            handler.handle_code(self.module_object, bytecode_file,
+                                bytecode_extension)
+        self.verify_module(self.module_object)
+        
+    def test_handle_code_bad_bytecode_timestamp_good_source(self):
+        # If the timestamp fails on the bytecode, use the source and recreate
+        # the bytecode.
+        class Tester(importer.PyPycFileHandler):
+            """On some platforms the resolution of the last modification time
+            can be too coarse for rewriting the source to pick it up.  Thus
+            force a failed timestamp check."""
+            def verify_timestamp(self, ignore, ignore2):
+                return False
+        
+        handler = Tester()
+        bytecode_extension = os.path.splitext(self.bytecode_path)[1]
+        with open(self.bytecode_path, 'rb') as bytecode_file:
+            bytecode_stringio = StringIO.StringIO(bytecode_file.read())
+        bytecode_stringio.name = self.bytecode_path
+        # Once bytecode has been read, don't need the file.  Deleting it
+        # allows for easy detection that the bytecode was recreated.
+        os.remove(self.bytecode_path)
+        handler.handle_code(self.module_object, bytecode_stringio,
+                            bytecode_extension)
+        self.verify_module(self.module_object)
+        self.failUnless(os.path.exists(self.bytecode_path))
+
+        with open(self.bytecode_path, 'rb') as bytecode_file:
+            self.handler.handle_code(self.module_object, bytecode_file,
+                                bytecode_extension)
+        self.verify_module(self.module_object)
+        self.failUnless(os.path.exists(self.bytecode_path))
+        
+    def test_handle_code_good_source_write_bytecode(self):
+        # If the handler is requested to handle source code and bytecode can
+        # be written, then do so.
+        os.remove(self.bytecode_path)
+        with open(self.source_path, 'U') as source_file:
+            self.handler.handle_code(self.module_object, source_file, '.py')
+        self.verify_module(self.module_object)
+        self.failUnless(os.path.exists(self.bytecode_path))
+
 
 def test_main():
     test_support.run_unittest(
                 BuiltinImporterTests,
                 FrozenImporterTests,
-                SourceHandlerTests,
-                BytecodeHandlerTests,
                 FileSystemLoaderTests,
                 FileSystemImporterTests,
+                PyPycBaseHandlerTests,
+                PyPycFileHandlerTests,
             )
 
 

