[Patches] [ python-Patches-416704 ] More robust freeze

noreply@sourceforge.net noreply@sourceforge.net
Wed, 27 Nov 2002 14:41:47 -0800


Patches item #416704, was opened at 2001-04-17 16:24
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=416704&group_id=5470

Category: None
Group: None
Status: Open
Resolution: Accepted
Priority: 7
Submitted By: Toby Dickenson (htrd)
Assigned to: Nobody/Anonymous (nobody)
Summary: More robust freeze

Initial Comment:
This patch addresses three issues, all relating to 
robustness of frozen programs.

Specifically, this patch allows explicit and complete 
control over which modules may be loaded from source 
on the filesystem of the host system where the frozen 
program is run, and which may not.

Without this patch it is impossible to create a 
non-trivial frozen program which will *never* load a 
module from source on the filesystem.



1. A patch to correct bug #404545 (frozen package 
import uses wrong files). Under this change, 
submodules of a frozen package must themselves be 
frozen modules. Previously, the import machinery might 
also try to import submodules from curiously named 
files (packagename.modulename.py) in directories on 
sys.path.


2. A patch to add an extra command line option -E to 
freeze.py, which forces freeze to terminate with an 
error message if there are modules that it can not 
locate.

If this switch is not specified then the default 
behaviour is unchanged: modules which can not be found 
by freeze will not be included in the frozen program, 
and the import machinery will try to load them from 
source on sys.path when the frozen program is run.

In practice we have found that a missing module is 
probably an error (and it is a fairly frequent error 
too!). The -E switch can be used to detect this error; 
any missing modules will cause freeze.py to fail.

In the rare case of a frozen module importing a 
non-frozen one (i.e. one which should be loaded from 
source when the program is run), the non-frozen module 
must be excluded from the freeze using the -x option.
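
As a rough illustration of the kind of check -E performs, the sketch
below uses the standard library's modulefinder to list modules that a
script imports but that cannot be located. The function names
find_missing_modules and check_or_die, and the wording of the error
message, are assumptions made for illustration; this is not the code
in the patch.

```python
import sys
from modulefinder import ModuleFinder


def find_missing_modules(script_path, excludes=()):
    """Return sorted names of imported modules that could not be located.

    Names listed in `excludes` are treated as deliberately left out
    (the role -x plays in the text above) and are not reported.
    """
    finder = ModuleFinder()
    finder.run_script(script_path)
    return sorted(name for name in finder.any_missing() if name not in excludes)


def check_or_die(script_path, excludes=()):
    """Hypothetical strict mode: stop with an error if anything is missing."""
    missing = find_missing_modules(script_path, excludes)
    if missing:
        sys.exit("Error: cannot locate modules: " + ", ".join(missing))
```

Without the strict check, a missing name would simply be reported and
the import left to be resolved at run time.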


3. A patch to add an extra command line option -X to 
freeze.py, which indicates that a specified module is 
excluded from the freeze, and also that the frozen 
program should not try to load the module from 
sys.path when it is imported. Importing the specified 
module will always trigger an ImportError.

This is useful if a module used by a frozen program 
can optionally use a submodule...
try:
    import optional_submodule
except ImportError:
    pass

It may be preferable for the frozen program's 
behaviour not to depend on whether optional_submodule 
happens to be installed on the host system, and for 
'import optional_submodule' to always fail with an 
ImportError. This can be achieved using the 
'-X optional_submodule' command line switch to freeze.py.

This is implemented by including the excluded module 
in the frozen imports table (_PyImport_FrozenModules), 
with the code pointer set to NULL.
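
The lookup rule described above can be modelled in a few lines. This
is a toy model in Python, not the C code in import.c: the dictionary
FROZEN_TABLE, the sentinel NOT_FROZEN and the function find_frozen are
names invented here, with None standing in for the NULL code pointer.

```python
# Toy stand-in for the frozen module table: name -> code, or None for
# an entry whose code pointer would be NULL (a module excluded with -X).
FROZEN_TABLE = {
    "__main__": "<code for __main__>",
    "optional_submodule": None,
}

NOT_FROZEN = object()  # sentinel: the name is not in the table at all


def find_frozen(name, table=FROZEN_TABLE):
    """Look up `name` following the rule described in the text.

    - present with code: return the code
    - present without code: always fail with ImportError
    - absent: return NOT_FROZEN, so the caller may go on to sys.path
    """
    if name not in table:
        return NOT_FROZEN
    code = table[name]
    if code is None:
        raise ImportError("Excluded frozen module: " + name)
    return code
```

The point of the third case is that only names absent from the table
can ever reach the filesystem search.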




----------------------------------------------------------------------

>Comment By: Just van Rossum (jvr)
Date: 2002-11-27 23:41

Message:
Logged In: YES 
user_id=92689

Thought about this pretty hard: there's no way around this,
after all if mypackage *does* have a subpackage called
mypackage, it's the correct thing to do to look for
mypackage.mypackage.submodule.

I still think it's a useful feature to allow "funny names"
(it's also needed if you want/need a builtin module as a
submodule). It's easy to work around subversion if you must:
set sys.path to an empty list; it's then guaranteed that no
file system imports will happen. If you _do_ need a
non-empty sys.path you're screwed if there's any code in the
project that catches ImportError for optional modules.

I guess we need a neutral third party, as it doesn't look
like we're going to agree :-/

(Btw. I tried to empty sys.path in a custom site.py module,
but Python thinks it has to be smart and sets it to ['.']
after site.py has run, so you'd have to do this in your main
program.)
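
A minimal sketch of emptying sys.path from the main program, as
suggested above. The helper names disable_filesystem_imports and
try_import are made up here, and the sketch assumes that no other
import hooks have been installed.

```python
import sys


def disable_filesystem_imports():
    """Empty sys.path in place so no directory is searched for modules.

    Modules already imported stay available through sys.modules, and
    built-in and frozen modules are found without consulting sys.path.
    """
    del sys.path[:]


def try_import(name):
    """Return True if `name` can be imported, False on ImportError."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True
```

Calling disable_filesystem_imports() before any optional imports run
means a "try: import ... except ImportError" block always takes the
except branch for modules that are not frozen or built in.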

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-11-27 20:36

Message:
Logged In: YES 
user_id=92689

Ok, I've got it!! Subversion can occur if the import happens
inside a package. Situation: mypackage/__init__.py does
"import mypackage.submodule". Python will then first try to
import mypackage.mypackage.submodule, in case it's a
relative import. *This* is the situation which can be
subverted in frozen programs if "funny names" are allowed. I
will look into it, I'll see if I can find a way around it.
Thanks for your input!
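
The lookup order described here can be written down as a small helper.
This is only a model of the behaviour as stated in this comment, not
code from import.c; the function name candidate_names is invented for
illustration.

```python
def candidate_names(importing_package, requested):
    """Return the dotted names tried, in order, for an import statement
    executed inside `importing_package` (empty string for top level).

    The package-relative name is tried first, then the absolute one.
    """
    names = []
    if importing_package:
        names.append(importing_package + "." + requested)
    names.append(requested)
    return names


# The situation from the comment: mypackage/__init__.py runs
# "import mypackage.submodule".
print(candidate_names("mypackage", "mypackage.submodule"))
# -> ['mypackage.mypackage.submodule', 'mypackage.submodule']
```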

----------------------------------------------------------------------

Comment By: Toby Dickenson (htrd)
Date: 2002-11-27 19:24

Message:
Logged In: YES 
user_id=46460

I have been working with your antipatch applied to Python
2.2.1. 

That original test no longer works as-is. It includes a file
b.n.py which should never be imported, but used to be picked
up incorrectly by the frozen interpreter. Today, bad things
happen if I move that file to a.b.py. Same effect -
different filename. 

(I'm not sure why the change has happened.)

This effect is easy to see using strace on linux. The
interpreter tries to load modules from many paths before
looking in the frozen module table. The behavior of my
frozen program would change should any of these files exist.

I think we have a fundamental incompatibility here, because
this is exactly your desired behaviour.  You want a frozen
"import b" in a module in a package called "a" to be able to
load the modules "a.b" from an external file with a funny
name. I want a frozen program to *never* look outside the
frozen modules. I think both requirements are valid. Do we
need a new flag to control this behavior?

Sample strace output from my original test case:

stat64("./a.b", 0xbfffde8c)             = -1 ENOENT (No such
file or directory)
open("./a.b.so", O_RDONLY|O_LARGEFILE)  = -1 ENOENT (No such
file or directory)
open("./a.bmodule.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No
such file or directory)
open("./a.b.py", O_RDONLY|O_LARGEFILE)  = -1 ENOENT (No such
file or directory)
open("./a.b.pyc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such
file or directory)
stat64("/home/tdickenson/projects/GeminiPythonProjects/a.b",
0xbfffde8c) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/projects/GeminiPythonProjects/a.b.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/projects/GeminiPythonProjects/a.bmodule.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/projects/GeminiPythonProjects/a.b.py",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/projects/GeminiPythonProjects/a.b.pyc",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
stat64("/home/tdickenson/lib/python/a.b", 0xbfffde8c) = -1
ENOENT (No such file or directory)
open("/home/tdickenson/lib/python/a.b.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/lib/python/a.bmodule.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/lib/python/a.b.py",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/lib/python/a.b.pyc",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
stat64("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/a.b",
0xbfffde8c) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/a.b.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/a.bmodule.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/a.b.py",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/a.b.pyc",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
stat64("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/plat-linux2/a.b",
0xbfffde8c) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/plat-linux2/a.b.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/plat-linux2/a.bmodule.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/plat-linux2/a.b.py",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/plat-linux2/a.b.pyc",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
stat64("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/lib-tk/a.b",
0xbfffde8c) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/lib-tk/a.b.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/lib-tk/a.bmodule.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/lib-tk/a.b.py",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/lib-tk/a.b.pyc",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
stat64("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/lib-dynload/a.b",
0xbfffde8c) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/lib-dynload/a.b.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/lib-dynload/a.bmodule.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/lib-dynload/a.b.py",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/lib-dynload/a.b.pyc",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
stat64("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/site-packages/a.b",
0xbfffde8c) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/site-packages/a.b.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/site-packages/a.bmodule.so",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/site-packages/a.b.py",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/home/tdickenson/Python-2.2.1/inst/lib/python2.2/site-packages/a.b.pyc",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
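
The pattern in this trace is regular: for each directory, one stat64
on the bare name followed by open attempts with four suffixes. The
sketch below regenerates that list of paths. The function name
probed_paths and the idea of passing the directories explicitly are
illustrative assumptions, and the suffix list is simply read off the
trace above.

```python
# Suffixes in the order they appear in the trace; "" is the bare
# name, which the trace shows being checked with stat64 rather than open.
SUFFIXES = ["", ".so", "module.so", ".py", ".pyc"]


def probed_paths(directories, dotted_name):
    """List the file paths probed for `dotted_name` in each directory."""
    return [
        directory + "/" + dotted_name + suffix
        for directory in directories
        for suffix in SUFFIXES
    ]


for path in probed_paths([".", "/home/tdickenson/lib/python"], "a.b"):
    print(path)
```

Any one of these files coming into existence would change what the
frozen program imports.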


 

----------------------------------------------------------------------

Comment By: Toby Dickenson (htrd)
Date: 2002-11-27 17:15

Message:
Logged In: YES 
user_id=46460

No progress, and sorry for not saying so earlier. I hope to squeeze it in
tonight, and this will be my last chance in the next 2 weeks.

There were definitely no __import__ hooks, or other kind of trickery.

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2002-11-27 17:10

Message:
Logged In: YES 
user_id=11105

> Thomas, are you tracking this?
I'm reading this with interest. Backwards compatibility to 2.2 
may prevent my trick from being taken out again ;-)

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-11-27 16:33

Message:
Logged In: YES 
user_id=92689

Toby, any progress on getting a reproducible test case?
Since it can't be reproduced in simple cases I strongly
suspect that an __import__ hook was involved in the setup
that triggered your problem. Can you confirm or deny that?
If so: I think the import hook should be made more robust,
not the frozen import mechanism.

I know my problem can be worked around in pure Python(*),
but I'd like to avoid that. You also ask "Is there a reason
why the extension needs to be dynamically loaded"; well, I'm
not having a problem with a single product, I'm writing a
general purpose freezing tool written in pure Python, using
a stock interpreter executable. Look ma, no compiler!

*) Thomas Heller uses a different trick in his development
version of py2exe, but I'm sure he'd be glad to be able to
take that out again. Thomas, are you tracking this?

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-11-25 23:09

Message:
Logged In: YES 
user_id=92689

Guido, you just happened to still be assigned to the
patch... I've unassigned it.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-11-25 21:15

Message:
Logged In: YES 
user_id=6380

Please leave me out of this. I have unfortunately no time
left for patch triage.

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-11-25 11:41

Message:
Logged In: YES 
user_id=92689

You write "yes this really happened". I would like to
believe you, but I'd like to understand *how* it can happen
(I tried hard to _make_ it happen and failed ;-). If a.b
exists as a frozen module, I really don't see how a.b.py
could ever be found before it. Can you elaborate?

----------------------------------------------------------------------

Comment By: Toby Dickenson (htrd)
Date: 2002-11-25 09:21

Message:
Logged In: YES 
user_id=46460

Changing this behaviour would make it difficult to write frozen programs 
whose behaviour is predictable - where the behaviour can not be subverted 
by files with funny names. This is important for: 
1. suid frozen programs (although there are other ways around this) 
2. vendors who ship frozen programs, who don't want them to break if the
user happens to have a funny named file in the current directory (yes, this
really happened.)
 
It is currently possible to work around this new problem in pure python, by 
importing the extension module packagelessly, and twiddling sys.modules to 
add it to the package. I think this solution is well known in the python/COM 
world, where there are many extension modules in the win32com package. 
The following example hack is taken from the top of one of our frozen 
products: 
 
if pythoncom.frozen:
    # A bug in freeze causes problems for extension modules that originally
    # appeared in packages.
    # They can only be imported packagelessly.
    def _fix_frozen_extensions(package_name, extension_name):
        extension = sys.modules[package_name + '.' + extension_name] = \
            __import__(extension_name)
        __import__(package_name)
        setattr(sys.modules[package_name], extension_name, extension)
    _fix_frozen_extensions('GeminiDataLoggers.Cedar', 'SimplePort')
    _fix_frozen_extensions('GeminiDataLoggers.Cedar', 'SimpleMAPI')
    _fix_frozen_extensions('GeminiDataLoggers.Cedar', 'WinINet')
    _fix_frozen_extensions('win32com.axcontrol', 'axcontrol')
 
    
Is there a reason why the extension needs to be dynamically loaded, rather       

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-11-24 22:41

Message:
Logged In: YES 
user_id=92689

Reopening this patch, adding a counterpatch. (Now I'm sorry
I didn't monitor this issue when it came up!)

I'd like to undo part of T. Dickenson's patch for 2.3 as I
think it's a mistake. For one, it breaks a patch of my own
from 1998, which Guido back then checked in as rev. 2.116.

The reasons I want it out again are twofold:
1) it removes a useful (if not *needed*) feature
2) it fixes a problem that is not reproducible.

Currently it is not possible to use a builtin module or a
dynamically loaded extension as a submodule of a frozen
package. While this is perhaps a dubious habit, it _is_
allowed in a non-frozen world and there _are_ packages out
there that employ this. My counterpatch makes this possible
again.

Guido's checkin msg of rev. 2.116 of import.c contains this:
"""(I *think* this means that we can now have a built-in
module bar that's a submodule of a frozen package foo, by
registering the built-in module with a name "foo.bar" in the
table of builtin modules.)"""
He is correct ;-) But the effect is broader: it works with
_any_ module type. So yes, .py files with dotted names are
also found, but _only_ if no frozen module by that same name
exists. I don't even think it's a wart, let alone a bug:
it's a feature.

See also the comment I just added to bug #404545

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-18 21:15

Message:
Logged In: YES 
user_id=6380

Thanks. All checked in!

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-18 20:54

Message:
Logged In: YES 
user_id=6380

I checked in the first part (import.c).

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-16 23:35

Message:
Logged In: YES 
user_id=6380

I'll try to look at this now!

----------------------------------------------------------------------

Comment By: Toby Dickenson (htrd)
Date: 2001-10-03 12:39

Message:
Logged In: YES 
user_id=46460

Still no patch there. Maybe this time?

----------------------------------------------------------------------

Comment By: Toby Dickenson (htrd)
Date: 2001-10-03 12:37

Message:
Logged In: YES 
user_id=46460

Once again, including the patch this time.

----------------------------------------------------------------------

Comment By: Toby Dickenson (htrd)
Date: 2001-10-03 12:34

Message:
Logged In: YES 
user_id=46460

Attached is an updated patch against the current CVS.

I have verified each aspect of this patch with 
Py_VerboseFlag set to 2, so that import.c traces out which 
files it is checking during the import process.

(I'm not aware of an easy way to set Py_VerboseFlag to 2 in 
a frozen program.... I've added a new patch #467455 to 
enhance the PYTHONVERBOSE environment variable to support 
this.)

I can confirm that the current CVS (roughly 2.2a4) still 
demonstrates the problem from bug #404545. The details are 
slightly different from my original bug report; I'm not sure 
if this is due to an incidental change in Python since 
1.5.2, or if I messed up that bug report. Anyway, using 
Py_VerboseFlag=2 clearly shows that before this patch it is 
looking for files it shouldn't (and it uses those files if 
they exist). After this patch, it definitely only looks for 
the right files.

The other aspect of this patch, adding -E and -X to 
freeze.py, is exactly as before.


----------------------------------------------------------------------

Comment By: Toby Dickenson (htrd)
Date: 2001-09-10 16:20

Message:
Logged In: YES 
user_id=46460

OK, I should get round to this soon.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-05 17:50

Message:
Logged In: YES 
user_id=6380

I'm willing to look at this, but right now the patch doesn't
apply cleanly. From the headers in the diff file it looks
like you're patching an early beta for 2.0.

If you can rework this for current CVS or Python 2.2a2 or
2.2a3, that would be great.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-06-08 00:26

Message:
Logged In: YES 
user_id=21627

Why is this assigned to Mark? I cannot see anything 
Windows-specific in it. Mark, if you are not interested in 
reviewing this patch, I recommend unassigning it from 
yourself.



----------------------------------------------------------------------
