Re: [Python-Dev] __file__
On Feb 03, 2010, at 12:42 PM, Brett Cannon wrote:
So what happens when only bytecode is present?
We discussed this at Pycon and agreed that we will not support source-less deployments by default. The source file must exist or it will be an ImportError. This does not mean source-less deployments are not possible though. To support this use case, you'd have to write a custom import hook. We may write one as part of the PEP 3147 implementation. Contributions are of course welcome! -Barry
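For readers wondering what such a custom import hook might look like, here is a minimal sketch. It is not the hook proposed for the PEP 3147 implementation. It uses the modern `importlib` API (`SourcelessFileLoader`, `spec_from_file_location`), which postdates this thread, and the names `BytecodeOnlyFinder` and `install_bytecode_finder` are made up for illustration. It only handles top-level modules stored as `<name>.pyc` in one directory.

```python
import importlib.machinery
import importlib.util
import sys
from pathlib import Path


class BytecodeOnlyFinder:
    """Hypothetical meta path finder: loads top-level modules from .pyc files."""

    def __init__(self, directory):
        self.directory = Path(directory)

    def find_spec(self, fullname, path=None, target=None):
        # Keep the sketch small: submodules and packages are not handled.
        if "." in fullname:
            return None
        candidate = self.directory / (fullname + ".pyc")
        if not candidate.is_file():
            return None
        loader = importlib.machinery.SourcelessFileLoader(fullname, str(candidate))
        return importlib.util.spec_from_file_location(
            fullname, str(candidate), loader=loader
        )


def install_bytecode_finder(directory):
    """Append the finder so the normal import machinery is consulted first."""
    finder = BytecodeOnlyFinder(directory)
    sys.meta_path.append(finder)
    return finder
```

A package that ships only bytecode would call `install_bytecode_finder(...)` once at startup, before importing its own modules. The bytecode still has to match the running interpreter's magic number.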
Barry Warsaw wrote:
We discussed this at Pycon and agreed that we will not support source-less deployments by default. The source file must exist or it will be an ImportError.
This does not mean source-less deployments are not possible though. To support this use case, you'd have to write a custom import hook.
What???? I don't like this idea at all. I object to being forced to jump through an obscure hoop to do something that's been totally straightforward until now. -- Greg
On 25/02/2010 23:56, Greg Ewing wrote:
Barry Warsaw wrote:
We discussed this at Pycon and agreed that we will not support source-less deployments by default. The source file must exist or it will be an ImportError.
This does not mean source-less deployments are not possible though. To support this use case, you'd have to write a custom import hook.
What????
I don't like this idea at all. I object to being forced to jump through an obscure hoop to do something that's been totally straightforward until now.
I thought we agreed at the language summit that if a .pyc was in the place of the source file it *could* be imported from - making pyc only distributions possible. As the pyc files are in the __pycache__ (or whatever) directory by default they *won't* be importable without the source files. A pyc only distribution can easily be created though with this scheme. Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
On Thu, Feb 25, 2010 at 3:50 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
On 25/02/2010 23:56, Greg Ewing wrote:
Barry Warsaw wrote:
We discussed this at Pycon and agreed that we will not support source-less deployments by default. The source file must exist or it will be an ImportError.
This does not mean source-less deployments are not possible though. To support this use case, you'd have to write a custom import hook.
What????
I don't like this idea at all. I object to being forced to jump through an obscure hoop to do something that's been totally straightforward until now.
I thought we agreed at the language summit that if a .pyc was in the place of the source file it *could* be imported from - making pyc only distributions possible. As the pyc files are in the __pycache__ (or whatever) directory by default they *won't* be importable without the source files. A pyc only distribution can easily be created though with this scheme.
That's also my recollection. Basically, for .pyc-only modules, nothing changes. PS. I still prefer __compiled__ over __cached__ but I don't feel strongly about it.
Michael
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (python.org/~guido)
On Thu, Feb 25, 2010 at 16:13, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Michael Foord wrote:
I thought we agreed at the language summit that if a .pyc was in the place
of the source file it *could* be imported from - making pyc only distributions possible.
Ah, that's okay, then. Sorry about the panic!
Michael is right about what was discussed at the language summit, but Barry means what he says; if you look at the PEP as it currently stands it does not support bytecode-only modules. Barry and I discussed how to implement the PEP at PyCon after the summit and supporting bytecode-only modules quickly began to muck with the semantics and made it harder to explain (i.e. what to set __file__ vs. __compiled__ to based on what is or is not available and how to properly define get_paths for loaders).

But a benefit of no longer supporting bytecode-only modules by default is that it cuts back on possible stat calls, which slow down Python's startup (a complaint I hear a lot). Performance issues become even more acute if you try to come up with even a remotely proper way to have backwards-compatible support in importlib for its ABCs without forcing caching on all implementors of the ABCs.

As for having a dependency on a loader, I don't see how that is obscure; it's just a dependency your package has that you handle at install-time.

And personally, I don't see what bytecode-only modules buy you. The obfuscation argument is bunk as we all know. Bytecode contains so much data that disassembling it gives you a very clear picture of what the original code was like. I think it's almost a disservice to support bytecode-only files as it leads people who are misinformed or simply don't take the time to understand what is contained in a .pyc file into a false sense of security about their code not being easy to examine by someone else. The only perk I can see is space-saving, but that's dangerous as that ties you to a specific VM with a specific magic number (let alone that it leads to people tying themselves to CPython and ignoring the other VMs that simply do not support bytecode). -Brett
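To make the point about disassembly concrete, here is a small illustrative sketch. The function `check_license` and the string `"hunter2"` are invented for the example. It compiles a snippet and then inspects the resulting code object the same way one could inspect the contents of a .pyc file.

```python
import dis

# Invented example source; imagine only its compiled form was shipped.
SOURCE = '''
def check_license(key):
    secret = "hunter2"
    return key == secret
'''

module_code = compile(SOURCE, "<demo>", "exec")

# The function body is stored as a nested code object among the constants.
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))

print(func_code.co_name)      # function name survives compilation
print(func_code.co_varnames)  # so do argument and local variable names
print(func_code.co_consts)    # and literal constants, including the "secret"
dis.dis(func_code)            # full instruction listing
```

Names, local variables, and string literals all remain readable, which is why shipping bytecode alone gives little real protection.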
-- Greg
On Fri, Feb 26, 2010 at 2:09 PM, Brett Cannon <brett@python.org> wrote:
And personally, I don't see what bytecode-only modules buy you. The obfuscation argument is bunk as we all know. Bytecode contains so much data that disassembling it gives you a very clear picture of what the original code was like. I think it's almost a dis-service to support bytecode-only files as it leads people who are misinformed or simply don't take the time to understand what is contained in a .pyc file into a false sense of security about their code not being easy to examine by someone else.
Byte-code only wasn't always supported. We added it knowing full well it had all those problems (plus, it locks in the Python version), simply because a certain class of developers won't stop asking for it. Their users are apparently too dumb to decode bytecode but smart enough to read source code, even if they don't understand it, and this knowledge could hurt them. Presumably users smart enough to decode bytecode will know enough not to hurt themselves. Maybe Greg's and my response to the mention of dropping this feature is too strong -- after all we're both dinosaurs. And maybe the developers who want the feature can write their own loader. But given that this feature takes an entirely different path through import.c anyway, I still don't see how dropping it is necessary in order to implement the PEP. If you have separate motivation to drop the feature, you should deprecate it properly. -- --Guido van Rossum (python.org/~guido)
On Fri, Feb 26, 2010 at 14:29, Guido van Rossum <guido@python.org> wrote:
On Fri, Feb 26, 2010 at 2:09 PM, Brett Cannon <brett@python.org> wrote:
And personally, I don't see what bytecode-only modules buy you. The obfuscation argument is bunk as we all know. Bytecode contains so much data that disassembling it gives you a very clear picture of what the original code was like. I think it's almost a dis-service to support bytecode-only files as it leads people who are misinformed or simply don't take the time to understand what is contained in a .pyc file into a false sense of security about their code not being easy to examine by someone else.
Byte-code only wasn't always supported. We added it knowing full well it had all those problems (plus, it locks in the Python version), simply because a certain class of developers won't stop asking for it. Their users are apparently too dumb to decode bytecode but smart enough to read source code, even if they don't understand it, and this knowledge could hurt them. Presumably users smart enough to decode bytecode will know enough not to hurt themselves.
Maybe it should be made optional much like the talk of frozen modules eventually becoming an optional thing.
Maybe Greg's and my response to the mention of dropping this feature is too strong -- after all we're both dinosaurs. And maybe the developers who want the feature can write their own loader.
We could also provide one if necessary.
But given that this feature takes an entirely different path through import.c anyway, I still don't see how dropping it is necessary in order to implement the PEP.
It's not necessary at all. I think what Barry was going for was simply cleaning up semantics once instead of having to drag it out.
If you have separate motivation to drop the feature, you should deprecate it properly.
Fine by me. It would be easy enough to raise ImportWarning in the bytecode-only case if Barry decides to push for this. Here is a question for Barry to think about if he decides to move forward with all of this: would mixed support for both bytecode-only and source/bytecode be required for the same directory, or could it be one or the other but not both? Differing semantics based on what is found in the directory would make the path hook more expensive (which is a one-time cost per directory), but it would cut stat calls in the finder in half (which is a cost made per import). -Brett
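The "half the stat calls" estimate can be illustrated with a toy model. This is a deliberately simplified sketch, not the finder's real logic: it ignores extension modules and directory checks, and `probe_candidates` is a hypothetical helper. It lists the filenames a finder would have to probe in one directory, in the worst case where the module is not found there.

```python
def probe_candidates(name, allow_bytecode_only):
    """Return the filenames a simplified finder would probe for `name`."""
    suffixes = [".py"]
    if allow_bytecode_only:
        # Bytecode-only support means also looking for a bare .pyc.
        suffixes.append(".pyc")
    package_probes = [f"{name}/__init__{suffix}" for suffix in suffixes]
    module_probes = [f"{name}{suffix}" for suffix in suffixes]
    return package_probes + module_probes


with_bytecode_only = probe_candidates("spam", allow_bytecode_only=True)
source_required = probe_candidates("spam", allow_bytecode_only=False)
print(len(with_bytecode_only), len(source_required))  # 4 2
```

In this model a miss costs four probes per directory with bytecode-only support and two without it, which matches the factor of two mentioned above.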
-- --Guido van Rossum (python.org/~guido)
On approximately 2/26/2010 2:55 PM, came the following characters from the keyboard of Brett Cannon:
Maybe Greg's and my response to the mention of dropping this feature is too strong -- after all we're both dinosaurs. And maybe the developers who want the feature can write their own loader.
We could also provide if necessary.
So if the implementation stores .pyc by default in a version-specific place, then it seems there are only two things needed to make a python byte-code only distribution...

1) rename all the .pyc to .py
2) packaging

When a .pyc is renamed to .py, Python (3.1 at least) recognizes and uses it... I assume by design, rather than accident, but I don't know the history.

I didn't experiment to discover what __file__ and __cached__ get set to in this case (especially since I don't have a version with the latter :) ).

I speculate that packaging a distribution in this manner would be slightly different than how it is currently done, but I also suspect that it would avoid the same half of the stat calls, to aid performance.

-- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
On 26/02/2010 23:35, Glenn Linderman wrote:
On approximately 2/26/2010 2:55 PM, came the following characters from the keyboard of Brett Cannon:
Maybe Greg's and my response to the mention of dropping this feature is too strong -- after all we're both dinosaurs. And maybe the developers who want the feature can write their own loader.
We could also provide if necessary.
So if the implementation stores .pyc by default in a version-specific place, then it seems there are only two things needed to make a python byte-code only distribution...
1) rename all the .pyc to .py 2) packaging
When a .pyc is renamed to .py, Python (3.1 at least) recognizes and uses it... I assume by design, rather than accident, but I don't know the history.
I didn't experiment to discover what __file__ and __cached__ get set to in this case (especially since I don't have a version with the latter :) ).
I speculate that packaging a distribution in this manner would be slightly different that how it is currently done, but I also suspect that it would avoid the same half of the stat calls, to aid performance.
If this is possible with the new scheme, so long as the Python version and magic number match, then it is slightly kooky but meets the use case. All the best, Michael
The one issue I thought would be resolved by not easily allowing .pyc-only distributions is the case when you rename a file (say module.py to newmodule.py) and there is a module.pyc lying around, and you don't get the ImportError you would expect from "import module" -- and to make it worse everything basically works, except there are two versions of the module that slowly become different. This regularly causes problems for me, and those problems would get more common and obscure if the pyc files were stashed away in a more invisible location. I can't even tell what the current proposal is; maybe this is resolved? If distributing bytecode required renaming pyc files to .py as Glenn suggested that would resolve the problem quite nicely from my perspective. (Frankly I find the whole use case for distributing bytecodes a bit specious, but whatever.) -- Ian Bicking | http://blog.ianbicking.org | http://twitter.com/ianbicking
On Fri, Feb 26, 2010 at 16:58, Ian Bicking <ianb@colorstudy.com> wrote:
The one issue I thought would be resolved by not easily allowing .pyc-only distributions is the case when you rename a file (say module.py to newmodule.py) and there is a module.pyc laying around, and you don't get the ImportError you would expect from "import module" -- and to make it worse everything basically works, except there's two versions of the module that slowly become different.
Yes, that problem would go away if bytecode-only modules were no longer supported.
This regularly causes problems for me, and those problems would get more common and obscure if the pyc files were stashed away in a more invisible location.
That has never been an issue with this proposal. Bytecode is pulled from the __pycache__ directory only if the source exists. What we have been discussing is whether to keep supporting bytecode-only files that sit directly in a package's directory (or wherever the source would otherwise be). -Brett
I can't even tell what the current proposal is; maybe this is resolved? If distributing bytecode required renaming pyc files to .py as Glenn suggested that would resolve the problem quite nicely from my perspective. (Frankly I find the whole use case for distributing bytecodes a bit specious, but whatever.)
On Fri, Feb 26, 2010 at 4:58 PM, Ian Bicking <ianb@colorstudy.com> wrote:
The one issue I thought would be resolved by not easily allowing .pyc-only distributions is the case when you rename a file (say module.py to newmodule.py) and there is a module.pyc laying around, and you don't get the ImportError you would expect from "import module" -- and to make it worse everything basically works, except there's two versions of the module that slowly become different. This regularly causes problems for me, and those problems would get more common and obscure if the pyc files were stashed away in a more invisible location.
I can't even tell what the current proposal is; maybe this is resolved? If distributing bytecode required renaming pyc files to .py as Glenn suggested that would resolve the problem quite nicely from my perspective. (Frankly I find the whole use case for distributing bytecodes a bit specious, but whatever.)
Barry's PEP would fix this even if we kept supporting .pyc-only files: the lingering .pyc files will be in the __pycache__ directory which is *not* searched -- only .pyc files directly in the source directory will be found -- where the PEP will never place them, at least not by default. -- --Guido van Rossum (python.org/~guido)
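The behaviour described here can be checked with a short experiment against an interpreter that implements PEP 3147 (CPython 3.2 or later is assumed). This is only an illustrative sketch; the module name `stale_demo` and the helper `probe_stale_bytecode` are made up.

```python
import importlib
import py_compile
import shutil
import sys
import tempfile
from pathlib import Path


def probe_stale_bytecode():
    """Check whether a leftover .pyc is importable from __pycache__ vs. in place."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "stale_demo.py"
        source.write_text("VALUE = 1\n")
        cached = py_compile.compile(str(source))  # written under __pycache__/
        source.unlink()                           # simulate renaming the source away
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            try:
                importlib.import_module("stale_demo")
                results["cache_only_importable"] = True
            except ImportError:
                results["cache_only_importable"] = False
            # Now place the bytecode directly where the source used to be.
            shutil.copy(cached, Path(tmp) / "stale_demo.pyc")
            importlib.invalidate_caches()
            module = importlib.import_module("stale_demo")
            results["in_place_value"] = module.VALUE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("stale_demo", None)
    return results


print(probe_stale_bytecode())
```

The lingering file under `__pycache__` alone does not satisfy the import; only the copy placed next to where the source used to be does.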
On Feb 26, 2010, at 05:11 PM, Guido van Rossum wrote:
Barry's PEP would fix this even if we kept supporting .pyc-only files: the lingering .pyc files will be in the __pycache__ directory which is *not* searched -- only .pyc files directly in the source directory will be found -- where the PEP will never place them, at least not by default.
Exactly so. -Barry
Ian Bicking wrote:
The one issue I thought would be resolved by not easily allowing .pyc-only distributions is the case when you rename a file (say module.py to newmodule.py) and there is a module.pyc laying around, and you don't get the ImportError you would expect from "import module" -- and to make it worse everything basically works, except there's two versions of the module that slowly become different. This regularly causes problems for me, and those problems would get more common and obscure if the pyc files were stashed away in a more invisible location.
I can't even tell what the current proposal is; maybe this is resolved? If distributing bytecode required renaming pyc files to .py as Glenn suggested that would resolve the problem quite nicely from my perspective. (Frankly I find the whole use case for distributing bytecodes a bit specious, but whatever.)
The consensus as I recall was that a .pyc file in the main package directory would be importable without a .py file (just as it is today), but that .pyc files in the cache directory would not be importable in the absence of a .py file. Package distributors who wanted to ship bytecode-only distributions would need to arrange to have the .pyc files created "in place" (by disabling the cachedir option) or move them from the cachedir before bundling. Tres. -- Tres Seaver tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com
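One way a distributor might create the .pyc files "in place" is sketched below. It relies on the `legacy=True` option of `compileall.compile_dir`, which was added together with the PEP 3147 implementation and therefore postdates this message. The helper name `strip_sources` is hypothetical, and because it deletes source files it should only ever be run on a copy of the package.

```python
import compileall
from pathlib import Path


def strip_sources(package_dir):
    """Compile a tree to in-place .pyc files, then delete the .py sources.

    Destructive: intended for a throwaway copy of the package.
    """
    package_dir = Path(package_dir)
    # legacy=True writes spam.pyc next to spam.py instead of into __pycache__/.
    succeeded = compileall.compile_dir(str(package_dir), legacy=True, quiet=1)
    if not succeeded:
        raise RuntimeError(f"compilation failed somewhere under {package_dir}")
    for source in package_dir.rglob("*.py"):
        source.unlink()
```

The resulting tree is importable only by an interpreter whose magic number matches the one that compiled it.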
On Fri, Feb 26, 2010 at 15:35, Glenn Linderman <v+python@g.nevcal.com> wrote:
On approximately 2/26/2010 2:55 PM, came the following characters from the keyboard of Brett Cannon:
Maybe Greg's and my response to the mention of dropping this feature is too strong -- after all we're both dinosaurs. And maybe the developers who want the feature can write their own loader.
We could also provide if necessary.
So if the implementation stores .pyc by default in a version-specific place, then it seems there are only two things needed to make a python byte-code only distribution...
1) rename all the .pyc to .py 2) packaging
When a .pyc is renamed to .py, Python (3.1 at least) recognizes and uses it... I assume by design, rather than accident, but I don't know the history.
This does not work for me (nor should it):
touch temp.py
python3 -c "import temp"
rm temp.py
mv temp.pyc temp.py
python3 -c "import temp"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "temp.py", line 2
SyntaxError: Non-UTF-8 code starting with '\x95' in file temp.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

-Brett
I didn't experiment to discover what __file__ and __cached__ get set to in this case (especially since I don't have a version with the latter :) ).
I speculate that packaging a distribution in this manner would be slightly different that how it is currently done, but I also suspect that it would avoid the same half of the stat calls, to aid performance.
On approximately 2/26/2010 5:13 PM, came the following characters from the keyboard of Brett Cannon:
On Fri, Feb 26, 2010 at 15:35, Glenn Linderman <v+python@g.nevcal.com> wrote:
On approximately 2/26/2010 2:55 PM, came the following characters from the keyboard of Brett Cannon:
Maybe Greg's and my response to the mention of dropping this feature is too strong -- after all we're both dinosaurs. And maybe the developers who want the feature can write their own loader.
We could also provide if necessary.
So if the implementation stores .pyc by default in a version-specific place, then it seems there are only two things needed to make a python byte-code only distribution...
1) rename all the .pyc to .py 2) packaging
When a .pyc is renamed to .py, Python (3.1 at least) recognizes and uses it... I assume by design, rather than accident, but I don't know the history.
This does not work for me (nor should it):
touch temp.py
python3 -c "import temp"
rm temp.py
mv temp.pyc temp.py
python3 -c "import temp"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "temp.py", line 2
SyntaxError: Non-UTF-8 code starting with '\x95' in file temp.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
-Brett
I'll admit to not doing exhaustive testing, but I'll not admit to not doing any testing... because it was sort of a wild idea. Someone else called it "kooky", which is fair.

What I did was:

python -m test
ren test.pyc foo.py
foo.py

and it worked. Then I posted, knowing that I'd also tested, the other day, several .py into a .zip named .py, and once that worked, then I changed to putting all .pyc into the .zip named .py and that worked too... including imports of the several modules from the "__main__.pyc". Of course, all those were still named .pyc inside the .zip named .py.

So I'm not sure what the difference is... .pyc as .py works from the command line, but not from import? Some specialty because of using -c ?

I'd guess the technique could be made to work, probably not require extensive changes, if Python developers wanted to make it work. I think it could be efficient and that same someone that called it "kooky" admitted it would solve their use case, at least.

I'm not sure why what you did is different than what I did, nor why you state without justification that it shouldn't work... I might be able to figure out the former if I spend enough time with the documentation, if it is documented, but I'm too new to Python to understand the latter without explanation. Could you supply at least the latter explanation? I'd like to understand the issue here, whether or not the "kooky" idea goes forward.

-- Glenn
On Fri, Feb 26, 2010 at 20:08, Glenn Linderman <v+python@g.nevcal.com> wrote:
On approximately 2/26/2010 5:13 PM, came the following characters from the keyboard of Brett Cannon:
On Fri, Feb 26, 2010 at 15:35, Glenn Linderman <v+python@g.nevcal.com> wrote:
On approximately 2/26/2010 2:55 PM, came the following characters from the keyboard of Brett Cannon:
Maybe Greg's and my response to the mention of dropping this feature is too strong -- after all we're both dinosaurs. And maybe the developers who want the feature can write their own loader.
We could also provide if necessary.
So if the implementation stores .pyc by default in a version-specific place, then it seems there are only two things needed to make a python byte-code only distribution...
1) rename all the .pyc to .py 2) packaging
When a .pyc is renamed to .py, Python (3.1 at least) recognizes and uses it... I assume by design, rather than accident, but I don't know the history.
This does not work for me (nor should it):
touch temp.py
python3 -c "import temp"
rm temp.py
mv temp.pyc temp.py
python3 -c "import temp"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "temp.py", line 2
SyntaxError: Non-UTF-8 code starting with '\x95' in file temp.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
-Brett
I'll admit to not doing exhaustive testing, but I'll not admit to not doing any testing... because it was sort of a wild idea. Someone else called it "kooky", which is fair.
What I did was:
python -m test
ren test.pyc foo.py
foo.py
and it worked. Then I posted, knowing that I'd also tested, the other day, several .py into a .zip named .py, and once that worked, then I changed to putting all .pyc into the .zip named .py and that worked too... including imports of the several modules from the "__main__.pyc". Of course, all those were still named .pyc inside the .zip named .py.
So I'm not sure what the difference is... .pyc as .py works from the command line, but not from import? Some specialty because of using -c ?
I'd guess the technique could be made to work, probably not require extensive changes, if Python developers wanted to make it work. I think it could be efficient and that same someone that called it "kooky" admitted it would solve their use case, at least.
I'm not sure why what you did is different than what I did,
-m uses runpy which is not directly equivalent to importing.
nor why you state without justification that it shouldn't work...
It just is not supposed to happen that way. Masquerading a bytecode file as a source file shouldn't work; imp.get_suffixes() controls how files should be interpreted based on their file extension. -Brett
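As a rough illustration of suffix-driven interpretation, the sketch below uses the `importlib.machinery` suffix lists, which later replaced `imp.get_suffixes()`. The helper `classify_by_suffix` is hypothetical and only mirrors the idea that import decides how to treat a file from its name, never from its contents.

```python
import importlib.machinery as machinery


def classify_by_suffix(filename):
    """Return 'source', 'bytecode', 'extension', or None, using the name only."""
    table = (
        ("source", machinery.SOURCE_SUFFIXES),
        ("bytecode", machinery.BYTECODE_SUFFIXES),
        ("extension", machinery.EXTENSION_SUFFIXES),
    )
    for kind, suffixes in table:
        if filename.endswith(tuple(suffixes)):
            return kind
    return None


print(classify_by_suffix("temp.py"))   # source
print(classify_by_suffix("temp.pyc"))  # bytecode
```

Under this rule a bytecode file renamed to `temp.py` is handed to the source loader, which is consistent with the SyntaxError shown earlier in the thread.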
I might be able to figure out the former if I spend enough time with the documentation, if it is documented, but I'm too new to Python to understand the latter without explanation. Could you supply at least the latter explanation? I'd like to understand the issue here, whether or not the "kooky" idea goes forward.
Brett Cannon wrote:
On Fri, Feb 26, 2010 at 20:08, Glenn Linderman <v+python@g.nevcal.com> wrote:
I'm not sure why what you did is different than what I did,
-m uses runpy which is not directly equivalent to importing.
It's actually execution which is different from importing. Direct execution doesn't care about filenames (it inspects the file itself to figure out what it is), while importing cares a great deal. Note that Glenn ran "foo.py" directly, while Brett did "import temp". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Glenn Linderman wrote:
What I did was:
python -m test
ren test.pyc foo.py
foo.py
and it worked.
Source files mentioned on the command line aren't required to have a .py extension. I think what's happening is that the interpreter ignores the filename altogether in that case and examines the contents of the file to figure out what it is, in order to support running .pyc files from the command line. -- Greg
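The kind of content inspection described here can be sketched as follows. This is not the interpreter's actual code, whose check may differ in detail; `looks_like_bytecode` is a hypothetical helper that compares the first bytes of a file with the running interpreter's magic number, ignoring the filename entirely.

```python
import importlib.util


def looks_like_bytecode(path):
    """Guess from the file's first bytes whether it holds compiled bytecode."""
    magic = importlib.util.MAGIC_NUMBER
    with open(path, "rb") as handle:
        return handle.read(len(magic)) == magic
```

A bytecode file renamed to `foo.py` would still be recognised by a check like this, which is why running it directly can work even though importing it by name does not.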
On approximately 2/27/2010 5:25 PM, came the following characters from the keyboard of Greg Ewing:
Glenn Linderman wrote:
What I did was:
python -m test
ren test.pyc foo.py
foo.py
and it worked.
Source files mentioned on the command line aren't required to have a .py extension. I think what's happening is that the interpreter ignores the filename altogether in that case and examines the contents of the file to figure out what it is, in order to support running .pyc files from the command line.
Thanks for the explanation. Brett mentioned something like runpy vs import using different techniques. Which is OK, I guess, but if the command line/runpy can do it, the importer could do it. Just a matter of desire and coding. Whether it is worth pursuing further depends on people's perceptions of "kookiness" vs. functional and performance considerations.

-- Glenn
Glenn Linderman wrote:
Thanks for the explanation. Brett mentioned something like runpy vs import using different techniques. Which is OK, I guess, but if the command line/runpy can do it, the importer could do it. Just a matter of desire and coding. Whether it is worth pursuing further depends on people's perceptions of "kookiness" vs. functional and performance considerations.
As I said previously, don't underestimate how different __main__ is from everything else. The most obvious difference is that the code for __main__ is executed without holding the import lock, but there are other important differences as well (such as the module object being created directly by the interpreter startup sequence and hence a lot of the import machinery being bypassed). Even the -m switch doesn't really follow the normal import paths (it just finds the code object for the named module and then runs it directly instead of importing it).

Direct execution starts with a filename (or a module name when using -m) then works out how to execute it as __main__. Importing starts with a module name, tries to find a matching filename and create the corresponding module. The different starting points and the different end goals affect the assumptions that are made while the interpreter figures out what it needs to do.

The behaviour of runpy is different from import precisely because it aims to mimic execution of __main__ rather than a normal import. If there weren't quite so many semantic differences between direct execution and normal import, the module would have been a lot easier to write :)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
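One visible consequence of this difference can be shown with `runpy.run_module` and `importlib.import_module`. The sketch below is illustrative only; the module name `whoami_demo` and the helper `compare_execution_and_import` are made up, and a modern CPython is assumed.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path


def compare_execution_and_import(module_name="whoami_demo"):
    """Run a module the way -m does, then import it, and report what differs."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / f"{module_name}.py").write_text("NAME = __name__\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Execution: the code object is located and run as __main__.
            executed = runpy.run_module(module_name, run_name="__main__")
            registered = module_name in sys.modules
            # Import: a module object is created and cached under its own name.
            imported = importlib.import_module(module_name)
            return executed["NAME"], registered, imported.NAME
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)


print(compare_execution_and_import())
```

The executed code sees `__name__ == "__main__"` and leaves no entry under the module's own name in `sys.modules`, while the import creates and registers a module named `whoami_demo`.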
Glenn Linderman wrote:
if the command line/runpy can do it, the importer could do it. Just a matter of desire and coding. Whether it is worth pursuing further depends on people's perceptions of "kookiness" vs. functional and performance considerations.
Having .py files around that aren't source text could lead to a lot of confusion, given that most platforms these days decide which application to open for a given file based solely on the filename extension. I wouldn't enjoy trying to open a .py file only to have my text editor blow up because it was actually a binary file. So on balance I think it's a bit too kooky for my taste. -- Greg
On approximately 2/28/2010 3:22 PM, came the following characters from the keyboard of Greg Ewing:
Glenn Linderman wrote:
if the command line/runpy can do it, the importer could do it. Just a matter of desire and coding. Whether it is worth pursuing further depends on people's perceptions of "kookiness" vs. functional and performance considerations.
Having .py files around that aren't source text could lead to a lot of confusion, given that most platforms these days decide which application to open for a given file based solely on the filename extension. I wouldn't enjoy trying to open a .py file only to have my text editor blow up because it was actually a binary file.
So on balance I think it's a bit too kooky for my taste.
I understand your thoughts, but have some rebuttal comments. Mind you, if there is a better solution that can improve performance for both the source+binary and the binary-only distributions, I'm all for it. But in general, I'm all for performance improvements, even if there is some kookiness :) Thankful for Brett's posting of the actual search code fragment. If your text editor blows up because it is binary, it is a sad text editor. If you have .py mapped to a text editor, that's sort of kooky too; I have it mapped to Python. The .py files that are binary would generally be part of an application distribution in binary form, and therefore would be installed in some place like /bin or C:\Program Files ... not the place you'd look for source code, to confuse your text editor. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
Le Sun, 28 Feb 2010 19:32:09 -0800, Glenn Linderman <v+python@g.nevcal.com> a écrit :
If your text editor blows up because it is binary, it is a sad text editor.
If you have .py mapped to a text editor, that's sort of kooky too; I have it mapped to Python.
File extensions exist for a reason, even if you find that "kooky" and have strong ideas about the psychology of text editors. Having some binary files named "foobar.py" would certainly annoy a lot of people, including me. Antoine.
On Feb 28, 2010, at 10:40 PM, Antoine Pitrou wrote:
File extensions exist for a reason, even if you find that "kooky" and have strong ideas about the psychology of text editors.
Having some binary files named "foobar.py" would certainly annoy a lot of people, including me.
I completely agree. -Barry
Glenn Linderman wrote:
If your text editor blows up because it is binary, it is a sad text editor.
Blow up is probably an exaggeration, but even just getting a screen full of gibberish when I think I'm opening a text file is a jarring experience.
If you have .py mapped to a text editor, that's sort of kooky too; I have it mapped to Python.
On Windows the action for double-clicking is usually mapped to running the file, but there's typically another action such as "Open with IDLE" or whatever available, and a bytecode file named with ".py" would allow you to apply that action to it. -- Greg
On Fri, Feb 26, 2010 at 5:13 PM, Brett Cannon <brett@python.org> wrote:
On Fri, Feb 26, 2010 at 15:35, Glenn Linderman <v+python@g.nevcal.com> wrote:
When a .pyc is renamed to .py, Python (3.1 at least) recognizes and uses it... I assume by design, rather than accident, but I don't know the history.
This does not work for me (nor should it):
touch temp.py
python3 -c "import temp"
rm temp.py
mv temp.pyc temp.py
python3 -c "import temp"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "temp.py", line 2
SyntaxError: Non-UTF-8 code starting with '\x95' in file temp.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Try "python temp.py" though. -- --Guido van Rossum (python.org/~guido)
On approximately 2/26/2010 8:31 PM, came the following characters from the keyboard of Brett Cannon:
I'm not sure why what you did is different than what I did,
-m uses runpy which is not directly equivalent to importing.
OK, that gives me some good keywords for searching documentation. What I (thought I) knew so far, was that it seemed to be equivalent, but that could easily be the 10,000' view instead of the reality. Thanks.
nor why you state without justification that it shouldn't work...
It just is not supposed to happen that way. Masquerading a bytecode file as a source file shouldn't work; imp.get_suffixes() controls how files should be interpreted based on their file extension.
Well, since a .py can be a .zip, why not a .pyc? Just because no one thought of doing it before? Of course, I realize that I only know that a .py can be a .zip on the command line (is that runpy again, I'll bet it is), not for importing, which probably doesn't work, from what you imply. But if the technique can work from the command line, it seems the same technique could be re-used in the importer. A bytecode only .py would result in identical values for __file__ and __cached__ methinks. -- Glenn
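The asymmetry Glenn is asking about can be reproduced with a short sketch. It assumes a recent CPython 3; the archive name zipped_app.py is a made-up placeholder, and the exact exception type raised on import differs between Python versions, so the sketch accepts either.

```python
import importlib
import os
import runpy
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # A zip archive deliberately given a ".py" name (hypothetical file name).
    archive = os.path.join(tmp, "zipped_app.py")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("__main__.py", "answer = 6 * 7\n")

    # Execution path: runpy recognises the archive and runs its __main__.py.
    run_globals = runpy.run_path(archive)

    # Import path: the ".py" suffix selects the source loader, which cannot
    # compile zip data as Python source.
    sys.path.insert(0, tmp)
    try:
        importlib.import_module("zipped_app")
        import_error = None
    except (SyntaxError, ValueError) as exc:
        import_error = type(exc).__name__
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("zipped_app", None)

print(run_globals["answer"], import_error)
```

Execution looks at what the file contains, while import decides how to treat a file from its suffix alone, which is the distinction Brett draws with imp.get_suffixes().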
Glenn Linderman wrote:
But if the technique can work from the command line, it seems the same technique could be re-used in the importer.
Not really - we only get away with the fun and games at execution time because __main__ is a bit special (and always has been). Those tricks would be a lot harder to pull off for a normal module import (if they were possible at all - I'm not sure they would be). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Feb 26, 2010, at 02:55 PM, Brett Cannon wrote:
Here is a question for Barry to think about if he decides to move forward with all of this: would mixed support for both bytecode-only and source/bytecode be required for the same directory, or could it be one or the other but not both? Differing semantics based on what is found in the directory would make the path hook more expensive (which is a one-time cost per directory), but it would cut stat calls in the finder in half (which is a cost made per import).
It seems a bit magical to me, and the rules a bit difficult to predict. For example, what would be the trigger to enable bytecode-only support for a package directory? Would it be the absence of an __init__.py file? What if some .pyc files had .py file but not all of them? Wouldn't the trigger depend on import order? OTOH, maybe you're on to something. Perhaps we could add a flag to the package's namespace to turn this on. You'd have to include the __init__.py to get things going, but after that, everything else in the package could be .pyc-only. -Barry
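To make the stat-call argument concrete, here is a deliberately naive sketch of the file names a finder would have to probe in one directory. The probes helper is hypothetical and much simpler than the real import machinery (which also handles extension modules and caches directory listings); only the suffix lists come from importlib.machinery.

```python
import importlib.machinery as machinery

def probes(module, suffixes):
    """File names a naive finder would have to check in one directory."""
    names = []
    for suffix in suffixes:
        names.append(module + suffix)                # plain module
        names.append(module + "/__init__" + suffix)  # package
    return names

source_only = probes("spam", machinery.SOURCE_SUFFIXES)
with_bytecode = probes(
    "spam", machinery.SOURCE_SUFFIXES + machinery.BYTECODE_SUFFIXES
)
print(len(source_only), len(with_bytecode))
```

Every extra suffix adds probes for each directory on the path that does not contain the module, which is where the per-import cost Brett mentions comes from.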
Barry Warsaw wrote:
On Feb 26, 2010, at 02:55 PM, Brett Cannon wrote:
Here is a question for Barry to think about if he decides to move forward with all of this: would mixed support for both bytecode-only and source/bytecode be required for the same directory, or could it be one or the other but not both? Differing semantics based on what is found in the directory would make the path hook more expensive (which is a one-time cost per directory), but it would cut stat calls in the finder in half (which is a cost made per import).
It seems a bit magical to me, and the rules a bit difficult to predict. For example, what would be the trigger to enable bytecode-only support for a package directory? Would it be the absence of an __init__.py file? What if some .pyc files had .py file but not all of them? Wouldn't the trigger depend on import order?
OTOH, maybe you're on to something. Perhaps we could add a flag to the package's namespace to turn this on. You'd have to include the __init__.py to get things going, but after that, everything else in the package could be .pyc-only.
Why not just leave the code for import in the package directory as it is today, where .pyc files are already importable in the absence of a .py file? As long as files in the cachedir are *not* importable without the source, both sides win, AFAICT: most people will no longer have .pyc's in their package directories, and those who want them can get them, through some means (moving from the cachedir, or disabling the cachedir feature). Tres.
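The behaviour Tres proposes can be checked against a current CPython 3 with a short sketch; the module name legacy_demo and the try_import helper are made-up placeholders. It compiles a module into the cache directory, removes the source, and then tries the import with the bytecode in each of the two locations.

```python
import importlib
import os
import py_compile
import shutil
import sys
import tempfile

def try_import(name, directory):
    """Try importing `name` with `directory` on sys.path; True on success."""
    sys.modules.pop(name, None)
    importlib.invalidate_caches()
    sys.path.insert(0, directory)
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False
    finally:
        sys.path.remove(directory)
        sys.modules.pop(name, None)

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "legacy_demo.py")  # hypothetical module name
    with open(source, "w") as f:
        f.write("value = 1\n")

    cached = py_compile.compile(source)  # written to the cache directory
    os.remove(source)
    cache_only = try_import("legacy_demo", tmp)  # bytecode in cachedir only

    # Copy the bytecode to where the source file used to be.
    shutil.copyfile(cached, os.path.join(tmp, "legacy_demo.pyc"))
    beside_source = try_import("legacy_demo", tmp)

print(cache_only, beside_source)
```

On the interpreters this sketch was written for, the cache-only import is refused and the import with the .pyc in place of the source succeeds, so a bytecode-only distribution is a matter of where the files are placed rather than a separate loader.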
On Feb 26, 2010, at 02:29 PM, Guido van Rossum wrote:
Byte-code only wasn't always supported. We added it knowing full well it had all those problems (plus, it locks in the Python version), simply because a certain class of developers won't stop asking for it. Their users are apparently too dumb to decode bytecode but smart enough to read source code, even if they don't understand it, and this knowledge could hurt them. Presumably users smart enough to decode bytecode will know enough not to hurt themselves.
For now, I've added an open issues section to the PEP describing the options for bytecode-only support. I think there are better ways to satisfy the bytecode-only packager requirements than supporting it by default, always, in the standard importer, but let's enumerate the pros and cons and then make a decision. -Barry
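Guido's aside that bytecode "locks in the Python version" comes down to the magic number at the start of every .pyc file. A minimal sketch, assuming CPython 3.4 or later for importlib.util.MAGIC_NUMBER; the helper name and the demo module are hypothetical.

```python
import importlib.util
import os
import py_compile
import tempfile

def compiled_for_this_interpreter(pyc_path):
    """Compare the first four bytes of a .pyc with this interpreter's magic number."""
    with open(pyc_path, "rb") as f:
        return f.read(4) == importlib.util.MAGIC_NUMBER

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "magic_demo.py")
    with open(source, "w") as f:
        f.write("x = 1\n")
    pyc = py_compile.compile(source, cfile=os.path.join(tmp, "magic_demo.pyc"))
    fresh = compiled_for_this_interpreter(pyc)

    # Simulate bytecode from another release by overwriting the magic number.
    with open(pyc, "r+b") as f:
        f.write(b"\x00\x00\r\n")
    foreign = compiled_for_this_interpreter(pyc)

print(fresh, foreign)
```

A bytecode-only distribution therefore has to be rebuilt for each interpreter release it targets, because the import system rejects a file whose magic number does not match.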
Guido van Rossum wrote:
Their users are apparently too dumb to decode bytecode but smart enough to read source code, even if they don't understand it, and this knowledge could hurt them.
I think it's like putting a lock on your door. It won't stop anyone who's determined to get in, but it makes it hard for them to argue in court that they wandered in accidentally. Also it may make it easier to get the idea of using Python past PHBs. That seems to me like a good reason for keeping the feature. -- Greg
Le Fri, 26 Feb 2010 14:29:03 -0800, Guido van Rossum a écrit :
Byte-code only wasn't always supported. We added it knowing full well it had all those problems (plus, it locks in the Python version), simply because a certain class of developers won't stop asking for it. Their users are apparently too dumb to decode bytecode but smart enough to read source code, even if they don't understand it, and this knowledge could hurt them.
The idea that too much knowledge hurts users doesn't sound very Pythonic to me. As I understand it, the people interested in bytecode-only distributions are commercial companies willing to ease support. Why don't they whip up a specialized importer, and perhaps make it available as a recipe or a PyPI module somewhere? The idea that we should provide built-in support for a stupid (non-)security mechanism sounds insane to me. Finally, the sight of commercial companies not being able to do their work and begging open source projects to do it for them makes me *yawn*. If you aren't proficient or motivated enough to build your own internal commodities, perhaps you shouldn't claim to do business at all. regards Antoine.
On 28/02/2010 01:22, Antoine Pitrou wrote:
Le Fri, 26 Feb 2010 14:29:03 -0800, Guido van Rossum a écrit :
Byte-code only wasn't always supported. We added it knowing full well it had all those problems (plus, it locks in the Python version), simply because a certain class of developers won't stop asking for it. Their users are apparently too dumb to decode bytecode but smart enough to read source code, even if they don't understand it, and this knowledge could hurt them.
The idea that too much knowledge hurts users doesn't sound very Pythonic to me.
As I understand it, the people interested in bytecode-only distributions are commercial companies willing to ease support. Why don't they whip up a specialized importer, and perhaps make it available as a recipe or a PyPI module somewhere? The idea that we should provide built-in support for a stupid (non-)security mechanism sounds insane to me.
Finally, the sight of commercial companies not being able to do their work and begging open source projects to do it for them makes me *yawn*. If you aren't proficient or motivated enough to build your own internal commodities, perhaps you shouldn't claim to do business at all.
Well if we'd *never* had this feature this argument would be very strong indeed. On the other hand if we want them to switch to Python 3 - but by the way we cut one of the features you rely on, but don't worry all you have to do is recode it yourself - doesn't make a very convincing argument. Michael Foord
regards
Antoine.
Le Sun, 28 Feb 2010 01:25:38 +0000, Michael Foord <fuzzyman@voidspace.org.uk> a écrit :
Well if we'd *never* had this feature this argument would be very strong indeed. On the other hand if we want them to switch to Python 3 - but by the way we cut one of the features you rely on, but don't worry all you have to do is recode it yourself - doesn't make a very convincing argument.
I understand it. On the other hand, it is certainly one of the least important issues involved in porting to py3k. (even for those people who liked the feature) And I think the prospect of a slight simplification (or de-complexification) of the import machinery is an important selling point. Regards Antoine.
On Feb 26, 2010, at 02:09 PM, Brett Cannon wrote:
But a benefit of no longer supporting bytecode-only modules by default is it cuts back on possible stat calls which slows down Python's startup time (a complaint I hear a lot). Performance issues become even more acute if you try to come up with even a remotely proper way to have backwards-compatible support in importlib for its ABCs w/o forcing caching on all implementors of the ABCs.
And personally, I don't see what bytecode-only modules buy you. The obfuscation argument is bunk as we all know.
Brett really hits the nail on the head, and yes I'm sorry for not being clear about what "we discussed this at Pycon" meant. The "we" being Brett and I of course (and Chris Withers IIRC). Bytecode-only deployments are a bit of a sham, and definitely a minority use case, so why should all of Python pay for the extra stat calls to support this by default? How many people would actually be hurt if this wasn't available out of the box, especially since you can still support it if you really want it and can't convince your manager that it provides essentially zero useful obfuscation of your code? I say this having been down that road myself with a previous employer. Management was pretty adamant about wanting this until I explained how easy it was to defeat and convinced them that the engineering resources to do it were better spent elsewhere. Having said that, I'd be all for including a reference implementation of a bytecode-only loader in the PEP for demonstration purposes. Greg, would you like to contribute that? -Barry
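As an illustration of what such a bytecode-only loader might look like, here is a rough sketch built on today's importlib rather than on anything in the PEP draft. CacheOnlyFinder and the demo module name are hypothetical, the finder handles top-level modules only, and it relies on the standard library's SourcelessFileLoader to do the actual loading.

```python
import importlib
import importlib.abc
import importlib.machinery
import importlib.util
import os
import py_compile
import sys
import tempfile

class CacheOnlyFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: load top-level modules from the cache dir without source."""

    def find_spec(self, fullname, path=None, target=None):
        if path is not None:  # keep the sketch to top-level modules
            return None
        for entry in sys.path:
            would_be_source = os.path.join(entry or os.getcwd(), fullname + ".py")
            pyc = importlib.util.cache_from_source(would_be_source)
            if os.path.isfile(pyc) and not os.path.exists(would_be_source):
                loader = importlib.machinery.SourcelessFileLoader(fullname, pyc)
                return importlib.util.spec_from_file_location(
                    fullname, pyc, loader=loader
                )
        return None

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "cache_only_demo.py")
    with open(source, "w") as f:
        f.write("greeting = 'hello'\n")
    py_compile.compile(source)  # bytecode goes to the cache directory
    os.remove(source)           # leave only the cached bytecode behind

    finder = CacheOnlyFinder()
    sys.meta_path.append(finder)  # consulted only after the standard finders
    sys.path.insert(0, tmp)
    try:
        module = importlib.import_module("cache_only_demo")
        greeting = module.greeting
    finally:
        sys.path.remove(tmp)
        sys.meta_path.remove(finder)
        sys.modules.pop("cache_only_demo", None)

print(greeting)
```

Appending the finder to sys.meta_path means only imports that would otherwise fail pay for the extra lookups, which fits the objection to charging every import for a minority use case.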
Barry Warsaw wrote:
On Feb 26, 2010, at 02:09 PM, Brett Cannon wrote:
But a benefit of no longer supporting bytecode-only modules by default is it cuts back on possible stat calls which slows down Python's startup time (a complaint I hear a lot). Performance issues become even more acute if you try to come up with even a remotely proper way to have backwards-compatible support in importlib for its ABCs w/o forcing caching on all implementors of the ABCs.
And personally, I don't see what bytecode-only modules buy you. The obfuscation argument is bunk as we all know.
Brett really hits the nail on the head, and yes I'm sorry for not being clear about what "we discussed this at Pycon" meant. The "we" being Brett and I of course (and Chris Withers IIRC).
Bytecode-only deployments are a bit of a sham, and definitely a minority use case, so why should all of Python pay for the extra stat calls to support this by default? How many people would actually be hurt if this wasn't available out of the box, especially since you can still support it if you really want it and can't convince your manager that it provides essentially zero useful obfuscation of your code?
I say this having been down that road myself with a previous employer. Management was pretty adamant about wanting this until I explained how easy it was to defeat and convinced them that the engineering resources to do it were better spent elsewhere.
Having said that, I'd be all for including a reference implementation of a bytecode-only loader in the PEP for demonstration purposes. Greg, would you like to contribute that?
-Barry
Michael Foord's viewpoint on this strikes me as the most realistic. Some people do find it to be of value for their particular needs and circumstances. Michael Foord wrote:
For many use-cases some protection is enough. After all *any* DRM or source-code obfuscation is breakable in the medium / long term - so just enough to discourage the casual looker is probably sufficient. The fact that bytecode only distributions exist speaks to that.
Whether you believe that allowing companies who ship bytecode is a disservice to them or not is fundamentally irrelevant. If they believe it is a service to them then it is... :-)
To possibly qualify it a bit more: It does not make sense (to me) to have byte code only modules and packages in Python's lib directory. The whole purpose (as far as I know) is for modules and packages located there to be shared. And as such, the source file becomes a source of documentation. Not supporting bytecode only Python modules and packages in Python's "lib" directory may be good. For Python programs located and installed elsewhere I think Michael's viewpoint is applicable. For some files that are not meant to be shared, some form of discouragement can be a feature. Ron Adam
On Feb 26, 2010, at 08:30 PM, Ron Adam wrote:
It does not make sense (to me) to have byte code only modules and packages in Python's lib directory. The whole purpose (as far as I know) is for modules and packages located there to be shared. And as such, the source file becomes a source of documentation. Not supporting bytecode only Python modules and packages in Python's "lib" directory may be good.
Actually, it's not the standard library that's the issue, it's third party modules that OS vendors distribute. -Barry
On 26/02/2010 22:09, Brett Cannon wrote:
On Thu, Feb 25, 2010 at 16:13, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Michael Foord wrote:
I thought we agreed at the language summit that if a .pyc was in the place of the source file it *could* be imported from - making pyc only distributions possible.
Ah, that's okay, then. Sorry about the panic!
Michael is right about what was discussed at the language summit, but Barry means what he says; if you look at the PEP as it currently stands it does not support bytecode-only modules.
Barry and I discussed how to implement the PEP at PyCon after the summit and supporting bytecode-only modules quickly began to muck with the semantics and made it harder to explain (i.e. what to set __file__ vs. __compiled__ based on what is or is not available and how to properly define get_paths for loaders). But a benefit of no longer supporting bytecode-only modules by default is it cuts back on possible stat calls which slows down Python's startup time (a complaint I hear a lot). Performance issues become even more acute if you try to come up with even a remotely proper way to have backwards-compatible support in importlib for its ABCs w/o forcing caching on all implementors of the ABCs.
As for having a dependency on a loader, I don't see how that is obscure; it's just a dependency your package has that you handle at install-time.
And personally, I don't see what bytecode-only modules buy you. The obfuscation argument is bunk as we all know. Bytecode contains so much data that disassembling it gives you a very clear picture of what the original code was like.
Well, understanding bytecode *still* requires a higher level of understanding than the *majority* of Python programmers have. Added to which there are no widely available tools that *I'm* aware of for decompiling recent versions of Python (decompyle worked up to Python 2.4 but then went closed source as a commercial service [1]). The situation is analogous to .NET assemblies by the way (which *can* be trivially decompiled by several widely available tools). Having a non-source distribution prevents your users from changing things and then calling you for support without them having to go to a lot more effort than it is worth. There are several companies who currently ship bytecode only. (There was someone on the IronPython mailing list only last week asking if IronPython could support pyc files for this reason). For many pointy-haired-bosses 'some' protection is enough and having Python not support this (out of the box) would be a black mark against Python for them.
I think it's almost a dis-service to support bytecode-only files as it leads people who are misinformed or simply don't take the time to understand what is contained in a .pyc file into a false sense of security about their code not being easy to examine by someone else.
For many use-cases some protection is enough. After all *any* DRM or source-code obfuscation is breakable in the medium / long term - so just enough to discourage the casual looker is probably sufficient. The fact that bytecode only distributions exist speaks to that. Whether you believe that allowing companies who ship bytecode is a disservice to them or not is fundamentally irrelevant. If they believe it is a service to them then it is... :-) As you can tell, I would be disappointed to see bytecode only distributions be removed from the out-of-the-box functionality. All the best, Michael
The only perk I can see is space-saving, but that's dangerous as that ties you to a specific VM with a specific magic number (let alone that it leads to people tying themselves to CPython and ignoring the other VMs that simply do not support bytecode).
[1] http://www.crazy-compilers.com/decompyle/
-Brett
-- Greg
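For readers weighing the two sides of the obfuscation argument, a small sketch of how much a .pyc gives away using only the standard library. The function, the "secret" string and the file names are invented for the demonstration, and the 16-byte header offset assumes CPython 3.7 or later.

```python
import dis
import marshal
import os
import py_compile
import tempfile

SOURCE = '''
def check_license(key):
    secret = "s3cr3t-demo"
    return key == secret
'''

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "licensing_demo.py")
    with open(src, "w") as f:
        f.write(SOURCE)
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "licensing_demo.pyc"))
    with open(pyc, "rb") as f:
        f.seek(16)  # skip the 16-byte header used since Python 3.7
        module_code = marshal.load(f)

# Nested code objects (one per function) sit among the module's constants.
functions = [c for c in module_code.co_consts if hasattr(c, "co_code")]
func = functions[0]
print(func.co_name, func.co_varnames, func.co_consts)
dis.dis(func)
```

Function names, local variable names and string constants all survive compilation, which supports Brett's point; turning the disassembly back into readable source still takes tooling or patience, which is Michael's.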
On Feb 26, 2010, at 5:59 PM, Michael Foord wrote:
On 26/02/2010 22:09, Brett Cannon wrote:
On Thu, Feb 25, 2010 at 16:13, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Michael Foord wrote:
I thought we agreed at the language summit that if a .pyc was in the place of the source file it *could* be imported from - making pyc only distributions possible.
Ah, that's okay, then. Sorry about the panic!
Michael is right about what was discussed at the language summit, but Barry means what he says; if you look at the PEP as it currently stands it does not support bytecode-only modules.
Barry and I discussed how to implement the PEP at PyCon after the summit and supporting bytecode-only modules quickly began to muck with the semantics and made it harder to explain (i.e. what to set __file__ vs. __compiled__ based on what is or is not available and how to properly define get_paths for loaders). But a benefit of no longer supporting bytecode-only modules by default is it cuts back on possible stat calls which slows down Python's startup time (a complaint I hear a lot). Performance issues become even more acute if you try to come up with even a remotely proper way to have backwards-compatible support in importlib for its ABCs w/o forcing caching on all implementors of the ABCs.
As for having a dependency on a loader, I don't see how that is obscure; it's just a dependency your package has that you handle at install-time.
And personally, I don't see what bytecode-only modules buy you. The obfuscation argument is bunk as we all know. Bytecode contains so much data that disassembling it gives you a very clear picture of what the original code was like.
Well, understanding bytecode *still* requires a higher level of understanding than the *majority* of Python programmers have. Added to which there are no widely available tools that *I'm* aware of for decompiling recent versions of Python (decompyle worked up to Python 2.4 but then went closed source as a commercial service [1]).
The situation is analogous to .NET assemblies by the way (which *can* be trivially decompiled by several widely available tools). Having a non-source distribution prevents your users from changing things and then calling you for support without them having to go to a lot more effort than it is worth.
There are several companies who currently ship bytecode only. (There was someone on the IronPython mailing list only last week asking if IronPython could support pyc files for this reason). For many pointy-haired-bosses 'some' protection is enough and having Python not support this (out of the box) would be a black mark against Python for them.
We ship bytecode only, basically for the reason Michael states here (keeping support costs under control from "ambitious" users).
I think it's almost a dis-service to support bytecode-only files as it leads people who are misinformed or simply don't take the time to understand what is contained in a .pyc file into a false sense of security about their code not being easy to examine by someone else.
For many use-cases some protection is enough. After all *any* DRM or source-code obfuscation is breakable in the medium / long term - so just enough to discourage the casual looker is probably sufficient. The fact that bytecode only distributions exist speaks to that.
Right. We're more concerned with not having users muck with stuff than with keeping the implementation a secret, although having a bit of obfuscation doesn't hurt.
Whether you believe that allowing companies who ship bytecode is a disservice to them or not is fundamentally irrelevant. If they believe it is a service to them then it is... :-)
As you can tell, I would be disappointed to see bytecode only distributions be removed from the out-of-the-box functionality.
+1 Doug
On Fri, Feb 26, 2010 at 17:20, Doug Hellmann <doug.hellmann@gmail.com> wrote:
On Feb 26, 2010, at 5:59 PM, Michael Foord wrote:
On 26/02/2010 22:09, Brett Cannon wrote:
On Thu, Feb 25, 2010 at 16:13, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Michael Foord wrote:
I thought we agreed at the language summit that if a .pyc was in the place of the source file it *could* be imported from - making pyc only distributions possible.
Ah, that's okay, then. Sorry about the panic!
Michael is right about what was discussed at the language summit, but Barry means what he says; if you look at the PEP as it currently stands it does not support bytecode-only modules.
Barry and I discussed how to implement the PEP at PyCon after the summit and supporting bytecode-only modules quickly began to muck with the semantics and made it harder to explain (i.e. what to set __file__ vs. __compiled__ based on what is or is not available and how to properly define get_paths for loaders). But a benefit of no longer supporting bytecode-only modules by default is it cuts back on possible stat calls which slows down Python's startup time (a complaint I hear a lot). Performance issues become even more acute if you try to come up with even a remotely proper way to have backwards-compatible support in importlib for its ABCs w/o forcing caching on all implementors of the ABCs.
As for having a dependency on a loader, I don't see how that is obscure; it's just a dependency your package has that you handle at install-time.
And personally, I don't see what bytecode-only modules buy you. The obfuscation argument is bunk as we all know. Bytecode contains so much data that disassembling it gives you a very clear picture of what the original code was like.
Well, understanding bytecode *still* requires a higher level of understanding than the *majority* of Python programmers have. Added to which there are no widely available tools that *I'm* aware of for decompiling recent versions of Python (decompyle worked up to Python 2.4 but then went closed source as a commercial service [1]).
The situation is analogous to .NET assemblies by the way (which *can* be trivially decompiled by several widely available tools). Having a non-source distribution prevents your users from changing things and then calling you for support without them having to go to a lot more effort than it is worth.
There are several companies who currently ship bytecode only. (There was someone on the IronPython mailing list only last week asking if IronPython could support pyc files for this reason). For many pointy-haired-bosses 'some' protection is enough and having Python not support this (out of the box) would be a black mark against Python for them.
We ship bytecode only, basically for the reason Michael states here (keeping support costs under control from "ambitious" users).
I think it's almost a dis-service to support bytecode-only files as it leads people who are misinformed or simply don't take the time to understand what is contained in a .pyc file into a false sense of security about their code not being easy to examine by someone else.
For many use-cases some protection is enough. After all *any* DRM or source-code obfuscation is breakable in the medium / long term - so just enough to discourage the casual looker is probably sufficient. The fact that bytecode only distributions exist speaks to that.
Right. We're more concerned with not having users muck with stuff than with keeping the implementation a secret, although having a bit of obfuscation doesn't hurt.
Whether you believe that allowing companies who ship bytecode is a disservice to them or not is fundamentally irrelevant. If they believe it is a service to them then it is... :-)
As you can tell, I would be disappointed to see bytecode only distributions be removed from the out-of-the-box functionality.
+1
So what is the burden of including a single source file that added the support to load from bytecode-only modules? I am not saying you shouldn't be able to have this functionality, just that I personally don't want to pay for the overhead (both performance-wise and development-wise) by default just because you and some other people want this functionality for some clients. -Brett
On Feb 26, 2010, at 8:30 PM, Brett Cannon wrote:
So what is the burden of including a single source file that added the support to load from bytecode-only modules? I am not saying you shouldn't be able to have this functionality, just that I personally don't want to pay for the overhead (both performance-wise and development-wise) by default just because you and some other people want this functionality for some clients.
If such a module was available, we'd use it if that was the way to achieve what we want. We could write something like that on our own, but we'd be more likely to decide to just stick with Python 2 for longer because we're going to prioritize new features over doing "hidden" maintenance work like that. So, we want the ability to ship bytecode-only versions of the software, but the specific mechanism for doing so doesn't matter a lot. Doug
On Feb 26, 2010, at 10:59 PM, Michael Foord wrote:
There are several companies who currently ship bytecode only. (There was someone on the IronPython mailing list only last week asking if IronPython could support pyc files for this reason). For many pointy-haired-bosses 'some' protection is enough and having Python not support this (out of the box) would be a black mark against Python for them.
Would it not be better to ship a zip file with an obfuscated name? Doesn't that satisfy the use case nicely? -Barry
On Sat, Feb 27, 2010 at 10:56:13AM -0500, Barry Warsaw wrote:
On Feb 26, 2010, at 10:59 PM, Michael Foord wrote:
There are several companies who currently ship bytecode only. (There was someone on the IronPython mailing list only last week asking if IronPython could support pyc files for this reason). For many pointy-haired-bosses 'some' protection is enough and having Python not support this (out of the box) would be a black mark against Python for them.
Would it not be better to ship a zip file with an obfuscated name? Doesn't that satisfy the use case nicely?
Sure, we combine that with putting .pyo files inside the zipfile though (for assert statements and if __debug__ blocks). I'm rather confused about everything proposed by now but would that keep working? Also somewhere else in the thread it seemed like both you and Guido suggested that simply creating a directory with some .pyc (or .pyo I guess) files in would keep working, just by default they won't be written there by Python. Or is it that functionality some want to cut because of the doubling of the stat calls? (But even then I'm not convinced that would double the stat calls for normal users, only for those who only ship .pyc files) Regards Floris -- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org
Floris Bruynooghe wrote:
(But even then I'm not convinced that would double the stat calls for normal users, only for those who only ship .pyc files)
It would increase the number of stat calls for normal users by 50%. You would need to look for a .pyc in the source directory, then .py in the source directory and .pyc in the cache directory. That's compared to two stat calls currently, for .py and .pyc. A solution might be to look for the presence of the cache directory, and only look for a .pyc in the source directory if there is no cache directory. Testing for the cache directory would only have to be done once per package and the result remembered, so it would add very little overhead. -- Greg
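Greg's suggestion of testing for the cache directory once and remembering the answer can be sketched roughly as follows. This is only an illustration, not how the import machinery is written: `has_cache_dir` and `candidate_files` are hypothetical names, the `__pycache__` name is the one used elsewhere in the thread, and the cached file name leaves out the magic-number tag that the PEP would encode in it.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=None)
def has_cache_dir(directory, cache_name="__pycache__"):
    """Return True if `directory` holds a bytecode cache directory.

    The result is remembered, so the filesystem is consulted only once
    per directory, however many modules are later looked up in it.
    """
    return os.path.isdir(os.path.join(directory, cache_name))


def candidate_files(directory, name):
    """List the files that would be probed for module `name` (hypothetical helper)."""
    candidates = [os.path.join(directory, name + ".py")]
    if has_cache_dir(directory):
        # Normal case: bytecode lives in the cache directory.
        # Assumption: the real cached name would also carry a magic-number tag.
        candidates.append(os.path.join(directory, "__pycache__", name + ".pyc"))
    else:
        # No cache directory: fall back to an in-place .pyc next to the source.
        candidates.append(os.path.join(directory, name + ".pyc"))
    return candidates
```

Either way only two files are probed per directory, which is the point of the proposal; the one extra directory check is paid once and then served from the cache.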
On Sun, Feb 28, 2010 at 02:51:16PM +1300, Greg Ewing wrote:
Floris Bruynooghe wrote:
(But even then I'm not convinced that would double the stat calls for normal users, only for those who only ship .pyc files)
It would increase the number of stat calls for normal users by 50%. You would need to look for a .pyc in the source directory, then .py in the source directory and .pyc in the cache directory. That's compared to two stat calls currently, for .py and .pyc.
Can't it look for a .py file in the source directory first (1st stat)? When it's there check for the .pyc in the cache directory (2nd stat, magic number encoded in filename), if it's not check for .pyc in the source directory (2nd stat + read for magic number check). Or am I missing a subtlety?
A solution might be to look for the presence of the cache directory, and only look for a .pyc in the source directory if there is no cache directory. Testing for the cache directory would only have to be done once per package and the result remembered, so it would add very little overhead.
That would work too, but I don't understand yet why the .pyc check in the source directory can't be done last. Regards Floris -- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org
-- http://www.ironpythoninaction.com On 28 Feb 2010, at 12:19, Floris Bruynooghe <floris.bruynooghe@gmail.com> wrote:
On Sun, Feb 28, 2010 at 02:51:16PM +1300, Greg Ewing wrote:
Floris Bruynooghe wrote:
(But even then I'm not convinced that would double the stat calls for normal users, only for those who only ship .pyc files)
It would increase the number of stat calls for normal users by 50%. You would need to look for a .pyc in the source directory, then .py in the source directory and .pyc in the cache directory. That's compared to two stat calls currently, for .py and .pyc.
Can't it look for a .py file in the source directory first (1st stat)? When it's there check for the .pyc in the cache directory (2nd stat, magic number encoded in filename), if it's not check for .pyc in the source directory (2nd stat + read for magic number check). Or am I missing a subtlety?
The problem is doing this little dance for every path on sys.path. Michael
A solution might be to look for the presence of the cache directory, and only look for a .pyc in the source directory if there is no cache directory. Testing for the cache directory would only have to be done once per package and the result remembered, so it would add very little overhead.
That would work too, but I don't understand yet why the .pyc check in the source directory can't be done last.
Regards Floris
-- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.u...
Michael Foord wrote:
Can't it look for a .py file in the source directory first (1st stat)? When it's there check for the .pyc in the cache directory (2nd stat, magic number encoded in filename), if it's not check for .pyc in the source directory (2nd stat + read for magic number check). Or am I missing a subtlety?
The problem is doing this little dance for every path on sys.path.
To unpack this a little bit for those not quite as familiar with the import system (and to make it clear for my own benefit!): for a top-level module/package, each path on sys.path needs to be eliminated as a possible location before the interpreter can move on to check the next path in the list. So the important number is the number of stat calls on a "miss" (i.e. when the requested module/package is not present in a directory). Currently, with builtin support for bytecode only files, there are 3 checks (package directory, py source file, pyc/pyo bytecode file) to be made for each path entry. The PEP proposes to reduce that to only two in the case of a miss, by checking for the cached pyc only if the source file is present (there would still be three checks for a "hit", but that only happens at most once per module lookup). While the PEP is right in saying that a bytecode-only import hook could be added, I believe it would actually be a little tricky to write one that didn't severely degrade the performance of either normal imports or bytecode-only imports. Keeping it in the core import, but turning it off by default seems much less likely to have unintended performance consequences when it is switched back on. Another option is to remove bytecode-only support from the default filesystem importer, but keep it for zipimport (since the stat call savings don't apply in the latter case). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Sun, Feb 28, 2010 at 11:07:27PM +1000, Nick Coghlan wrote:
Michael Foord wrote:
Can't it look for a .py file in the source directory first (1st stat)? When it's there check for the .pyc in the cache directory (2nd stat, magic number encoded in filename), if it's not check for .pyc in the source directory (2nd stat + read for magic number check). Or am I missing a subtlety?
The problem is doing this little dance for every path on sys.path.
To unpack this a little bit for those not quite as familiar with the import system (and to make it clear for my own benefit!): for a top-level module/package, each path on sys.path needs to be eliminated as a possible location before the interpreter can move on to check the next path in the list.
Aha, that was the clue I was missing. Thanks! Floris -- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org
On Sun, Feb 28, 2010 at 05:07, Nick Coghlan <ncoghlan@gmail.com> wrote:
Michael Foord wrote:
Can't it look for a .py file in the source directory first (1st stat)? When it's there check for the .pyc in the cache directory (2nd stat, magic number encoded in filename), if it's not check for .pyc in the source directory (2nd stat + read for magic number check). Or am I missing a subtlety?
The problem is doing this little dance for every path on sys.path.
To unpack this a little bit for those not quite as familiar with the import system (and to make it clear for my own benefit!): for a top-level module/package, each path on sys.path needs to be eliminated as a possible location before the interpreter can move on to check the next path in the list.
So the important number is the number of stat calls on a "miss" (i.e. when the requested module/package is not present in a directory). Currently, with builtin support for bytecode only files, there are 3 checks (package directory, py source file, pyc/pyo bytecode file) to be made for each path entry.
Actually it's four: name/__init__.py, name/__init__.pyc, name.py, and then name.pyc. And just so people have terminology to go with all of this, this search is what the finder does to say whether it can or cannot handle the requested module.
The PEP proposes to reduce that to only two in the case of a miss, by checking for the cached pyc only if the source file is present (there would still be three checks for a "hit", but that only happens at most once per module lookup).
Just to be explicit, Nick is talking about name/__init__.py and name.py (note the skipping of looking for any .pyc files). At that point only the loader needs to check for the bytecode in the __pycache__ directory.
While the PEP is right in saying that a bytecode-only import hook could be added, I believe it would actually be a little tricky to write one that didn't severely degrade the performance of either normal imports or bytecode-only imports. Keeping it in the core import, but turning it off by default seems much less likely to have unintended performance consequences when it is switched back on.
It all depends on how it is implemented. If the bytecode-only importer stats a directory to check for the existence of any source in order to decide not to handle it, that is an extra stat call, but that is only once per sys.path/__path__ location by the path hook, not every attempted import. Now if I ever manage to find the time to break up the default importers and expose them then it should be no more than adding the bytecode-only importer to the chained finder that already exists (it essentially chains source and extension modules).
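For readers who want something concrete, here is a minimal sketch of the kind of path hook described above: it looks at a directory once, declines it if any source is present, and otherwise hands it to a bytecode-only finder. It is written against the `importlib.machinery` API of current Python 3, which postdates this thread, so it is an assumption about how one might do it today rather than what was being proposed; `bytecode_only_hook` is a hypothetical name.

```python
import os
from importlib.machinery import FileFinder, SourcelessFileLoader

# A stock finder factory that only recognises in-place .pyc files.
_sourceless_hook = FileFinder.path_hook((SourcelessFileLoader, [".pyc"]))


def bytecode_only_hook(path):
    """Path hook: claim `path` only if it is a directory with no .py files.

    The directory is examined once, when the hook is called for that
    sys.path/__path__ entry, not on every attempted import.
    """
    if not os.path.isdir(path):
        raise ImportError("not a directory", path=path)
    if any(entry.endswith(".py") for entry in os.listdir(path)):
        # Source is present: decline, so the next hook on sys.path_hooks gets a turn.
        raise ImportError("directory contains source files", path=path)
    return _sourceless_hook(path)
```

To try it, one would insert the hook at the front of `sys.path_hooks` and clear `sys.path_importer_cache`. Note the sketch ignores extension modules and packages that mix source and bytecode, so a real hook would need more care.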
Another option is to remove bytecode-only support from the default filesystem importer, but keep it for zipimport (since the stat call savings don't apply in the latter case).
That's a very nice option. That would isolate it into a single importer that doesn't impact general performance for everyone else. -Brett
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sun, 2010-02-28 at 12:21 -0800, Brett Cannon wrote:
Actually it's four: name/__init__.py, name/__init__.pyc, name.py, and then name.pyc. And just so people have terminology to go with all of this, this search is what the finder does to say whether it can or cannot handle the requested module.
Aren't there also: name.so namemodule.so ? -Rob
Brett Cannon wrote:
Actually it's four: name/__init__.py, name/__init__.pyc, name.py, and then name.pyc. And just so people have terminology to go with all of this, this search is what the finder does to say whether it can or cannot handle the requested module.
Huh, I thought we checked for the directory first and only then checked for the __init__ module within it (hence the generation of ImportWarning when we don't find __init__ after finding a correctly named directory). So a normal miss (i.e. no directory) only needs one stat call. (However, I'll grant that I haven't looked at this particular chunk of code in a fairly long time, so I could easily be wrong). Robert raises a good point about the checks for extension modules as well - we should get an accurate count here so Barry's PEP can pitch the proportional reduction in stat calls accurately. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Sun, Feb 28, 2010 at 12:46, Nick Coghlan <ncoghlan@gmail.com> wrote:
Brett Cannon wrote:
Actually it's four: name/__init__.py, name/__init__.pyc, name.py, and then name.pyc. And just so people have terminology to go with all of this, this search is what the finder does to say whether it can or cannot handle the requested module.
Huh, I thought we checked for the directory first and only then checked for the __init__ module within it (hence the generation of ImportWarning when we don't find __init__ after finding a correctly named directory). So a normal miss (i.e. no directory) only needs one stat call.
(However, I'll grant that I haven't looked at this particular chunk of code in a fairly long time, so I could easily be wrong).
Robert raises a good point about the checks for extension modules as well - we should get an accurate count here so Barry's PEP can pitch the proportional reduction in stat calls accurately.
Here are the details (from Python/import.c:find_module) assuming that everything has failed to the point of trying for the implicit sys.path importers:

    stat_info = stat(name)
    if stat_info.exists and stat_info.is_dir:
        if stat(name/__init__.py) || stat(name/__init__.pyc):
            load(name)
    else:
        for ext in ('.so', 'module.so', '.py', '.pyc'):
            # Windows has an extra check for .pyw files.
            if open(name + ext):
                load(name)

So there are a total of five to six depending on the OS (actually, VMS goes up to eight!) before a search path is considered not to contain a module. And thanks to doing this I realized importlib is not stat'ing the directory first which should fail faster than checking for the __init__ files every time. -Brett
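The count of five on a miss can be checked with a small runnable version of that pseudocode. This is a sketch that follows the pseudocode literally rather than import.c itself: `probes_for` and `UNIX_EXTENSIONS` are hypothetical names, `os.path.exists` stands in for both the stat and the open calls, and nothing is actually loaded.

```python
import os

# Suffix list for a Unix build, as given in the pseudocode above.
UNIX_EXTENSIONS = (".so", "module.so", ".py", ".pyc")


def probes_for(directory, name, extensions=UNIX_EXTENSIONS):
    """Return the paths examined, in order, when looking for `name` in `directory`."""
    base = os.path.join(directory, name)
    probed = [base]  # first check: is there a directory called `name`?
    if os.path.isdir(base):
        for init in ("__init__.py", "__init__.pyc"):
            candidate = os.path.join(base, init)
            probed.append(candidate)
            if os.path.exists(candidate):
                break  # package found
    else:
        for ext in extensions:
            candidate = base + ext
            probed.append(candidate)
            if os.path.exists(candidate):
                break  # module found
    return probed
```

For a name that is absent from a directory this returns five paths (the directory check plus four suffixes), and dropping ".pyc" from the suffix list brings it down to four.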
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
Brett Cannon wrote:
So there are a total of five to six depending on the OS (actually, VMS goes up to eight!) before a search path is considered not to contain a module.
The Windows list is actually going to be slightly different (dir, pyd, py, pyw, py[co]). It looks for .pyd files rather than either flavour of .so file (we stopped allowing .dll files some time back due to the sqlite3 DLL naming conflict). So I believe it is always 5 stat calls on the major platforms - dropping the bytecode filename check saves 20% of them on a miss. I'm not convinced that saving is worth the hassle of incurring a whole pile of subtle backward compatibility problems. It seems better to say "Python doesn't create in-place bytecode files anymore, but if you arrange to put them there yourself we'll still read them". I certainly wouldn't support removing the feature without some solid benchmarks to say that it really is going to significantly speed up typical import times for non-trivial modules (given that part of the definition of "non-trivial" is "significant amounts of code to be run when imported", that bar is likely to be a tough one to clear).
And thanks to doing this I realized importlib is not stat'ing the directory first which should fail faster than checking for the __init__ files every time.
That would explain why we had different ideas as to what the interpreter was doing :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
Nick Coghlan a écrit :
Another option is to remove bytecode-only support from the default filesystem importer, but keep it for zipimport (since the stat call savings don't apply in the latter case).
bytecode-only in a zip is used by py2exe, cx_freeze and the like, for space reasons. Disabling it would probably hurt them. However, making a difference between zipimport and the filesystem importer means the application will stop working if I unzip the library zip file, which is surprising. Unzipping the zip file can be handy when debugging a bug caused by a forgotten module. Cheers, Baptiste
Le Sun, 28 Feb 2010 21:45:56 +0100, Baptiste Carvello a écrit :
bytecode-only in a zip is used by py2exe, cx_freeze and the like, for space reasons. Disabling it would probably hurt them.
Source code compresses quite well. I'm not sure it would make much of a difference. AFAIR, when you create a py2exe distribution, what takes most of the place is the interpreter itself as well as any big third-party C libraries such as wxWidgets. Regards Antoine.
Antoine Pitrou a écrit :
Le Sun, 28 Feb 2010 21:45:56 +0100, Baptiste Carvello a écrit :
bytecode-only in a zip is used by py2exe, cx_freeze and the like, for space reasons. Disabling it would probably hurt them.
Source code compresses quite well. I'm not sure it would make much of a difference.
I did a quick check on the stdlib: a zip with .py and .pyc is about 80% bigger than one with .pyc only. If you use only the bytecode, this can be seen as wasted space. On the other hand, if you ever need to debug the application, source is very handy... Anyway, I'm a bit worried if bytecode-only is disabled from zipimport without some input from the developers of py2exe/cx_freeze/etc, as they are big users of it. Cheers, Baptiste
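A comparison of this kind is easy to repeat. The sketch below is one possible way to do it and not necessarily how the figure above was obtained: `zipped_sizes` is a hypothetical helper, it only looks at the .py files directly inside one directory (no recursion), and it uses deflate compression.

```python
import io
import os
import py_compile
import tempfile
import zipfile


def zipped_sizes(source_dir):
    """Return (size with .py and .pyc, size with .pyc only), in bytes."""
    both, pyc_only = io.BytesIO(), io.BytesIO()
    with tempfile.TemporaryDirectory() as build_dir, \
            zipfile.ZipFile(both, "w", zipfile.ZIP_DEFLATED) as zip_both, \
            zipfile.ZipFile(pyc_only, "w", zipfile.ZIP_DEFLATED) as zip_pyc:
        for filename in sorted(os.listdir(source_dir)):
            if not filename.endswith(".py"):
                continue
            source_path = os.path.join(source_dir, filename)
            # Compile into a scratch directory so source_dir is left untouched.
            pyc_path = os.path.join(build_dir, filename + "c")
            py_compile.compile(source_path, cfile=pyc_path, doraise=True)
            zip_both.write(source_path, filename)
            zip_both.write(pyc_path, filename + "c")
            zip_pyc.write(pyc_path, filename + "c")
    # Both archives are closed here, so their central directories are written.
    return len(both.getvalue()), len(pyc_only.getvalue())
```

Calling `zipped_sizes` on a directory of modules and dividing the first number by the second gives the overhead of shipping source alongside bytecode; the ratio will vary with the code being measured.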
Le Mon, 01 Mar 2010 09:09:09 +0100, Baptiste Carvello <baptiste13z@free.fr> a écrit :
I did a quick check on the stdlib: a zip with .py and .pyc is about 80% bigger than one with .pyc only. If you use only the bytecode, this can be seen as wasted space. On the other hand, if you ever need to debug the application, source is very handy...
My point is that the wasted size compared to the total bundle size (with interpreter and 3rd party C libs) would probably be small.
Anyway, I'm a bit worried if bytecode-only is disabled from zipimport without some input from the developers of py2exe/cx_freeze/etc, as they are big users of it.
Granted. Regards Antoine.
On Sun, Feb 28, 2010 at 09:45:56PM +0100, Baptiste Carvello wrote:
However, making a difference between zipimport and the filesystem importer means the application will stop working if I unzip the library zip file, which is surprising. Unzipping the zip file can be handy when debugging a bug caused by a forgotten module.
That difference exists already, the zipimporter will happily run .pyo files inside the zipfile even when you're not running with -O or PYTHONOPTIMIZE. Regards Floris -- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org
On Sun, Feb 28, 2010 at 12:45, Baptiste Carvello <baptiste13z@free.fr> wrote:
Nick Coghlan a écrit :
Another option is to remove bytecode-only support from the default filesystem importer, but keep it for zipimport (since the stat call savings don't apply in the latter case).
bytecode-only in a zip is used by py2exe, cx_freeze and the like, for space reasons. Disabling it would probably hurt them.
However, making a difference between zipimport and the filesystem importer means the application will stop working if I unzip the library zip file, which is surprising. Unzipping the zip file can be handy when debugging a bug caused by a forgotten module.
Is it really that hard to unzip a bunch of .pyc files, modify what you need to, and then zip it back up? And if you are given a zip file of only .pyc files you can't really debug anything anyway. -Brett
Cheers, Baptiste
Brett Cannon a écrit :
However, making a difference between zipimport and the filesystem importer means the application will stop working if I unzip the library zip file, which is surprising. Unzipping the zip file can be handy when debugging a bug caused by a forgotten module.
Is it really that hard to unzip a bunch of .pyc files, modify what you need to, and then zip it back up? And if you are given a zip file of only .pyc files you can't really debug anything anyway.
Well, this is a micro-use-case, I admit, I only mention it because it's something I've really done. It's only useful for debugging the building process, not the application (so I do have the source at hand), and the only reason for not rezipping is to test more quickly. I can definitely live without it! Cheers, Baptiste
Nick Coghlan wrote:
Michael Foord wrote:
Can't it look for a .py file in the source directory first (1st stat)? When it's there check for the .pyc in the cache directory (2nd stat, magic number encoded in filename), if it's not check for .pyc in the source directory (2nd stat + read for magic number check). Or am I missing a subtlety? The problem is doing this little dance for every path on sys.path.
To unpack this a little bit for those not quite as familiar with the import system (and to make it clear for my own benefit!): for a top-level module/package, each path on sys.path needs to be eliminated as a possible location before the interpreter can move on to check the next path in the list.
So the important number is the number of stat calls on a "miss" (i.e. when the requested module/package is not present in a directory). Currently, with builtin support for bytecode only files, there are 3 checks (package directory, py source file, pyc/pyo bytecode file) to be made for each path entry.
The PEP proposes to reduce that to only two in the case of a miss, by checking for the cached pyc only if the source file is present (there would still be three checks for a "hit", but that only happens at most once per module lookup).
While the PEP is right in saying that a bytecode-only import hook could be added, I believe it would actually be a little tricky to write one that didn't severely degrade the performance of either normal imports or bytecode-only imports. Keeping it in the core import, but turning it off by default seems much less likely to have unintended performance consequences when it is switched back on.
Another option is to remove bytecode-only support from the default filesystem importer, but keep it for zipimport (since the stat call savings don't apply in the latter case).
What if ... a bytecode-only mode is triggered by "__main__" loading from a bytecode file, otherwise the .py files are needed and are checked to make sure the bytecode files are current. Ron
On Mon, Mar 1, 2010 at 08:30, Ron Adam <rrr@ronadam.com> wrote:
Nick Coghlan wrote:
Michael Foord wrote:
Can't it look for a .py file in the source directory first (1st stat)?
When it's there check for the .pyc in the cache directory (2nd stat, magic number encoded in filename), if it's not check for .pyc in the source directory (2nd stat + read for magic number check). Or am I missing a subtlety?
The problem is doing this little dance for every path on sys.path.
To unpack this a little bit for those not quite as familiar with the import system (and to make it clear for my own benefit!): for a top-level module/package, each path on sys.path needs to be eliminated as a possible location before the interpreter can move on to check the next path in the list.
So the important number is the number of stat calls on a "miss" (i.e. when the requested module/package is not present in a directory). Currently, with builtin support for bytecode only files, there are 3 checks (package directory, py source file, pyc/pyo bytecode file) to be made for each path entry.
The PEP proposes to reduce that to only two in the case of a miss, by checking for the cached pyc only if the source file is present (there would still be three checks for a "hit", but that only happens at most once per module lookup).
While the PEP is right in saying that a bytecode-only import hook could be added, I believe it would actually be a little tricky to write one that didn't severely degrade the performance of either normal imports or bytecode-only imports. Keeping it in the core import, but turning it off by default seems much less likely to have unintended performance consequences when it is switched back on.
Another option is to remove bytecode-only support from the default filesystem importer, but keep it for zipimport (since the stat call savings don't apply in the latter case).
What if ... a bytecode-only mode is triggered by "__main__" loading from a bytecode file, otherwise the .py files are needed and are checked to make sure the bytecode files are current.
That's way too magical for my tastes, especially if you mess up and accidentally leave behind a __main__.pyc after moving the __main__.py file. -Brett
Ron
Ron Adam wrote:
What if ... a bytecode-only mode is triggered by "__main__" loading from a bytecode file, otherwise the .py files are needed and are checked to make sure the bytecode files are current.
That would preclude having a bytecode-only library that could be used by a sourceful program. Such a situation might arise if you have an application with a scripting interface that is used by importing stuff from the application's internal libraries. -- Greg
On Feb 28, 2010, at 11:07 PM, Nick Coghlan wrote:
While the PEP is right in saying that a bytecode-only import hook could be added, I believe it would actually be a little tricky to write one that didn't severely degrade the performance of either normal imports or bytecode-only imports. Keeping it in the core import, but turning it off by default seems much less likely to have unintended performance consequences when it is switched back on.
Except that even users of bytecode-only imports probably don't want or care about that for *every* package directory. So really, all-or-nothing hits them too. One option to help with that is to have a flag or marker in a package's __init__.py that signals pyc-only imports for that package directory. It's getting complicated again. ;)
Another option is to remove bytecode-only support from the default filesystem importer, but keep it for zipimport (since the stat call savings don't apply in the latter case).
I'd be okay with this, but even here I'd argue that it would be fine to require the source files by default. The primary use case I've seen mentioned for pyc-only imports is to make it more difficult for users to accidentally shoot themselves in the foot. I think the very presence of a zip file for importing is enough without the extra step of removing the source. But that's just me. :) -Barry
Thanks everybody for providing great input on this aspect of the PEP. I've updated the open issues section to include a list of the possible resolutions for bytecode-only imports. Unless anybody has more ideas, it might just be time to get a BDFL pronouncement. -Barry
On Tue, 2 Mar 2010 11:59:55 am Barry Warsaw wrote:
Thanks everybody for providing great input on this aspect of the PEP. I've updated the open issues section to include a list of the possible resolutions for bytecode-only imports. Unless anybody has more ideas, it might just be time to get a BDFL pronouncement.
Please excuse me if these minor points have already been discussed, but I couldn't see them in the PEP.
(1) What happens if the __cache__ directory doesn't exist and the enclosing directory is unwriteable, or if it does exist, but is unreadable? I expect that the byte code files will simply not be created, and everything will continue without them.
(2) Presumably this only affects imports, not running Python source code as a script. If I do this:
    python myscript.py
from the shell, I would expect that no __cache__ directory will be created, just like today.
BTW, you have some sort of automated warning in the PEP: System Message: WARNING/2 (pep-3147.txt, line 237) Title underline too short. http://www.python.org/dev/peps/pep-3147/#id47 -- Steven D'Aprano
On Tue, Mar 2, 2010 at 02:06, Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, 2 Mar 2010 11:59:55 am Barry Warsaw wrote:
Thanks everybody for providing great input on this aspect of the PEP. I've updated the open issues section to include a list of the possible resolutions for bytecode-only imports. Unless anybody has more ideas, it might just be time to get a BDFL pronouncement.
Please excuse me if these minor points have already been discussed, but I couldn't see them in the PEP.
(1) What happens if the __cache__ directory doesn't exist and the enclosing directory is unwriteable, or if it does exist, but is unreadable?
I expect that the byte code files will simply not be created, and everything will continue without them.
(2) Presumably this only affects imports, not running Python source code as a script. If I do this:
python myscript.py
from the shell, I would expect that no __cache__ directory will be created, just like today.
BTW, you have some sort of automated warning in the PEP:
System Message: WARNING/2 (pep-3147.txt, line 237) Title underline too short.
It's now fixed. Barry forgot to run the Makefile for the PEPs before checking in. Shame! =) -Brett
-- Steven D'Aprano
On Mar 02, 2010, at 09:06 PM, Steven D'Aprano wrote:
(1) What happens if the __cache__ directory doesn't exist and the enclosing directory is unwriteable, or if it does exist, but is unreadable?
I expect that the byte code files will simply not be created, and everything will continue without them.
s/__cache__/__pycache__ but yes, just as it does today.
(2) Presumably this only affects imports, not running python source code as a script. If I do this:
python myscript.py
from the shell, I would expect that no __cache__ directory will be created, just like today.
Correct. -Barry
Barry Warsaw wrote:
Thanks everybody for providing great input on this aspect of the PEP. I've updated the open issues section to include a list of the possible resolutions for bytecode-only imports. Unless anybody has more ideas, it might just be time to get a BDFL pronouncement.
I think the benchmarking in the bytecode-only section is still too weak. "evidence shows that the extra stats can be fairly costly to start up time" isn't a valid justification for breaking working code.
Doing 4 stat calls instead of 5 on a directory miss just doesn't excite me very much without some genuine benchmarks across different operating systems and filesystems showing that reducing the number of stat calls by at best 20% will result in a measurable reduction in import times for real modules (where we can expect the import time to be dominated by the execution of the actual module code rather than the time needed to find that code in the first place).
Using the sample numbers Robert Collins posted:
# Startup time for bzr (cold cache):
$ drop-caches
$ time bzr --no-plugins revno
5061
real 0m8.875s
user 0m0.210s
sys 0m0.140s
# Hot cache
$ time bzr --no-plugins revno
5061
real 0m0.307s
user 0m0.250s
sys 0m0.040s
$ strace -c bzr --no-plugins revno
5061
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 56.34    0.040000          76       527           read
 28.98    0.020573           9      2273      1905 open
 14.43    0.010248          14       734       625 stat
  0.15    0.000107           0       533           fstat
hot cache:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.10    0.000368          92         4           getdents
 19.49    0.000159           0       527           read
 16.91    0.000138           1       163           munmap
 10.05    0.000082           2        54           mprotect
  8.46    0.000069           0      2273      1905 open
  0.00    0.000000           0         8           write
  0.00    0.000000           0       367           close
  0.00    0.000000           0       734       625 stat
Assuming all those stat errors are misses from the import system, we're looking at reducing that 625 figure down to 500: 125 fewer failed calls. With a hot cache, the impact is too small for strace to even measure. With a cold cache, it is 1.75 milliseconds: only 1.25% of the system time consumed in the script's execution, and not even registering relative to the 9 second wall clock time.
Without significant measurable performance gains, a mere aesthetic preference isn't enough to justify inflicting subtle breakage on even a small subset of our users.
Even aside from the issue of a lack of benchmarks to justify the breakage, bytecode only imports *cannot* legitimately be broken without at least one release where they generate Deprecation Warnings.
Cheers, Nick.
P.S. I actually started this thread as a +0 to the idea of dropping bytecode only imports. Over the course of the discussion I've shifted to a firm -1 in the absence of some proper comparative benchmarks to justify the change in semantics.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
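The back-of-the-envelope estimate in Nick's message can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the inputs (625 failed stat calls, 5 probes per miss dropping to 4, 14 usecs per cold-cache stat call, 0.140s of system time) are the figures quoted in the thread, while the function name `stat_savings` and its parameter names are made up for illustration.

```python
def stat_savings(failed_stats=625, probes_now=5, probes_after=4,
                 usecs_per_call=14, sys_time_usecs=140_000):
    """Return (calls_saved, msecs_saved, fraction_of_sys_time).

    Assumes, as the message does, that every failed stat is an import miss
    and that dropping one probe in five removes a fifth of the failures.
    """
    remaining = failed_stats * probes_after // probes_now
    calls_saved = failed_stats - remaining
    usecs_saved = calls_saved * usecs_per_call
    return calls_saved, usecs_saved / 1000, usecs_saved / sys_time_usecs


calls, msecs, fraction = stat_savings()
print(calls, msecs, format(fraction, ".2%"))
```

With the cold-cache figures this gives 125 fewer calls and 1.75 ms, which is where the 1.25% of system time comes from.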
On Tue, Mar 2, 2010 at 5:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
P.S. I actually started this thread as a +0 to the idea of dropping bytecode only imports. Over the course of the discussion I've shifted to a firm -1 in the absence of some proper comparative benchmarks to justify the change in semantics.
FWIW, I started at -1 and am still -1. I think the PEP is overreaching in this aspect; it does not serve the stated purpose of the PEP to make life easier for distros that share code between Python versions. -- --Guido van Rossum (python.org/~guido)
On Mar 02, 2010, at 09:34 AM, Guido van Rossum wrote:
On Tue, Mar 2, 2010 at 5:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
P.S. I actually started this thread as a +0 to the idea of dropping bytecode only imports. Over the course of the discussion I've shifted to a firm -1 in the absence of some proper comparative benchmarks to justify the change in semantics.
FWIW, I started at -1 and am still -1. I think the PEP is overreaching in this aspect; it does not serve the stated purpose of the PEP to make life easier for distros that share code between Python versions.
I think that's fair, and just the guidance I'm looking for. By now you understand the pros and cons, so if this is a pronouncement, I will cement it into the PEP. -Barry
On Tue, Mar 2, 2010 at 11:52 AM, Barry Warsaw <barry@python.org> wrote:
On Mar 02, 2010, at 09:34 AM, Guido van Rossum wrote:
On Tue, Mar 2, 2010 at 5:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
P.S. I actually started this thread as a +0 to the idea of dropping bytecode only imports. Over the course of the discussion I've shifted to a firm -1 in the absence of some proper comparative benchmarks to justify the change in semantics.
FWIW, I started at -1 and am still -1. I think the PEP is overreaching in this aspect; it does not serve the stated purpose of the PEP to make life easier for distros that share code between Python versions.
I think that's fair, and just the guidance I'm looking for. By now you understand the pros and cons, so if this is a pronouncement, I will cement it into the PEP.
Yes, and thanks! -- --Guido van Rossum (python.org/~guido)
Floris Bruynooghe wrote:
Can't it look for a .py file in the source directory first (1st stat)? When it's there check for the .pyc in the cache directory (2nd stat, magic number encoded in filename), if it's not check for .pyc in the source directory (2nd stat + read for magic number check).
Yes, although that would then incur higher stat overheads for people distributing .pyc files. There doesn't seem to be a way of pleasing everyone. This is all assuming that the extra stat calls are actually a problem. Does anyone have any evidence that they would really take significant time compared to loading the module? Once you've looked for one file in a given directory, looking for another one in the same directory ought to be quite fast, since all the relevant directory blocks will be in the filesystem cache. -- Greg
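The lookup order Floris describes could be sketched roughly as follows. This is an illustration only, not the importer's actual code: the function name `locate`, the returned labels, and the default `tag` value are hypothetical, and the sketch ignores packages, staleness checks and the magic-number read that the bytecode-only branch would still need.

```python
import os


def locate(directory, name, tag="cpython-32", cache_dir="__pycache__"):
    """Return (kind, path) for module `name` in `directory`, or None.

    `tag` stands in for whatever version marker ends up in the cached
    file name; the default here is a placeholder.
    """
    source = os.path.join(directory, name + ".py")
    if os.path.isfile(source):  # 1st stat
        cached = os.path.join(directory, cache_dir, name + "." + tag + ".pyc")
        if os.path.isfile(cached):  # 2nd stat, version encoded in the name
            return ("cached-bytecode", cached)
        return ("source", source)
    legacy = os.path.join(directory, name + ".pyc")
    if os.path.isfile(legacy):  # 2nd stat, magic number still unchecked
        return ("bytecode-only", legacy)
    return None
```

In this ordering a source-plus-cache import costs two probes, while a bytecode-only import pays for the failed .py probe first, which is the extra overhead Greg points out.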
On Mon, 2010-03-01 at 12:35 +1300, Greg Ewing wrote:
Yes, although that would then incur higher stat overheads for people distributing .pyc files. There doesn't seem to be a way of pleasing everyone.
This is all assuming that the extra stat calls are actually a problem. Does anyone have any evidence that they would really take significant time compared to loading the module? Once you've looked for one file in a given directory, looking for another one in the same directory ought to be quite fast, since all the relevant directory blocks will be in the filesystem cache.
We've done a bunch of testing in bzrlib. Basic things are:
- statting /is/ expensive *if* you don't use the result.
- loading code is the main cost *once* you have a hot disk cache
Specifically, stats for files that are *not present* incur page-in costs for the dentries needed to determine the file is absent. In the special case of probing for $name.$ext1, ...$ext2, ...$ext3, you generally hit the same pages and don't incur additional page in costs. (you'll hit the same page in most file systems when you look for the second and third entries).
In most file systems stats for files that *are present* also incur a page-in for the inode of the file. If you then do not read the file, this is I/O that doesn't really gain anything.
Being able to disable .py file usage completely - so that only foo.pyc and foo/__init__.pyc are probed for, could have a very noticeable change in the cold cache startup time.
# Startup time for bzr (cold cache):
$ drop-caches
$ time bzr --no-plugins revno
5061
real 0m8.875s
user 0m0.210s
sys 0m0.140s
# Hot cache
$ time bzr --no-plugins revno
5061
real 0m0.307s
user 0m0.250s
sys 0m0.040s
(revno is a small command that reads a small amount of data - just enough to trigger demand loading of the core repository layers and so on).
strace timings for those two operations:
cold cache:
$ strace -c bzr --no-plugins revno
5061
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 56.34    0.040000          76       527           read
 28.98    0.020573           9      2273      1905 open
 14.43    0.010248          14       734       625 stat
  0.15    0.000107           0       533           fstat
...
hot cache:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.10    0.000368          92         4           getdents
 19.49    0.000159           0       527           read
 16.91    0.000138           1       163           munmap
 10.05    0.000082           2        54           mprotect
  8.46    0.000069           0      2273      1905 open
  0.00    0.000000           0         8           write
  0.00    0.000000           0       367           close
  0.00    0.000000           0       734       625 stat
...
Cheers, Rob
Robert Collins wrote:
In the special case of probing for $name.$ext1, ...$ext2, ...$ext3, you generally hit the same pages and don't incur additional page in costs.
So then looking for a .pyc alongside a .py or vice versa should be almost free, and we shouldn't be worrying about it.
hot cache:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.10    0.000368          92         4           getdents
  0.00    0.000000           0       734       625 stat
Further supporting the idea that stat calls are negligible once the cache is warmed up. -- Greg
On Sun, Feb 28, 2010 at 16:31, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Robert Collins wrote:
In the special case of probing for $name.$ext1, ...$ext2, ...$ext3, you generally hit the same pages and don't incur additional page in costs.
So then looking for a .pyc alongside a .py or vice versa should be almost free, and we shouldn't be worrying about it.
But that is making the assumption that all filesystems operate this way (e.g., does NFS have the same performance characteristics?).
hot cache:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.10    0.000368          92         4           getdents
  0.00    0.000000           0       734       625 stat
Further supporting the idea that stat calls are negligible once the cache is warmed up.
But that's the point: once it's warmed up. This is not the case when executing a script once every once in a while compared to something like bzr where you are most likely going to execute the command multiple times within a small timeframe. -Brett
-- Greg
On Sun, 2010-02-28 at 18:11 -0800, Brett Cannon wrote: ...
So then looking for a .pyc alongside a .py or vice versa should be almost free, and we shouldn't be worrying about it.
But that is making the assumption that all filesystems operate this way (e.g., does NFS have the same performance characteristics?).
NFS doesn't cache pages, rather it caches individual entries. I do not know if adjacent data is pre-populated into an NFS client cache. I rather suspect not, but the general point is true - many filesystems cache by pages, not all. And some have unsorted lists or might have hash tables rather than b-trees or sorted lists, where locality of reference doesn't help at all. (VFAT is unsorted list, IIRC).
hot cache:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.10    0.000368          92         4           getdents
  0.00    0.000000           0       734       625 stat
Further supporting the idea that stat calls are negligible once the cache is warmed up.
But that's the point: once it's warmed up. This is not the case when executing a script once every once in a while compared to something like bzr where you are most likely going to execute the command multiple times within a small timeframe.
bzr would /love/ cold cache times to come down. They are one of the most glaring performance differences between a large C program, and a large python program. Even though the second run is fast, it hurts the first time you run 'bzr st'. (**) -Rob (**) To the extent that I've seriously considered an import hook to disable normal imports under bzrlib, and special case how the search works so that we only load pyc's and assume all imports are absolute.
On Feb 28, 2010, at 02:51 PM, Greg Ewing wrote:
A solution might be to look for the presence of the cache directory, and only look for a .pyc in the source directory if there is no cache directory. Testing for the cache directory would only have to be done once per package and the result remembered, so it would add very little overhead.
I think the other thing that bothers me about continuing to support pyc-only imports is that people will then want tools to create them. Right now, it's probably just as easy as byte-compiling everything, then finding the .py files and removing them.
After PEP 3147 is implemented, and the default, you'll have to byte-compile the files, then find the pycs in the __pycache__ directory, move them up a level and rename them. Then of course remove the .py files.
It's not insurmountable, of course, but I think if we support pyc-only imports, people are rightly going to want us to write and support the tool to create those imports. -Barry
On Tue, 2 Mar 2010 11:41:52 am Barry Warsaw wrote:
After PEP 3147 is implemented, and the default, you'll have to byte-compile the files, then find the pycs in the __pycache__ directory, move them up a level and rename them. Then of course remove the .py files.
It's not insurmountable, of course, but I think if we support pyc-only imports, people are rightly going to want us to write and support the tool to create those imports.
Surely that's a job for a tiny Python script, or even a shell script? It doesn't sound hard, not from the description given. I imagine there will be recipes on ActiveState quite quickly, and if there isn't, that would be good evidence that demand for the feature is low. -- Steven D'Aprano
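As a rough idea of how small such a script could be, here is a sketch of the steps Barry lists: find the pycs under __pycache__, put a copy one level up under the plain NAME.pyc name, and optionally remove the .py files. It assumes the cached files are named NAME.TAG.pyc, with TAG taken from `sys.implementation.cache_tag` of the running interpreter; the function name `promote_pycs` and its parameters are hypothetical, and the sources are assumed to have been byte-compiled already.

```python
import os
import shutil
import sys


def promote_pycs(root, tag=sys.implementation.cache_tag, remove_source=False):
    """Copy __pycache__/NAME.TAG.pyc to NAME.pyc one level up.

    Returns the list of .pyc paths created. Files carrying another tag
    (other versions, optimized variants) are left alone.
    """
    created = []
    suffix = "." + tag + ".pyc"
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) != "__pycache__":
            continue
        parent = os.path.dirname(dirpath)
        for filename in sorted(filenames):
            if not filename.endswith(suffix):
                continue
            module_name = filename[:-len(suffix)]
            target = os.path.join(parent, module_name + ".pyc")
            shutil.copy2(os.path.join(dirpath, filename), target)
            created.append(target)
            source = os.path.join(parent, module_name + ".py")
            if remove_source and os.path.exists(source):
                os.remove(source)
    return created
```

Calling `promote_pycs("build/lib", remove_source=True)` on a byte-compiled tree (the path is just an example) would leave a pyc-only layout behind; copying rather than moving keeps the cache directory intact in case the sources are retained.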
On Sat, 27 Feb 2010 09:09:26 am Brett Cannon wrote:
I think it's almost a disservice to support bytecode-only files as it leads people who are misinformed or simply don't take the time to understand what is contained in a .pyc file into a false sense of security about their code not being easy to examine by someone else.
You say that as if it were a bad thing.
*wink*
Personally, I can't imagine ever wanting to ship a .pyc module without the .py, but since Python already gives people the opportunity to shoot themselves in the foot, meh, we're all adults here. I do recall a poster on comp.lang.python pulling his hair out over a customer who was too big to fire, but who had the obnoxious habit of making random so-called "fixes" to the poster's .py files, so perhaps byte-code only distribution isn't all bad.
But I don't care much either way.
-- Steven D'Aprano
Steven D'Aprano wrote:
On Sat, 27 Feb 2010 09:09:26 am Brett Cannon wrote:
I think it's almost a disservice to support bytecode-only files as it leads people who are misinformed or simply don't take the time to understand what is contained in a .pyc file into a false sense of security about their code not being easy to examine by someone else.
You say that as if it were a bad thing.
*wink*
Personally, I can't imagine ever wanting to ship a .pyc module without the .py, but since Python already gives people the opportunity to shoot themselves in the foot, meh, we're all adults here. I do recall a poster on comp.lang.python pulling his hair out over a customer who was too big to fire, but who had the obnoxious habit of making random so-called "fixes" to the poster's .py files, so perhaps byte-code only distribution isn't all bad.
I think the use case of "keep the user from fiddling casually with our application" is a valid one. There's a fairly vast difference between "open source file, edit code, hit save" and "decompile pyc, open decompiled source file, edit code, save next to pyc with correct name". The former makes it easy for folks that know just enough to be dangerous to get themselves in trouble. The latter raises the bar far enough that people with the ability to do it should also know better than to try (or at least, not to call the support line when it doesn't work). I do like the idea of pulling .pyc only imports out into a separate importer, but would go so far as to suggest keeping them as a command line option rather than as a separately distributed module. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Feb 28, 2010, at 01:38 AM, Nick Coghlan wrote:
I think the use case of "keep the user from fiddling casually with our application" is a valid one.
Doesn't the existing support for zipimport satisfy that use case already, and probably better so? Heck you can even name your zip file "application.dat" to really throw naive users off the scent. ;) -Barry
On Feb 27, 2010, at 9:38 AM, Nick Coghlan wrote:
I do like the idea of pulling .pyc only imports out into a separate importer, but would go so far as to suggest keeping them as a command line option rather than as a separately distributed module.
One advantage of doing this as a separately distributed module is that it can have its own ecosystem and momentum. Most projects that want this sort of bundling or packaging really want to be shipped with something like py2exe, and I think the folks who want such facilities would be better served by a nice project website for "python sealer" or "python bundler" rather than obscure directions for triggering the behavior via options or configuration.
Making bytecode loading a feature of interpreter startup, whether it's a config file, a command-line option or an environment variable, is not a great idea. For folks that want to ship a self-contained application, any of these would require an additional customization step, where they need to somehow tell their bundled interpreter to load bytecode. For people trying to ship a self-contained and tamper-unfriendly (since even "tamper-resistant" would be overstating things) library to relatively non-technical programmers, it opens the door to a whole universe of confusion and FAQs about why the code didn't load.
However bytecode-only code loading is facilitated, it should be possible to bootstrap from a vanilla python interpreter running normally, as you may not know you need to load a bytecode-only package at startup. In the stand-alone case there are already plenty of options, and in the library case, shipping a zip file should be fine, since the __init__.py of your package should be plain-text and also able to trigger the activation of the bytecode-only importer.
There are already so many ways to ship bytecode that it doesn't seem too important to support it in this one particular configuration (files in a directory, compiled by just importing them, in the same place as ".py" files).
The real problem is providing a seamless transition path for *build* processes, not the Python code itself.
Do any of the folks who are currently using this feature have a good idea as to how your build and distribute scripts might easily be updated, perhaps by a 2to3 fixer?
Steven D'Aprano <steve@pearwood.info> writes:
Personally, I can't imagine ever wanting to ship a .pyc module without the .py, but since Python already gives people the opportunity to shoot themselves in the foot, meh, we're all adults here.
Not sure I've seen it mentioned in this thread, but for myself, I've certainly used (indirectly) such a distribution many times when packaging applications with py2exe for installation on Windows clients. That puts all the pyc files into a single support zip file from which the application runs. That seems a perfectly useful use case, and not due to any issues with security/obfuscation. The matching interpreter is being packaged with the application, so there's no version worries with the pyc. The files are internal to a zip, so why complicate things with recompiling and writing locally on the user's machine, particularly when on newer versions of Windows the installation directory might not be writable anyway. As long as executing from pyc files continues to work, presumably py2exe can be updated to collect those files from any new cache location during the build process. But I do think it's useful to continue to support executing them directly outside of any new cache location, which it sounds like is the direction being taken. -- David
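The single support zip of pyc files that David describes (and the zipimport route Barry mentions earlier) can be approximated with the standard library alone. The sketch below is not how py2exe does it; the function name `build_pyc_zip` is hypothetical, it only handles top-level modules in one directory, and it relies on the bundled interpreter being the same version that compiled the files.

```python
import os
import py_compile
import zipfile


def build_pyc_zip(source_dir, zip_path):
    """Compile each top-level .py in source_dir and store only NAME.pyc in zip_path.

    Returns the list of module names placed in the archive.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w") as archive:
        for filename in sorted(os.listdir(source_dir)):
            if not filename.endswith(".py"):
                continue
            source = os.path.join(source_dir, filename)
            # Write NAME.pyc next to the source, add it to the zip, then clean up.
            compiled = py_compile.compile(source, cfile=source + "c", doraise=True)
            archive.write(compiled, arcname=filename + "c")
            os.remove(compiled)
            names.append(filename[:-3])
    return names
```

Putting the resulting archive on `sys.path` lets the modules be imported straight from the zip, with nothing recompiled or written on the user's machine.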
participants (18)
- Antoine Pitrou
- Baptiste Carvello
- Barry Warsaw
- Brett Cannon
- David Bolen
- Doug Hellmann
- Floris Bruynooghe
- Glenn Linderman
- Glyph Lefkowitz
- Greg Ewing
- Guido van Rossum
- Ian Bicking
- Michael Foord
- Nick Coghlan
- Robert Collins
- Ron Adam
- Steven D'Aprano
- Tres Seaver