disabling .pyc and .pyo files
Hello there.
We have a large project involving multiple Perforce branches of hundreds of .py files each. Although we employ our own import mechanism for the bulk of these files, we do use the regular import mechanism for an essential core of them.
Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files. This can happen for a variety of reasons, but most often it occurs when .py files are being removed, or moved in the hierarchy. The problem is that the application will happily load and import an orphaned .pyo file, even though the .py file has gone or moved.
I looked at the import code and I found that it is trivial to block the reading and writing of .pyo files. I am about to implement that patch for our purposes, thus forcing recompilation of the .py files on each run if so specified. This will ensure that the application will execute only the code represented by the checked-out .py files.
But it occurred to me that this functionality might be of interest to other people than just us. I can imagine, for example, that buildbots running the Python regression test suite might be running into problems with stray .pyo files from time to time.
Do you think that such a command line option would be useful for Python at large?
Cheers,
Kristján
Agreed. I wonder if this functionality ought to be opt-in instead of opt-out? The only use cases I am aware of are software vendors who don't want to distribute their source (a near-extinct breed for sure...) or people with absurdly small disks (ditto).
2009/12/8 Jesse Noller <jnoller@gmail.com>:
-- --Guido van Rossum (python.org/~guido)
On Tue, Dec 8, 2009 at 11:34, Raymond Hettinger <python@rcn.com> wrote:
Another way that a sys.dont_read_bytecode flag would be helpful is for VMs that don't use Python bytecode (e.g. Jython). They could set this flag to True by default which allows code to introspect on the VM to see if it is using bytecode or not. Plus it would let importlib easily skip bytecode usage on VMs that don't support it instead of trying to come up with some heuristic to pick up on that fact (I have not figured that one out yet, but Jython folk were thinking about having marshal.loads() always throw an exception). -Brett
On Tue, Dec 8, 2009 at 11:51 AM, Brett Cannon <brett@python.org> wrote:
It would also be useful when benchmarking multiple iterations of the same VM. I've considered implementing something like this for Unladen Swallow so that we could more effectively isolate the running binary from global state (with a sys.dont_read_bytecode command-line flag doing for bytecode files what -E does for environment variables). +1 for this in mainline. Collin Winter
Kristján Valur Jónsson wrote:
Yes, this is already implemented (as of Python 2.6), see -B option: http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options
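For readers comparing this answer with the original request, here is a minimal sketch of what -B controls. It assumes a CPython interpreter reachable through sys.executable; the helper name child_dont_write_bytecode is made up for illustration. Note that -B (and the matching sys.dont_write_bytecode attribute) only stops the interpreter from writing bytecode files. It does not stop existing .pyc/.pyo files from being read, which is the behaviour the rest of the thread is about.

```python
import subprocess
import sys


def child_dont_write_bytecode(*interpreter_flags):
    """Start a child interpreter with the given flags and report the value
    of sys.dont_write_bytecode that it sees (hypothetical helper)."""
    result = subprocess.run(
        [sys.executable, *interpreter_flags, "-c",
         "import sys; print(sys.dont_write_bytecode)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    # With -B the child must not write .pyc files on import.
    print("with -B:", child_dont_write_bytecode("-B"))
```

Without -B the value depends on the environment (for example PYTHONDONTWRITEBYTECODE), so the sketch makes no claim about that case.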
Guido van Rossum wrote:
This would be quite nice for us. In our case we have been bitten several times during refactoring. You move one file, but your test suite still passes because the .pyc is still around.
I think having it be opt-in would be nice. I do think that the standard py2exe code generates a library.zip that only has .pyc or .pyo files (and no .py files). It isn't that we would care if they were present, but I suppose it makes the final .zip file smaller and faster to load?
Whatever flag is available, though, I'm sure py2exe could be taught to pass it.
John =:->
John Arbash Meinel wrote:
Whatever flag is available, though, I'm sure py2exe could be taught to pass it.
I'm a bit worried about the idea of adding a flag that is required to turn on functionality that was previously available without any flag. It could make things awkward for launcher scripts that are agnostic about the exact version of Python being used. -- Greg
Kristján Valur Jónsson <kristjan@ccpgames.com> writes:
Yes, I think Python users would benefit from having the above behaviour be opt-in. I suggest:
* A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the interpreter follows the current behaviour. If ‘False’, any bytecode file satisfies an import only if it has a corresponding source file (where “corresponding” means “this source file would, if compiled, result in a bytecode file replacing this one”). I suggest this attribute should be implemented as ‘True’ by default (to match current behaviour), then switched to ‘False’ by default as soon as feasible.
* The ‘PYTHONIMPORTORPHANEDBYTECODE’ environment variable, when set, causes the interpreter to set the above option ‘True’.
* The ‘-b’ option to the interpreter command-line sets the above option ‘True’.
--
“I have yet to see any problem, however complicated, which, when you looked at it in the right way, did not become still more complicated.” —Paul Anderson
Ben Finney
Ben Finney wrote:
Agreed. This has bitten me, too. Often when it's a permissions problem where another user has created the .pyc file and I can't overwrite it (this on Windows).
I agree with this in principle, but I don't see how you're going to implement it. In order to actually check this condition, aren't you going to have to compile the source code anyway? If so, just skip the bytecode file. Although I guess you could store a hash of the source in the compiled file, or other similar optimizations.
Sounds good to me. Eric.
Eric Smith <eric@trueblade.com> writes:
Thanks.
You seem to be seeing something I was careful not to write. The check is:
    this source file would, if compiled, result in a bytecode file replacing this one
Nowhere in there is there anything about the resulting bytecode files being equivalent. I'm limiting the check only to whether the resulting bytecode file would *replace* the existing bytecode file. This doesn't require knowing anything at all about the contents of the current bytecode file; indeed, my intention was to phrase it so that it's checked before bothering to open the existing bytecode file.
Is there a better term for this? I'm not well-versed enough in the Python import internals to know.
--
“Philosophy is questions that may never be answered. Religion is answers that may never be questioned.” —anonymous
Ben Finney
On Tue, Dec 8, 2009 at 6:28 PM, Ben Finney <ben+python@benfinney.id.au> wrote:
If there was a corresponding source file, it would have been found first -- and the bytecode file would be used *if* it matches the source file (by comparing a timestamp in the bytecode file's header to the actual mtime of the source file). So I'm not sure what there is to do apart from *not* using "lone" bytecode files. (The latter was actually added as a feature at some point so I betcha it's easy to make it conditional on a flag.) -- --Guido van Rossum (python.org/~guido)
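The timestamp comparison described above can be sketched as follows. This is an illustration only, not the interpreter's own code: it assumes the .pyc header layout used by current CPython (3.7 and later: 4-byte magic number, 4-byte flags, 4-byte source mtime, 4-byte source size, all little-endian), which is longer than the header in use when this thread was written. The function name bytecode_matches_source is made up for the example.

```python
import importlib.util
import os
import struct


def bytecode_matches_source(source_path, bytecode_path):
    """Return True if the mtime and size recorded in a timestamp-based
    .pyc header match the source file on disk (hypothetical helper)."""
    with open(bytecode_path, "rb") as f:
        header = f.read(16)
    # A short header or a foreign magic number means the file is unusable.
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False
    flags, recorded_mtime, recorded_size = struct.unpack("<III", header[4:16])
    if flags != 0:
        # Non-zero flags mark a hash-based .pyc, which this sketch
        # does not try to validate.
        return False
    st = os.stat(source_path)
    # Both fields are stored truncated to 32 bits.
    return (recorded_mtime == (int(st.st_mtime) & 0xFFFFFFFF)
            and recorded_size == (st.st_size & 0xFFFFFFFF))
```

The check needs the source file to exist at all, which is why it says nothing about "lone" bytecode files: those are found through a separate lookup.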
Guido van Rossum <guido@python.org> writes:
Right, that's what I thought. I was only looking for a way to say “only use a bytecode file if the corresponding source code file exists”, and then trying to define “corresponding source code file”. It appears that all I'm doing is confusing the issue, probably because my understanding of the terminology is fuzzy. I hope someone else can word it better, so the question of “which file, exactly, are we saying must exist?” is well answered.
I hope your instinct is right, and I betcha it is too.
--
“Intellectual property is to the 21st century what the slave trade was to the 16th.” —David Mertz
Ben Finney
Ben Finney wrote:
As Guido said, the check goes the other way: the interpreter looks for source files first, and if it doesn't find one, only then does it look for orphaned bytecode files (pyo/pyc). The check for a corresponding bytecode file after a source file has actually been found follows a different path through the import code.
Since the two features are somewhat orthogonal, slicing out the check for orphaned bytecode files while keeping the check for a cached bytecode file should be fairly straightforward.
Fair warning to anyone that implements this - expect to be updating quite a few parts of the test suite. The runpy, command line, import and zipimport tests would all need to be updated to make sure they were respecting the flag (and probably the importlib tests as well, at least in Py3k).
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
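The lookup order described in this message can be modelled in a few lines. This is a simplified sketch for a single directory, not CPython's import code; the function find_module_file and its allow_orphaned_bytecode parameter are hypothetical stand-ins for whatever flag the thread might settle on, and the file layout (bytecode next to the source) is the one in use at the time of the discussion.

```python
import os


def find_module_file(directory, name, allow_orphaned_bytecode=True):
    """Model of the lookup order: source first, lone bytecode second.

    Hypothetical helper. Returns the path that would satisfy the import,
    or None if nothing suitable exists in `directory`.
    """
    source = os.path.join(directory, name + ".py")
    if os.path.isfile(source):
        # Whether a cached .pyc/.pyo next to the source gets reused is a
        # separate decision, made by the timestamp check.
        return source
    if allow_orphaned_bytecode:
        # Only reached when no source exists: these are the "orphans".
        for suffix in (".pyc", ".pyo"):
            candidate = os.path.join(directory, name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None
```

Turning the flag off only removes the second branch, which is why the change is expected to be small in the import code itself and larger in the tests.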
On Wed, Dec 9, 2009 at 02:22, Nick Coghlan <ncoghlan@gmail.com> wrote:
Just a data point: I reversed that order in importlib to match mental semantics.
Yep for importlib, but I already protect bytecode-writing tests with a decorator for sys.dont_write_bytecode, so tests that rely on reading bytecode could easily be decorated as well.
-Brett
Guido van Rossum wrote:
Hmm, not as orthogonal as I thought then :P
I guess it is a credit to the PEP 302 API that I've never needed to care that zipimport might have the check the other way around :)
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 8 Dec 2009, at 13:44, Ben Finney wrote:
One problem with a sys flag is that it's a global setting. Suppose a package is distributed with only pyc/pyo files, then the top-level __init__.py might flip the switch such that its sub-files can get imported from the pyc/pyo files. But you wouldn't want that flag to persist beyond that.
Another idea is to use a new file extension, which isn't the best solution, but allows the creator to explicitly set what behavior they intended for their files:
* if a foo.py file exists, then use the existing foo.pyc/pyo as is done today
* if a foo.py file does not exist, but a foo.pyxxx exists, use it (but foo.pyc/pyo is never used, unlike today)
(pyxxx is a placeholder for whatever would be a reasonable name)
Jared
On Wed, Dec 9, 2009 at 11:07 AM, Jared Grubb <jared.grubb@gmail.com> wrote:
I'm not sure that there are any use cases that require using conflicting values of this setting for different packages.
It's a much bigger change, but using a different extension would probably remove the need for a flag. It would also help with some tools that hide .pyc/.pyo files from view (e.g. the typical .svnignore). -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
Well, during development of your own codebase, where you would like to not import stale .pyc files, but it depends on a 3rd-party library where they only ship you .pyc files. Now if the flag was somehow "for all modules under this namespace" that would easily handle it. Or just living with "if you want to use private 3rd-party libs, then you don't get this support for your own development". (I don't currently do this, but it certainly is *a* use case.) John =:->
John Arbash Meinel <john.arbash.meinel@gmail.com> writes:
Or just living with "if you want to use private 3rd-party libs, then you don't get this support for your own development".
FWIW, that's the option I would advocate. The default is to develop and distribute with source; choosing to omit source (or choosing to use such software) is choosing an inferior option for many other reasons as well, so I don't see it as a use case that needs explicit support.
--
“A learning experience is one of those things that say, “You know that thing you just did? Don't do that.”” —Douglas Adams, 2000-04-05
Ben Finney
On Wed, Dec 9, 2009 at 11:27, Guido van Rossum <guido@python.org> wrote:
Same here. This is straying into optimizations for the sake of optimizing.
I know some people seem to think pyc/pyo files are a good way to obfuscate code, but it honestly isn't, IMO. But these people stand the most to lose from us even considering changing default behavior.
In a perfect world I would make pyc/pyo files completely optional and only an optimization that could not work w/o the corresponding source. But in a backwards-compatible, paranoid world I would make it an opt-in flag to ignore lone pyc/pyo files. I am +10 on the former and +1 on the latter.
-Brett
Brett Cannon wrote:
People that think it is a good obfuscation trick often don't realise just how powerful Python's introspection features make the disassembly process. When decompiled software includes the original variable names it is a lot easier to follow than the cryptic mass of symbols that is decompiled machine code.
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Brett Cannon wrote:
In a perfect world I would make pyc/pyo files completely optional and only an optimization that could not work w/o the corresponding source.
That wouldn't be a perfect world in every universe. For example, consider an app installed in an embedded device with limited memory -- the source is never going to be seen by anyone, and all it would do is waste resources. -- Greg
Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
If we're positing a perfect world, then all embedded devices would have the source code available and inspectable by any interested user.
--
“We can't depend for the long run on distinguishing one bitstream from another in order to figure out which rules apply.” —Eben Moglen, _Anarchism Triumphant_, 1999
Ben Finney
Jesse Noller <jnoller@gmail.com> writes:
Er, this discussion isn't related to top posting; and it's hardly off-topic to discuss importing bytecode files here.
--
“I have had a perfectly wonderful evening, but this wasn't it.” —Groucho Marx
Ben Finney
Ben Finney wrote:
If we're positing a perfect world, then all embedded devices would have the source code available and inspectable by any interested user.
The source wouldn't have to be on the actual device to make that possible, though. -- Greg
2009/12/8 Kristján Valur Jónsson <kristjan@ccpgames.com>
[SNIP]
I looked at the import code and I found that it is trivial to block the
Are you suggesting that the flag turn off reading *period*, or only if no source is available? I think you mean the former while Guido suggested the latter. -Brett
You are right, I was suggesting the former. From the cursory glance I had at the code it seemed simpler to not look for a .pyo file at all, rather than to add a special rule regarding its relation to a .py file. That would also help rule out any timestamp problems.
But I'm happy with whatever way we agree on to solve the "orphaned bytecode" problem and glad to see that I'm not the only one experiencing it.
Kristján
2009/12/9 Brett Cannon <brett@python.org>:
I prefer the former as well (don't read any bytecode no matter if source is available or not); clear and simple semantics that are easy to implement.
If that's the rule, what is the point in writing bytecode at all? It'll never be read... Paul.
Guido van Rossum <guido@python.org> writes:
Almost, but I think many in this discussion are agitating for “don't read orphaned bytecode” to become the default.
--
“Visitors are expected to complain at the office between the hours of 9 and 11 a.m. daily.” —hotel, Athens
Ben Finney
Ben Finney <ben+python@...> writes:
Either to become the default (which might require updates to things like py2exe), or to have a dedicated flag.
On the other hand, a flag not to read bytecode /at all/ doesn't seem to have a use case. If you don't want to read any bytecode, don't produce/install it in the first place. Bytecode is useful, it reduces startup times. It's only annoying when the original .py file has been deleted and the obsolete .pyc/.pyo is dangling on disk.
cheers
Antoine.
On Wed, Dec 9, 2009 at 7:48 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I gave such a use-case earlier in this thread: """ It would also be useful when benchmarking multiple iterations of the same VM. I've considered implementing something like this for Unladen Swallow so that we could more effectively isolate the running binary from global state (with a sys.dont_read_bytecode command-line flag doing for bytecode files what -E does for environment variables). """ We currently handle this by deleting all .pyc/.pyo files in our library tree, but that gets more expensive the more third-party libraries we bring in for testing, and it's not foolproof. Collin Winter
On Wed, Dec 9, 2009 at 8:50 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
When changing the bytecode sequence produced by the CPython compiler, it would be useful to make sure that a module is being compiled from scratch (and hence using the new version of the compiler) instead of reusing older bytecode from a .pyc file. You might say that we should simply increase the magic number with each iteration, but I've never found that having to change more code boosts my productivity (especially in cases where changing the magic number is not necessary for compatibility purposes). I understand this may be a fringe use-case, but given the number of optimization projects based on CPython (of which ours is but one), it may still be worth considering. Collin
On Wed, Dec 9, 2009 at 9:04 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
As I said, "We currently handle this by deleting all .pyc/.pyo files in our library tree, but that gets more expensive the more third-party libraries we bring in for testing, and it's not foolproof." I tire of quoting myself. Collin
I don't know about the rest of you, but I think it's PEP time as the conversation seems to have run its course. Looks like the popular options are a flag to not read any bytecode or to only read bytecode if the source is also available. And then whether the default behavior should change or not.
2009/12/8 Kristján Valur Jónsson <kristjan@ccpgames.com>
Brett Cannon wrote:
A few additional thoughts...
Could the existing -B flag be extended to not read bytecode? It might be considered a bug if bytecode is read when the -B option is used to prevent writing of bytecode. Is there a use case for forcing the use of old bytecode? What was the original intent of the -B flag?
Would adding a flag to force the writing of bytecode do what is needed? It would generate a noisy fail if a source file is moved or missing and renew old bytecode files.
These two together would give read_none and write_all bytecode modes, with the default mode being the write-as-needed mode.
It may be good to have a utility script in the Python tools directory to find and/or remove orphaned bytecode. I'm not sure that just deleting all .py(co) files is always a good idea.
A more off the wall random thought ... It might be nice in the future to have all bytecode in a single directory or package combined into a single byte_cache.py(co) file. I think writing all and reading none of the bytecode files makes good sense in this context.
Ron
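A utility of the kind suggested above could look roughly like this. It is a sketch, not a tool that ships with Python: the function name find_orphaned_bytecode is made up, and it only reports candidates and leaves deletion to the caller. It handles both the layout discussed in this thread (foo.pyc next to foo.py) and the __pycache__ layout of current Python 3, using importlib.util.source_from_cache for the latter.

```python
import importlib.util
import os


def find_orphaned_bytecode(root):
    """Return a sorted list of .pyc/.pyo files under `root` whose source
    file no longer exists (hypothetical helper; nothing is deleted)."""
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith((".pyc", ".pyo")):
                continue
            bytecode_path = os.path.join(dirpath, filename)
            if os.path.basename(dirpath) == "__pycache__":
                try:
                    # Maps __pycache__/foo.<tag>.pyc back to foo.py.
                    source_path = importlib.util.source_from_cache(bytecode_path)
                except (ValueError, NotImplementedError):
                    # Not a recognisable cache file name; leave it alone.
                    continue
            else:
                # Legacy layout: foo.pyc / foo.pyo sits beside foo.py.
                source_path = bytecode_path[:-1]
            if not os.path.exists(source_path):
                orphans.append(bytecode_path)
    return sorted(orphans)


if __name__ == "__main__":
    for path in find_orphaned_bytecode("."):
        print(path)
```

Reporting instead of deleting keeps the script safe for trees that deliberately ship bytecode without source, such as the py2exe case mentioned earlier in the thread.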
For what it's worth, I've got an entirely different use case than the ones I've seen in this thread so far. I'd like Python to read .pyo files, but not search for .py or .pyc files. This is because we ship a py2exe app in its "exploded" form, where there is an .exe and a lib/ folder full of .pyos. Purely as an optimization, it'd be nice to not have Python stat for .py and then .pyc for every new import.
I remember glancing at Python/import.c and thinking that this could easily be accomplished by allowing the user to customize _PyImport_StandardFiletab at runtime--in fact there is already a PyImport_AppendInittab; it's just not exposed to Python.
With a function like imp.set_inittab, I could get what I want with something like
    imp.set_inittab(['.pyo', 'rb', imp.PY_COMPILED])
And then of course to read just .py files, you could do
    imp.set_inittab(['.py', 'U', imp.PY_SOURCE])
- Kevin
Kristján Valur Jónsson wrote:
On Tue, Dec 15, 2009 at 14:37, Kevin Watters <kevin.watters@gmail.com> wrote:
The problem with this is I could easily see it leading to tons of people using custom file extensions which seems to just be asking for trouble. Restricting that ability to only people who recompile the interpreter has kept that in check. As for avoiding the extra stat calls, your best bet is to either compile your own version of CPython or use a custom importer (I will be giving a talk on that at PyCon). -Brett
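As an illustration of the custom importer route, here is a minimal sketch using importlib.machinery from current Python 3. That API postdates this thread, so this is not what was available to the participants. The idea is to build a file finder whose loader list leaves out the sourceless-bytecode loader, so a lone .pyc is never picked up while cached bytecode for existing sources still works. The two function names are made up for the example.

```python
import importlib.machinery as machinery


def source_only_finder(directory):
    """Return a FileFinder for `directory` that ignores lone bytecode
    files (hypothetical helper)."""
    loader_details = [
        (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
        (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
        # SourcelessFileLoader is deliberately left out, so a .pyc with
        # no matching .py can never satisfy an import.
    ]
    return machinery.FileFinder(directory, *loader_details)


def default_style_finder(directory):
    """Return a FileFinder that also accepts lone bytecode files, for
    comparison with source_only_finder (hypothetical helper)."""
    loader_details = [
        (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
        (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
        (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
    ]
    return machinery.FileFinder(directory, *loader_details)
```

For a directory containing only orphan.pyc, source_only_finder(directory).find_spec("orphan") returns None while the default-style finder returns a spec. Installing such a finder process-wide would presumably go through sys.path_hooks and FileFinder.path_hook, which this sketch leaves out.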
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Agreed. I wonder if this functionality ought to be opt-in instead of opt-out? The only use cases I am aware of are software vendors who don't want to distribute their source (a near-extinct breed for sure...) or people with absurdly small disks (ditto). 2009/12/8 Jesse Noller <jnoller@gmail.com>:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/e87f3/e87f3c7c6d92519a9dac18ec14406dd41e3da93d" alt=""
On Tue, Dec 8, 2009 at 11:34, Raymond Hettinger <python@rcn.com> wrote:
Another way that a sys.dont_read_bytecode flag would be helpful is for VMs that don't use Python bytecode (e.g. Jython). They could set this flag to True by default which allows code to introspect on the VM to see if it is using bytecode or not. Plus it would let importlib easily skip bytecode usage on VMs that don't support it instead of trying to come up with some heuristic to pick up on that fact (I have not figured that one out yet, but Jython folk were thinking about having marshal.loads() always throw an exception). -Brett
data:image/s3,"s3://crabby-images/b3054/b3054acc16151b5d3e6c737fd426ff8c1e6bef92" alt=""
On Tue, Dec 8, 2009 at 11:51 AM, Brett Cannon <brett@python.org> wrote:
It would also be useful when benchmarking multiple iterations of the same VM. I've considered implementing something like this for Unladen Swallow so that we could more effectively isolate the running binary from global state (with a sys.dont_read_bytecode command-line flag doing for bytecode files what -E does for environment variables). +1 for this in mainline. Collin Winter
data:image/s3,"s3://crabby-images/531fa/531faa1b3ec2e8f6729044548e34b79f60355d01" alt=""
Kristján Valur Jónsson wrote:
Yes, this is already implemented (as of Python 2.6), see -B option: http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options
data:image/s3,"s3://crabby-images/020c5/020c52d168c677dce7de1e2579e27fe444fde171" alt=""
Guido van Rossum wrote:
This would be quite nice for us. In our case we have been bit several times during refactoring. You move one file, but your test suite still passes because .pyc is still around. I think having it be opt-in would be nice. I do think that the standard py2exe code generates a library.zip that only has .pyc or .pyo files (and no .py files). It isn't that we would care if they were present, but I suppose it makes the final .zip file smaller and faster to load? Whatever flag is available, though, I'm sure py2exe could be taught to pass it. John =:->
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
John Arbash Meinel wrote:
Whatever flag is available, though, I'm sure py2exe could be taught to pass it.
I'm a bit worried about the idea of adding a flag that is required to turn on functionality that was previously available without any flag. It could make things awkward for launcher scripts that are agnostic about the exact version of Python being used. -- Greg
data:image/s3,"s3://crabby-images/f576b/f576b43f4d61067f7f8aeb439fbe2fadf3a357c6" alt=""
Kristján Valur Jónsson <kristjan@ccpgames.com> writes:
Yes, I think Python users would benefit from having the above behaviour be opt-in. I suggest: * A new attribute ‘sys.import_orphaned_bytecode’. If set ‘True’, the interpreter follows the current behaviour. If ‘False’, any bytecode file satisfies an import only if it has a corresponding source file (where “corresponding” means “this source file would, if compiled, result in a bytecode file replacing this one”). I suggest this attribute should be implemented as ‘True’ by default (to match current behaviour), then switched to ‘False’ by default as soon as feasible. * The ‘PYTHONIMPORTORPHANEDBYTECODE’ environment variable, when set, causes the interpreter to set the above option ‘True’. * The ‘-b’ option to the interpreter command-line sets the above option ‘True’. -- \ “I have yet to see any problem, however complicated, which, | `\ when you looked at it in the right way, did not become still | _o__) more complicated.” —Paul Anderson | Ben Finney
data:image/s3,"s3://crabby-images/ab219/ab219a9dcbff4c1338dfcbae47d5f10dda22e85d" alt=""
Ben Finney wrote:
Agreed. This has bitten me, too. Often when it's a permissions problem where another user has created the .pyc file and I can't overwrite it (this on Windows).
I agree with this in principle, but I don't see how you're going to implement it. In order to actually check this condition, aren't you going to have to compile the source code anyway? If so, just skip the bytecode file. Although I guess you could store a hash of the source in the compiled file, or other similar optimizations.
Sounds good to me. Eric.
data:image/s3,"s3://crabby-images/f576b/f576b43f4d61067f7f8aeb439fbe2fadf3a357c6" alt=""
Eric Smith <eric@trueblade.com> writes:
Thanks.
You seem to be seeing something I was careful not to write. The check is: this source file would, if compiled, result in a bytecode file replacing this one Nowhere there is there anything about the resulting bytecode files being equivalent. I'm limiting the check only to whether the resulting bytecode file would *replace* the existing bytecode file. This doesn't require knowing anything at all about the contents of the current bytecode file; indeed, my intention was to phrase it so that it's checked before bothering to open the existing bytecode file. Is there a better term for this? I'm not well-versed enough in the Python import internals to know. -- \ “Philosophy is questions that may never be answered. Religion | `\ is answers that may never be questioned.” —anonymous | _o__) | Ben Finney
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Tue, Dec 8, 2009 at 6:28 PM, Ben Finney <ben+python@benfinney.id.au> wrote:
If there was a corresponding source file, it would have been found first -- and the bytecode file would be used *if* it matches the source file (by comparing a timestamp in the bytecode file's header to the actual mtime of the source file). So I'm not sure what there is to do apart from *not* using "lone" bytecode files. (The latter was actually added as a feature at some point so I betcha it's easy to make it conditional on a flag.) -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/f576b/f576b43f4d61067f7f8aeb439fbe2fadf3a357c6" alt=""
Guido van Rossum <guido@python.org> writes:
Right, that's what I thought. I was only looking for a way to say “only use a bytecode file if the corresponding source code file exists”, and then trying to define “corresponding source code file”. It appears that all I'm doing is confusing the issue, probably because my understanding of the terminology is fuzzy. I hope someone else can word it better, so the question of “which file, exactly, are we saying must exist?” is well answered.
I hope your instinct is right, and I betcha it is too. -- \ “Intellectual property is to the 21st century what the slave | `\ trade was to the 16th.” —David Mertz | _o__) | Ben Finney
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
Ben Finney wrote:
As Guido said, the check goes the other way: the interpreter looks for source files first, and if it doesn't find one, only then does it look for orphaned bytecode files (pyo/pyc). The check for a corresponding bytecode files after a source file has actually been found follows a different path through the import code. Since the two features are somewhat orthogonal, slicing out the check for orphaned bytecode files while keeping the check for a cached bytecode file should be fairly straightforward. Fair warning to anyone that implements this - expect to be updating quite a few parts of the test suite. The runpy, command line, import and zipimport tests would all need to be updated to make sure they were respecting the flag (and probably the importlib tests as well, at least in Py3k). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Wed, Dec 9, 2009 at 02:22, Nick Coghlan <ncoghlan@gmail.com> wrote:
Just a data point: I reversed that order in importlib to match mental semantics.
Yep for importlib, but I already protect bytecode-writing tests with a decorator for sys.dont_write_bytecode, so tests that rely on reading bytecode could easily be decorated as well. -Brett
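A decorator of the kind mentioned above might look something like this minimal sketch. `sys.dont_write_bytecode` is a real attribute; the decorator name `with_bytecode_writing` is made up here and is not claimed to be the helper importlib's tests actually use. A matching decorator for a read-side flag would follow the same pattern once such a flag exists.

```python
import functools
import sys


def with_bytecode_writing(test):
    """Run `test` with bytecode writing enabled, then restore the old setting.

    Hypothetical helper for illustration only.
    """
    @functools.wraps(test)
    def wrapper(*args, **kwargs):
        original = sys.dont_write_bytecode
        sys.dont_write_bytecode = False
        try:
            return test(*args, **kwargs)
        finally:
            # Restore the caller's setting even if the test raises.
            sys.dont_write_bytecode = original
    return wrapper
```

Saving and restoring the flag inside `try`/`finally` keeps one test from leaking a changed global setting into the next.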
Guido van Rossum wrote:
Hmm, not as orthogonal as I thought then :P I guess it is a credit to the PEP 302 API that I've never needed to care that zipimport might have the check the other way around :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 8 Dec 2009, at 13:44, Ben Finney wrote:
One problem with a sys flag is that it's a global setting. Suppose a package is distributed with only pyc/pyo files; then the top-level __init__.py might flip the switch so that its sub-files can get imported from the pyc/pyo files. But you wouldn't want that flag to persist beyond that. Another idea is to use a new file extension, which isn't the best solution, but allows the creator to explicitly set what behavior they intended for their files:
* if a foo.py file exists, then use the existing foo.pyc/pyo as is done today
* if a foo.py file does not exist, but a foo.pyxxx exists, use it (but foo.pyc/pyo is never used, unlike today)
(pyxxx is a placeholder for whatever would be a reasonable name) Jared
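The two rules in this proposal can be sketched as a small resolution function. Everything here is illustrative: `.pyxxx` is the placeholder extension from the message above, and `resolve` and `STANDALONE_EXT` are names invented for the example.

```python
import os

# Placeholder extension taken from the proposal; no real extension was chosen.
STANDALONE_EXT = ".pyxxx"


def resolve(name, directory):
    """Return the file to load for `name` under the proposed rules, or None."""
    source = os.path.join(directory, name + ".py")
    if os.path.exists(source):
        # Rule 1: source exists, so any cached .pyc/.pyo is used as today.
        return source
    standalone = os.path.join(directory, name + STANDALONE_EXT)
    if os.path.exists(standalone):
        # Rule 2: no source, but an explicitly sourceless file exists.
        return standalone
    # A lone .pyc/.pyo is never used under this proposal.
    return None
```

Under these rules a stale `foo.pyc` can never be imported by accident, while a vendor who wants to ship without source opts in by choosing the other extension.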
On Wed, Dec 9, 2009 at 11:07 AM, Jared Grubb <jared.grubb@gmail.com> wrote:
I'm not sure that there are any use cases that require using conflicting values of this setting for different packages.
It's a much bigger change, but using a different extension would probably remove the need for a flag. It would also help with some tools that hide .pyc/.pyo files from view (e.g. the typical .svnignore). -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
Well, consider development of your own codebase, where you would like to avoid importing stale .pyc files, but which depends on a 3rd-party library that ships only .pyc files. Now if the flag was somehow "for all modules under this namespace" that would easily handle it. Or just living with "if you want to use private 3rd-party libs, then you don't get this support for your own development". (I don't currently do this, but it certainly is *a* use case.) John =:->
John Arbash Meinel <john.arbash.meinel@gmail.com> writes:
> Or just living with "if you want to use private 3rd-party libs, then you don't get this support for your own development".
FWIW, that's the option I would advocate. The default is to develop and distribute with source; choosing to omit source (or choosing to use such software) is choosing an inferior option for many other reasons as well, so I don't see it as a use case that needs explicit support. -- “A learning experience is one of those things that say, ‘You know that thing you just did? Don't do that.’” (Douglas Adams, 2000-04-05) | Ben Finney
On Wed, Dec 9, 2009 at 11:27, Guido van Rossum <guido@python.org> wrote:
Same here. This is straying into optimizations for the sake of optimizing.
I know some people seem to think pyc/pyo files are a good way to obfuscate code, but they honestly aren't, IMO. But these people stand the most to lose from us even considering changing default behavior. In a perfect world I would make pyc/pyo files completely optional and only an optimization that could not work w/o the corresponding source. But in a backwards-compatible, paranoid world I would make it an opt-in flag to ignore lone pyc/pyo files. I am +10 on the former and +1 on the latter. -Brett
participants (19)
- Antoine Pitrou
- Ben Finney
- Brett Cannon
- Collin Winter
- Eric Smith
- geremy condra
- Greg Ewing
- Guido van Rossum
- Jared Grubb
- Jesse Noller
- John Arbash Meinel
- Kevin Watters
- Kristján Valur Jónsson
- Nick Coghlan
- Paul Moore
- Raymond Hettinger
- Ron Adam
- Terry Reedy
- Todd Whiteman