Hi Craig,
On Tue, Feb 2, 2010 at 4:42 PM, Craig Citro <craigcitro(a)gmail.com> wrote:
>> Done. The diff is at
>> http://codereview.appspot.com/186247/diff2/5014:8003/7002. I listed
>> Cython, Shedskin and a bunch of other alternatives to pure CPython.
>> Some of that information is based on conversations I've had with the
>> respective developers, and I'd appreciate corrections if I'm out of
>> date.
>>
>
> Well, it's a minor nit, but it might be more fair to say something
> like "Cython provides the biggest improvements once type annotations
> are added to the code." After all, Cython is more than happy to take
> arbitrary Python code as input -- it's just much more effective when
> it knows something about types. The code to make Cython handle
> closures has just been merged ... hopefully support for the full
> Python language isn't so far off. (Let me know if you want me to
> actually make a comment on Rietveld ...)
Indeed, you're quite right. I've corrected the description here:
http://codereview.appspot.com/186247/diff2/7005:9001/10001
> Now what's more interesting is whether or not U-S and Cython could
> play off one another -- take a Python program, run it with some
> "generic input data" under Unladen and record info about which
> functions are hot, and what types they tend to take, then let
> Cython/gcc -O3 have a go at these, and lather, rinse, repeat ... JIT
> compilation and static compilation obviously serve different purposes,
> but I'm curious if there aren't other interesting ways to take
> advantage of both.
Definitely! Someone approached me about possibly reusing the profile
data for a feedback-enhanced code coverage tool, which has interesting
potential, too. I've added a note about this under the "Future Work"
section: http://codereview.appspot.com/186247/diff2/9001:10002/9003
Thanks,
Collin Winter
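To make Craig's "run it under Unladen, record which functions are hot and what types they take" idea concrete, here is a minimal pure-Python sketch of that kind of profile collection. It is not how Unladen Swallow gathers its feedback data. The decorator `record_profile`, the helper `hot_functions` and the two module-level tables are hypothetical names introduced only for illustration, and only positional-argument types are recorded:

```python
import functools
from collections import Counter, defaultdict

# Hypothetical profile store: how often each function was called, and a tally
# of the positional-argument type signatures it was called with.
call_counts = Counter()
arg_types = defaultdict(Counter)


def record_profile(func):
    """Decorator that counts calls to func and the types of its positional args."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__qualname__] += 1
        # Keyword arguments are deliberately ignored in this sketch.
        signature = tuple(type(a).__name__ for a in args)
        arg_types[func.__qualname__][signature] += 1
        return func(*args, **kwargs)
    return wrapper


def hot_functions(threshold):
    """Return (name, most common type signature) for functions called >= threshold times."""
    return [
        (name, arg_types[name].most_common(1)[0][0])
        for name, count in call_counts.most_common()
        if count >= threshold
    ]


# Usage: run the program on "generic input data", then inspect the profile.
@record_profile
def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


for _ in range(1000):
    dot([1.0, 2.0], [3.0, 4.0])

print(hot_functions(threshold=100))
```

A static compiler such as Cython could, in principle, use the recorded signatures as type hints for the hot functions, which is the feedback loop Craig describes.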
In #7712 I was trying to change regrtest to always run the tests in a
temporary CWD (e.g. /tmp/@test_1234_cwd/).
The patches attached to the issue add a context manager that changes the
CWD, and it works fine when I run ./python -m test.regrtest from trunk/.
However, when I try from trunk/Lib/ it fails with ImportErrors (note
that the latest patch by Florent Xicluna already tries to work around the
problem). The traceback points to "the_package = __import__(abstest,
globals(), locals(), [])" in runtest_inner (in regrtest.py), and a
"print __import__('test').__file__" there returns 'test/__init__.pyc'.
This can be reproduced quite easily:
trunk$ ./python
Python 2.7a2+ (trunk:77941M, Feb 3 2010, 06:40:49)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> os.getcwd()
'/home/wolf/dev/trunk'
>>> import test
>>> test.__file__ # absolute
'/home/wolf/dev/trunk/Lib/test/__init__.pyc'
>>> os.chdir('/tmp')
>>> test.__file__
'/home/wolf/dev/trunk/Lib/test/__init__.pyc'
>>> from test import test_unicode # works
>>> test_unicode.__file__
'/home/wolf/dev/trunk/Lib/test/test_unicode.pyc'
>>>
[21]+ Stopped ./python
trunk$ cd Lib/
trunk/Lib$ ../python
Python 2.7a2+ (trunk:77941M, Feb 3 2010, 06:40:49)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> os.getcwd()
'/home/wolf/dev/trunk/Lib'
>>> import test
>>> test.__file__ # relative
'test/__init__.pyc'
>>> os.chdir('/tmp')
>>> from test import test_unicode # fails
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name test_unicode
Is there a reason why in the second case test.__file__ is relative?
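For reference, a context manager of the kind described above (run the body in a fresh temporary directory, then restore the old CWD) can be sketched as follows. This is not the code from the patches on #7712; the name `temp_cwd` and its `prefix` parameter are assumptions made for illustration. Note that it only changes the directory: it does nothing about modules that were already imported through a relative path, which is exactly the failure the second transcript shows.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temp_cwd(prefix="test_cwd_"):
    """Run the body inside a fresh temporary directory, then restore the old CWD.

    Hypothetical helper; the patches attached to the issue may differ.
    """
    old_cwd = os.getcwd()
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        os.chdir(path)
        yield path
    finally:
        # Always return to the original directory first, then clean up,
        # even if the body raised an exception.
        os.chdir(old_cwd)
        shutil.rmtree(path, ignore_errors=True)
```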
On Feb 03, 2010, at 11:07 PM, Nick Coghlan wrote:
>It's also the case that having to run Python to manage my own filesystem
>would be very annoying. If a dev has a broken .pyc that prevents the
>affected Python build from even starting how are they meant to use the
>nonfunctioning interpreter to find and delete the offending file? How is
>someone meant to find and delete the .pyc files if they prefer to use a
>graphical file manager over (or in conjunction with) the command line?
I agree. I'd prefer to have a predictable place for the cached files,
independent of having to run Python to tell you where that is.
-Barry
On Jan 31, 2010, at 11:34 PM, Nick Coghlan wrote:
>I must admit I quite like the __pyr__ directory approach as well. Since
>the interpreter knows the suffix it is looking for, names shouldn't
>conflict. Using a single directory allows the name to be less cryptic,
>too (e.g. __pycache__).
Something else that occurs to me: the name of the directory (under the
folder-per-folder approach) probably ought to be the same as the name of the
module attribute. There's probably no good reason to make them different, and
making them the same strengthens the association.
That still gives us plenty of opportunity to bikeshed, but __pycache__ seems
reasonable to me (it's the cache of parsing and compiling the .py file).
-Barry
On Feb 03, 2010, at 11:59 AM, M.-A. Lemburg wrote:
>How about using an optionally relative cache dir setting to let
>the user decide ?
Why do we need that level of flexibility?
-Barry
On Feb 03, 2010, at 01:17 PM, Guido van Rossum wrote:
>Can you clarify? In Python 3, __file__ always points to the source.
>Clearly that is the way of the future. For 99.99% of uses of __file__,
>if it suddenly never pointed to a .pyc file any more (even if one
>existed) that would be just fine. So what's this talk of switching to
>__source__?
Upon further reflection, I agree. __file__ also points to the source in
Python 2.7. Do we need an attribute to point to the compiled bytecode file?
-Barry
On Feb 01, 2010, at 02:04 PM, Paul Du Bois wrote:
>It's an interesting challenge to write the file in such a way that
>it's safe for a reader and writer to co-exist. Like Brett, I
>considered an append-only scheme, but one needs to handle the case
>where the bytecode for a particular magic number changes. At some
>point you'd need to sweep garbage from the file. All solutions seem
>unnecessarily complex, and unnecessary since in practice the case
>should not come up.
I don't think that part's difficult. The byte code's only going to change if
the source file has changed, and in that case, /all/ the byte code in the "fat
pyc" file will be invalidated, so the whole thing can be deleted by the first
writer. I'd worked that out in the original fat pyc version of the PEP.
-Barry
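Barry's invalidation rule for a "fat pyc" (bytecode for several interpreters in one file, and any source change makes all of it stale) can be modeled in a few lines. This is a hypothetical in-memory sketch: the class name `FatPyc` and the placeholder magic values are assumptions, and it deliberately says nothing about file locking or the reader/writer races Paul mentions.

```python
class FatPyc:
    """In-memory model of a hypothetical "fat pyc": bytecode for several Python
    versions, keyed by magic number, all tied to a single source timestamp."""

    def __init__(self, source_mtime):
        self.source_mtime = source_mtime
        self.entries = {}  # magic number (bytes) -> bytecode (bytes)

    def lookup(self, magic, source_mtime):
        """Return cached bytecode for this magic number, or None if stale or missing."""
        if source_mtime != self.source_mtime:
            return None
        return self.entries.get(magic)

    def store(self, magic, source_mtime, bytecode):
        """Add bytecode. If the source changed, every entry is stale, so the
        first writer after the change throws them all away."""
        if source_mtime != self.source_mtime:
            self.entries.clear()
            self.source_mtime = source_mtime
        self.entries[magic] = bytecode


# Usage with placeholder magic numbers (not real CPython values).
fat = FatPyc(source_mtime=100)
fat.store(b"MAG1", 100, b"bytecode for interpreter 1")
fat.store(b"MAG2", 100, b"bytecode for interpreter 2")
fat.store(b"MAG1", 200, b"recompiled after source edit")  # source changed
```

After the last call only the recompiled entry survives, so no garbage sweep of old per-magic entries is ever needed.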
On Feb 01, 2010, at 08:26 AM, Tim Delaney wrote:
>The pyc/pyo files are just an optimisation detail, and are essentially
>temporary. Given that, if they were to live in a single directory, to me it
>seems obvious that the default location for that should be in the system
>temporary directory. I can immediately think of the following advantages:
>
>1. No one really complains too much about putting things in /tmp unless it
>starts taking up too much space. In which case they delete it and if it gets
>reused, it gets recreated.
If I understand the Filesystem Hierarchy Standard correctly, these files really
should go under /var/cache/python. (Don't ask me where that would be on
non-FHS-compliant systems <cough>Windows</cough>.) I've explained in other
followups why I don't particularly like separating the source from the cache
files, but if you wanted a sick approach:
Take the full absolute path to the .py file, plus the magic number, plus the
timestamp, and hash that. Cache the pyc file under /var/cache/python/<hash>.
-Barry
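The hashed layout Barry sketches could look roughly like this. The function name `hashed_cache_path`, the choice of SHA-1, and the NUL separator between fields are all assumptions for illustration; the message only says to hash the absolute path, magic number and timestamp together.

```python
import hashlib
import os


def hashed_cache_path(source_path, magic, mtime, cache_root="/var/cache/python"):
    """Map a source file to a cache entry named by a hash of
    (absolute source path, magic number, source timestamp).

    Hypothetical helper; SHA-1 and the field separator are arbitrary choices.
    """
    key = "\0".join([
        os.path.abspath(source_path),
        magic.hex(),            # magic number as hex text
        str(int(mtime)),        # whole-second timestamp
    ])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return os.path.join(cache_root, digest)
```

One consequence of folding the timestamp into the key is that editing the source never overwrites the old entry; it simply produces a new name, so stale entries pile up until something like a tmp cleaner removes them.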
On Jan 31, 2010, at 08:10 PM, Silke von Bargen wrote:
>Martin v. Löwis wrote:
>> There is also the issue of race conditions with multiple simultaneous
>> accesses. The original format for the PEP had race conditions for
>> multiple simultaneous writers; ZIP will also have race conditions for
>> concurrent readers/writers (as any new writer will have to overwrite
>> the central directory, making the zip file temporarily unavailable -
>> unless they copy it, in which case we are back to writer/writer
>> races).
>>
>> Regards,
>> Martin
>Good point. OTOH, the probability of this happening is actually very small.
And yet, when it does happen, it's probably a monster to debug and defend
against. Unless we have a convincing cross-platform story for preventing
these race conditions, I think a single-file (e.g. zipfile) approach is
infeasible.
-Barry
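By contrast, one file per cached artifact has a well-known way to avoid torn reads: write to a temporary file in the same directory and rename it over the target. The sketch below is a hedged illustration, not a full cross-platform answer; the name `atomic_write_bytes` is hypothetical. If two writers race, the last rename wins, which is harmless when both wrote valid bytecode for the same source. With a shared zip or fat file, the losing writer's entry would be silently dropped, which is the writer/writer race Martin describes.

```python
import os
import tempfile


def atomic_write_bytes(path, data):
    """Write data to path so readers see either the old file or the complete
    new one, never a partially written file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".pyc")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic on POSIX; on Windows it can still fail if
        # another process has the destination open.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything went wrong, then re-raise.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```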
On approximately 1/30/2010 4:00 PM, came the following characters from
the keyboard of Barry Warsaw:
> When the Python executable is given a `-R` flag, or the environment
> variable `$PYTHONPYR` is set, then Python will create a `foo.pyr`
> directory and write a `pyc` file to that directory with the hexlified
> magic number as the base name.
>
After the discussion so far, my opinion is that if the source directory
contains an appropriate Python repository directory [1], and the
version of Python implements PEP 3147, there should be no need for
-R or $PYTHONPYR to exist; such versions of Python would simply and
always look in the Python repository directory for binaries.
I've reached this conclusion for several reasons/benefits:
1) it makes the rules simpler for people finding the binaries
2) there is no "double lookup" to find a binary at run time
3) if the PEP changes to implement alternative B or C in [1], I hear
broad consensus among people who like that behavior, because it cleans up
the annoying clutter of .pyc files mixed with source.
4) There is no need to add or document the command line option or
environment variable.
[1] Alternative A: source-file-root.pyr, as in the PEP.
Alternative B: source-file-dir/__pyr__, with all versions/files in the same
lookaside directory.
Alternative C: source-file-dir/__pyr_version__, where each Python version with
different bytecode would have some sort of version string or magic number that
identifies it, and would look only in that directory for its .pyc/.pyo files.
I prefer C for 4 reasons: 1) it's easier to blow away one version; 2) it's
easier to see what that version has compiled; 3) most people use only one or
two versions, so directory proliferation is limited; 4) even with 30 versions
of Python, each subdirectory would contain the same order of magnitude of
files as the source directory, which matters for performance on file systems
that, as some do, have a knee in their performance curve.
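The three layouts in the footnote differ only in where the version identifier goes, which a small path-mapping sketch makes visible. The function `candidate_cache_path` and the exact file-name pattern for Alternative B are assumptions; the thread only says that under B the interpreter "knows the suffix it is looking for", and under A (Barry's original `-R` proposal) the base name is the hexlified magic number.

```python
import os


def candidate_cache_path(source_path, layout, version_tag):
    """Return where compiled code for source_path would live under layout
    "A", "B" or "C" from the footnote.

    version_tag stands in for "some sort of version string or magic number";
    its format, and the B-layout file name pattern, are hypothetical.
    """
    directory, filename = os.path.split(source_path)
    root, _ext = os.path.splitext(filename)
    if layout == "A":
        # source-file-root.pyr/<tag>.pyc: one directory per source file.
        return os.path.join(directory, root + ".pyr", version_tag + ".pyc")
    if layout == "B":
        # source-file-dir/__pyr__/: one shared directory; tag in the file name.
        return os.path.join(directory, "__pyr__", root + "." + version_tag + ".pyc")
    if layout == "C":
        # source-file-dir/__pyr_<tag>__/: one directory per Python version.
        return os.path.join(directory, "__pyr_" + version_tag + "__", root + ".pyc")
    raise ValueError("unknown layout: %r" % (layout,))
```

Under C, blowing away one version's cache is a single directory removal, which is Glenn's first reason for preferring it.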
--
Glenn -- http://nevcal.com/