
Since we're all talking about making Python faster, I thought I'd drop some previous ideas I've had here in case (1) someone wants to actually do them, and (2) they really are new ideas that haven't failed in the past. Mostly I was thinking about startup time.

Here is the list of modules imported on clean startup on my Windows, US-English machine (from -v and cleaned up a bit):

import _frozen_importlib
import _imp
import sys
import '_warnings'
import '_thread'
import '_weakref'
import '_frozen_importlib_external'
import '_io'
import 'marshal'
import 'nt'
import '_thread'
import '_weakref'
import 'winreg'
import 'zipimport'
import '_codecs'
import 'codecs'
import 'encodings.aliases'
import 'encodings'
import 'encodings.mbcs'
import '_signal'
import 'encodings.utf_8'
import 'encodings.latin_1'
import '_weakrefset'
import 'abc'
import 'io'
import 'encodings.cp437'
import 'errno'
import '_stat'
import 'stat'
import 'genericpath'
import 'ntpath'
import '_collections_abc'
import 'os'
import '_sitebuiltins'
import 'sysconfig'
import '_locale'
import '_bootlocale'
import 'encodings.cp1252'
import 'site'

Obviously the easiest first thing is to remove or delay unnecessary imports. But a while ago I used a native profiler to trace through this and the most impactful modules were the encodings:

import 'encodings.mbcs'
import 'encodings.utf_8'
import 'encodings.latin_1'
import 'encodings.cp437'
import 'encodings.cp1252'

While I don't doubt that we need all of these for *some* reason, aliases, cp437 and cp1252 are relatively expensive modules to import, mostly due to having large static dictionaries or data structures generated on startup.

Given this is static and mostly read-only information[1], I see no reason why we couldn't either generate completely static versions of them, or better yet compile the resulting data structures into the core binary.

([1]: If being able to write to some of the encoding data is used by some people, I vote for breaking that for 3.6 and making it read-only.)

This is probably the code snippet that bothered me the most:

    ### Encoding table
    encoding_table=codecs.charmap_build(decoding_table)

It shows up in many of the encodings modules, and while it is not a bad function in itself, we are obviously generating a known data structure on every startup. Storing these in static data is a tradeoff between disk space and startup performance, and one I think is likely to be worthwhile.

Anyway, just an idea if someone wants to try it and see what improvements we can get. I'd love to do it myself, but when it actually comes to finding time I keep coming up short.

Cheers,
Steve

P.S. If you just want to discuss optimisation techniques or benchmarking in general, without specific application to CPython 3.6, there's a whole internet out there. Please don't make me the cause of a pointless centithread. :)
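For anyone who wants to reproduce a module list like the one above without reading through -v output, here is a minimal sketch. It starts a fresh interpreter and prints what ends up in sys.modules; the helper name startup_modules is made up for illustration, and the result will differ by platform, locale and Python version.

```python
import subprocess
import sys


def startup_modules(python=sys.executable):
    """Return the sorted names of modules loaded by a bare interpreter start."""
    # Use a fresh interpreter so this process's own imports do not pollute the list.
    code = "import sys; print('\\n'.join(sorted(sys.modules)))"
    result = subprocess.run(
        [python, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.split()


if __name__ == "__main__":
    modules = startup_modules()
    print(len(modules), "modules loaded at startup")
    # Show only the encodings, since those are the ones under discussion.
    for name in modules:
        if name.startswith("encodings"):
            print("  ", name)
```

This only tells you *what* is imported, not what each import costs; timing needs a profiler or a separate measurement.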

It doesn't currently end up on disk. Some tables are partially or completely stored on disk as Python source code (some are partially generated from simple rules), but others are generated by inverting those. That process takes time that could be avoided by storing the generated tables, and storing all of it in a format that doesn't require parsing, compiling and executing (such as a native array).

Potentially it could be a win all around if we stopped including the (larger) source files, but that doesn't seem like a good idea for maintaining portability to other implementations. The main thought is making the compiled binary bigger to avoid generating encoding tables at startup.

Top-posted from my Windows Phone

-----Original Message-----
From: "francismb" <francismb@email.de>
Sent: 1/29/2016 13:56
To: "python-dev@python.org" <python-dev@python.org>
Subject: Re: [Python-Dev] More optimisation ideas

Hi,
Is it really an important trade-off? As far as I understand from your email, those modules are always being loaded and the final data created. Won't the space be there (in memory or on disk)?

Thanks in advance!
francis

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/steve.dower%40python.org

On 30 January 2016 at 03:48, Steve Dower <steve.dower@python.org> wrote:
When I last tried to profile startup on Windows (I haven't used Windows for some time now) it seemed that the time was totally dominated by file system access. Essentially the limiting factor was the inordinate number of stat calls and small file accesses. Although this was probably Python 2.x which may not import those particular modules and maybe it depends on virus scanner software etc. Things may have changed now but I concluded that substantive gains could only come from improving FS access. Perhaps something like zipping up the standard library would see a big improvement. -- Oscar

On 29.01.16 19:05, Steve Dower wrote:
$ ./python -m timeit -s "import codecs; from encodings.cp437 import decoding_table" -- "codecs.charmap_build(decoding_table)"
100000 loops, best of 3: 4.36 usec per loop

Getting rid of charmap_build() would save you at most 4.4 microseconds per encoding. That is 0.0005 seconds if you have imported *all* standard encodings!

And how did you expect to store encoding_table in a more efficient way?
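The same measurement can be run from a script. This is a rough sketch, not a rigorous benchmark: the loop count is reduced to keep it quick, and the figure of 100 codecs used for the upper bound is an assumption chosen only to illustrate the arithmetic.

```python
import codecs
import timeit
from encodings.cp437 import decoding_table

# Time only the table inversion, mirroring the command-line measurement.
loops = 10_000
total = timeit.timeit(lambda: codecs.charmap_build(decoding_table), number=loops)
per_call_us = total / loops * 1e6
print(f"charmap_build(cp437): {per_call_us:.2f} usec per call")

# Rough upper bound if every charmap-based codec paid this cost once.
# The count of 100 codecs is an assumption for illustration only.
assumed_codecs = 100
print(f"~{per_call_us * assumed_codecs / 1e6:.6f} s for {assumed_codecs} codecs")
```

At a few microseconds per call, even a hundred such calls stay well under a millisecond, which is the point being made above.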

On 30Jan2016 0645, Serhiy Storchaka wrote:
Just as happy to be proven wrong. Perhaps I misinterpreted my original profiling and then, embarrassingly, ran with the result for a long time without retesting.
> And how did you expect to store encoding_table in a more efficient way?
There's nothing inefficient about its storage, but as it does not change it would be trivial to store it statically. Then "building" the map is simply obtaining a pointer into an already loaded memory page. Much faster than building it on load, but both are clearly insignificant compared to other factors. Cheers, Steve

On Sat, 30 Jan 2016 at 10:21 Serhiy Storchaka <storchaka@gmail.com> wrote:
Check the archives, but I did try freezing the entire stdlib and it didn't really make a difference in startup, so I don't know if this still holds true anymore. At this point I think all of our knowledge of what takes the most time during startup is outdated and someone should try to really profile the whole thing to see where the hotspots are (e.g., is it stat calls from imports, is it actually some specific function, is it just so many little things adding up to a big thing, etc.).
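One way to start on such a profile is per-import timing. The sketch below assumes CPython 3.7 or newer, where the -X importtime option exists (it postdates this thread), and parses the lines that option writes to stderr. The helper name import_times is hypothetical. It measures import cost only; it does not separate stat calls from code execution.

```python
import subprocess
import sys


def import_times(python=sys.executable):
    """Return (module, self_usec, cumulative_usec) rows from -X importtime."""
    # -X importtime writes one "import time: self | cumulative | name" line
    # per import to stderr; it requires CPython 3.7 or newer.
    proc = subprocess.run(
        [python, "-X", "importtime", "-c", "pass"],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3 or not fields[0].strip().isdigit():
            continue  # skip the header row
        self_us, cumulative_us, name = (f.strip() for f in fields)
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


if __name__ == "__main__":
    # Show the ten imports with the largest self time.
    for name, self_us, _ in sorted(import_times(), key=lambda r: -r[1])[:10]:
        print(f"{self_us:>8} us  {name}")
```

Running it several times, and on each of Windows, Linux and OS X, would give a first idea of whether the hotspots are a few modules or many small ones.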

Brett tried freezing the entire stdlib at one point (as we do for parts of importlib) and reported no significant improvement. Since that rules out code compilation as well as the OS calls, it'd seem the priority is to execute less code on startup.

Details of that work were posted to python-dev about twelve months ago, IIRC. Maybe a little longer.

Top-posted from my Windows Phone

-----Original Message-----
From: "Serhiy Storchaka" <storchaka@gmail.com>
Sent: 1/30/2016 10:22
To: "python-dev@python.org" <python-dev@python.org>
Subject: Re: [Python-Dev] More optimisation ideas

On 30.01.16 18:31, Steve Dower wrote:
AFAIK the most time is spent in system calls like stat or open. Archiving the stdlib into the ZIP file and using zipimport can decrease Python startup time (perhaps there is an open issue about this).

On 30.01.2016 20:15, Steve Dower wrote:
> Brett tried freezing the entire stdlib at one point (as we do for parts of importlib) and reported no significant improvement. Since that rules out code compilation as well as the OS calls, it'd seem the priority is to execute less code on startup.
> Details of that work were posted to python-dev about twelve months ago, IIRC. Maybe a little longer.
Freezing the entire stdlib does improve the startup time, simply because it removes stat calls, which dominate the startup time at least on Unix. It also allows sharing the stdlib byte code in memory, since it gets stored in static C structs which the OS will happily mmap into multiple processes for you without any additional effort. Our eGenix PyRun does exactly that. Even though the original motivation is a different one, the gained improvement in startup time is a nice side effect: http://www.egenix.com/products/python/PyRun/ Aside: The encodings don't really make much difference here. The dictionaries aren't all that big, so generating them on the fly doesn't really create much overhead. The trade off in terms of maintainability/speed definitely leans toward maintainability. For the larger encoding tables we already have C implementations with appropriate data structures to make lookup speed vs. storage needs efficient. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 31 2016)

So freezing the stdlib helps on UNIX and not on OS X (if my old testing is still accurate). I guess the next question is what it does on Windows and if we would want to ever consider freezing the stdlib as part of the build process (and if we would want to change the order of importers on sys.meta_path so frozen modules came after file-based ones). On Sun, 31 Jan 2016, 10:43 M.-A. Lemburg <mal@egenix.com> wrote:

On Sun, Jan 31, 2016 at 08:23:00PM +0000, Brett Cannon wrote:
I find that being able to easily open stdlib .py files in a text editor to read the source is extremely valuable. I've learned much more from reading the source than from (e.g.) StackOverflow. Likewise, it's often handy to do a grep over the stdlib. When you talk about freezing the stdlib, what exactly does that mean? - will the source files still be there? - how will this affect people writing patches for bugs? -- Steve

On Mon, 01 Feb 2016 14:12:27 +1100, Steven D'Aprano <steve@pearwood.info> wrote:
Well, Brett said it would be optional, though perhaps the above paragraph is asking about doing it in our Windows build. But the linux distros might also make use of the option if it exists, so the question is very meaningful. However, you'd have to ask the distro if the source would be shipped in the linux case, and I'd guess not in most cases.

I don't know about anyone else, but on my own development systems it is not that unusual for me to *edit* the stdlib files (to add debug prints) while debugging my own programs. Freeze would definitely interfere with that. I could, of course, install a separate source build on my dev system, but I thought it worth mentioning as a factor.

On the other hand, if the distros go the way Nick has (I think) been advocating, and have a separate 'system python for system scripts' that is independent of the one installed for user use, having the system-only python be frozen and sourceless would actually make sense on a couple of levels.

--David

On Feb 01, 2016, at 11:40 AM, R. David Murray wrote:
It's very likely the .py files would still be shipped, but perhaps in a -dev package that isn't normally installed.
I do this too, though usually in a VM or chroot and not in my live system. A very common situation for me though is pdb stepping through my own code and landing in -or passing through- stdlib.
Yep, we've talked about it in Debian-land too, but never quite gotten around to doing anything. Certainly I'd like to see some consistency among Linux distros there (i.e. discussed on linux-sig@). But even with system scripts, I do need to step through them occasionally. If it were a matter of changing a shebang or invoking the script with a different Python (e.g. /usr/bin/python3s vs. /usr/bin/python3) to get the full unpacked source, that would be fine. Cheers, -Barry

" " == Barry Warsaw <barry@python.org> writes:
>> On Feb 01, 2016, at 11:40 AM, R. David Murray wrote:
>> I don't know about anyone else, but on my own development
>> systems it is not that unusual for me to *edit* the stdlib
>> files (to add debug prints) while debugging my own programs.
>> Freeze would definitely interfere with that. I could, of
>> course, install a separate source build on my dev system, but I
>> thought it worth mentioning as a factor.

[snip]

> But even with system scripts, I do need to step through them
> occasionally. If it were a matter of changing a shebang or
> invoking the script with a different Python
> (e.g. /usr/bin/python3s vs. /usr/bin/python3) to get the full
> unpacked source, that would be fine.

If the stdlib were to use implicit namespace packages ( https://www.python.org/dev/peps/pep-0420/ ) and the various loaders/importers as well, then python could do what I've done with an embedded python application for years. Freeze the stdlib (or put it in a zipfile or whatever is fast). Then arrange PYTHONPATH to first look on the filesystem and then look in the frozen/zipped storage. Normally the filesystem part is empty. So, modules are loaded from the frozen/zip area. But if you wanna override one of the frozen modules simply copy one or more .py files onto the file system.

I've been doing this only with modules in the global scope. But implicit namespace packages seem to open the door for this with packages.

Mike
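A small sketch of that search order, using a zip archive as the stand-in for the frozen/zipped storage and a top-level module only (not a package). The directory layout and the module name demo_shadowed are invented for the example; nothing here reflects how an actual CPython build would be arranged.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Hypothetical layout: an "override" directory searched before a zip archive.
workdir = Path(tempfile.mkdtemp())
override_dir = workdir / "override"
override_dir.mkdir()
archive = workdir / "bundle.zip"

# The archive stands in for the frozen/zipped stdlib.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_shadowed.py", "ORIGIN = 'zip'\n")

# Filesystem first, archive second, as described above.
sys.path[:0] = [str(override_dir), str(archive)]

# The override directory is empty, so the module comes from the archive.
demo_shadowed = importlib.import_module("demo_shadowed")
first_origin = demo_shadowed.ORIGIN
print(first_origin)

# Dropping a .py file into the override directory shadows the zipped copy.
(override_dir / "demo_shadowed.py").write_text("ORIGIN = 'filesystem'\n")
importlib.invalidate_caches()
del sys.modules["demo_shadowed"]
demo_shadowed = importlib.import_module("demo_shadowed")
print(demo_shadowed.ORIGIN)
```

Note that the override directory is consulted on every import even when it is empty, so this arrangement keeps some filesystem lookups in the import path.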

On Feb 01 2016, mike.romberg@comcast.net wrote:
Presumably that would eliminate the performance advantages of the frozen/zipped storage because now Python would still have to issue all the stat calls to first check for the existence of a .py file. Best, -Nikolaus (No Cc on replies please, I'm reading the list) -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

On Feb 1, 2016, at 09:59, mike.romberg@comcast.net wrote:
This is a great solution for experienced developers, but I think it would be pretty bad for novices or transplants from other languages (maybe even including Python 2).

There are already multiple duplicate questions every month on StackOverflow from people asking "how do I find the source to stdlib module X". The canonical answer starts off by explaining how to import the module and use its __file__, which everyone is able to handle. If we have to instead explain how to work out the .py name from the qualified module name, how to work out the stdlib path from sys.path, and then how to find the source from those two things, with the caveat that it may not be installed at all on some platforms, and how to make sure what they're asking about really is a stdlib module, and how to make sure they aren't shadowing it with a module elsewhere on sys.path, that's a lot more complicated. Especially when you consider that some people on Windows and Mac are writing Python scripts without ever learning how to use the terminal or find their Python packages via Explorer/Finder.

And meanwhile, other people would be asking why their app runs slower on one machine than another, because they didn't expect that installing python-dev on top of python would slow down startup.

Finally, on Linux and Mac, the stdlib will usually be somewhere that's not user-writable--and we shouldn't expect users to have to mess with stuff in /usr/lib or /System/Library even if they do have sudo access. Of course we could put a "stdlib shadow" location on the sys.path and configure it for /usr/local/lib and /Library and/or for somewhere in ~, but that just makes the lookup procedure even more complicated--not to mention that we've just added three stat calls to remove one open, at which point the optimization has probably become a pessimization.
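For reference, the __file__ approach mentioned above looks roughly like this. It is a sketch with a made-up helper name, source_path; it returns None for modules that have no file behind them, which is what a novice would run into more often if the stdlib stopped being plain files.

```python
import importlib


def source_path(module_name):
    """Return the file a module was loaded from, or None if it has no file."""
    module = importlib.import_module(module_name)
    # Built-in modules such as sys have no __file__ attribute at all.
    return getattr(module, "__file__", None)


print(source_path("json"))  # typically the path to json/__init__.py
print(source_path("sys"))   # None: sys is compiled into the interpreter
```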

On 2/1/2016 3:39 PM, Andrew Barnert via Python-Dev wrote:
Perhaps even easier: start IDLE, hit Alt-M, type in module name as one would import it, click OK. If Python source is available, IDLE will open it in an editor window, with the path on the title bar.
> If we have to instead explain how to work out the .py name
The window has the path on the title bar, so one can tell what was loaded. IDLE currently uses imp.find_module (this could be updated), with a backup of __import__(...).__file__, so it will load non-stdlib files that can be imported.
> Finally, on Linux and Mac, the stdlib will usually be somewhere that's not user-writable
On Windows, this depends on the install location. Perhaps there should be an option for edit-save or view only to avoid accidental changes. -- Terry Jan Reedy

On Feb 1, 2016, at 19:44, Terry Reedy <tjreedy@udel.edu> wrote:
The point of this thread is the suggestion that the stdlib modules be frozen or stored in a zipfile, unless a user modifies things in some way to make the source accessible. So, if a user hasn't done that (which no novice will know how to do), there won't be a path to show in the title bar, so IDLE won't be any more help than the command line. (I suppose IDLE could grow a new feature to look up "associated source files" for a zipped stdlib or something, but that seems like a pretty big new feature.)
The problem is that, if the standard way for users to see stdlib sources is to copy them from somewhere else (like $install/src/Lib) into a stdlib directory (like $install/Lib), then that stdlib directory has to be writable--and on Mac and Linux, it's not.

On 2 February 2016 at 06:39, Andrew Barnert via Python-Dev <python-dev@python.org> wrote:
For folks that *do* know how to use the terminal: $ python3 -m inspect --details inspect Target: inspect Origin: /usr/lib64/python3.4/inspect.py Cached: /usr/lib64/python3.4/__pycache__/inspect.cpython-34.pyc Loader: <_frozen_importlib.SourceFileLoader object at 0x7f0d8d23d9b0> (And if they just want to *read* the source code, then leaving out "--details" prints the full module source, and would work even if the standard library were in a zip archive) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
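The same details can be read programmatically from the module's import spec, without importing the module itself. A minimal sketch follows; the function name module_details and the dictionary keys are chosen to mirror the output above and are not part of any API.

```python
import importlib.util


def module_details(name):
    """Collect roughly the fields that `python -m inspect --details` reports."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(name)
    return {
        "Target": spec.name,
        "Origin": spec.origin,  # source path, or a marker such as "built-in"
        "Cached": spec.cached,  # bytecode path, or None
        "Loader": spec.loader,
    }


for field, value in module_details("inspect").items():
    print(f"{field}: {value}")
```

If the standard library were zipped or frozen, Origin and Loader are the fields that would change, which is what a tool would need to check before offering to open a file.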

On 04.02.2016 14:09, Nick Coghlan wrote:
I want to see and debug also core Python in PyCharm and this is not acceptable. If you want to make it opt-in, fine. But opt-out is a no-go. I have a side-by-side comparison as we use Java and Python in production. It's the *ease of access* that makes Python great compared to Java. @Andrew Even for experienced developers it just sucks and there are more important things to do. Best, Sven

On 2/4/2016 12:18 PM, Sven R. Kunze wrote:
This is completely inadequate as a replacement for loading source into an editor, even if just for reading.

First, on Windows, the console defaults to 300 lines. Print more and only the last 300 lines remain. The max buffer size is 9999. But setting the buffer to that is obnoxious because the buffer is then padded with blank lines to make 9999 lines. The little rectangle that one grabs in the scrollbar is then scaled down to almost nothing, becoming hard to grab.

Second is navigation. No Find, Find-next, or Find-all. Because of padding, moving to the unpadded 'bottom of file' is difficult.

Third, for a repository version, I would have to type, without error, instead of 'python3', some version of, for instance, some suffix of 'F:/python/dev/35/PcBuild/<I forget>/python_d.exe'. "<I forget>" depends, I believe, on the build options.
I agree that removing stdlib python source files by default is a poor idea. The disk space saved is trivial. So, for me, would be nearly all of the time saving.

Over recent versions, more and more source files have been linked to in the docs. Guido recently approved of linking the rest. Removing source contradicts this trend.

Easily loading modules, including stdlib modules, into an IDLE Editor Window is a documented feature that goes back to the original commit in Aug 2000. We do not usually break stdlib features without acknowledgement, some discussion, and a positive decision to do so.

Someone has already mentioned the degradation of tracebacks.

So why not just leave the source files alone in /Lib? As far as I can see, they would not hurt anything. At least on Windows, zip files are treated as directories and python35.zip comes before /Lib on sys.path. The Windows installer currently has an option, selected by default I believe, to run compileall. Add to compileall an option to compile all to python35.zip rather than __pycache__ and use that in the installer. Even if the zip is included in the installer, compileall-zip + source files would let adventurous people patch their stdlib files.

Editing a stdlib file, to see if a confirmed bug disappeared (it did), was how I made my first code contribution. If I had had to download and setup svn and maybe visual c to try a one line change, I would not have done it.

--
Terry Jan Reedy
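compileall has no such zip option, but the standard library's zipfile.PyZipFile already does something close to it. The sketch below is only an illustration of the idea: it byte-compiles one stand-in module into an archive and imports it from there while the source file stays on disk. The file and archive names are invented.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# A stand-in for one stdlib source file; the name is hypothetical.
source = workdir / "demo_zipped.py"
source.write_text("def greet():\n    return 'hello from bytecode'\n")

# PyZipFile.writepy() byte-compiles the module and stores only the .pyc.
archive = workdir / "demo_stdlib.zip"
with zipfile.PyZipFile(archive, "w") as zf:
    zf.writepy(str(source))
    names = zf.namelist()
print(names)

# Only the archive is put on sys.path; the .py file stays on disk for reading.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()
demo_zipped = importlib.import_module("demo_zipped")
print(demo_zipped.greet())
```

Whether loading bytecode from a zip actually beats loading it from __pycache__ would still need to be measured on each platform.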

On Thu, Feb 04, 2016 at 07:58:30PM -0500, Terry Reedy wrote:
I agree with Terry. The inspect trick Nick describes above is a great feature to have, but it's not a substitute for opening the source in an editor, not even on OSes where the command line tools are more powerful than Windows' default tools. [...]
I too would be very reluctant to remove the source files from Python by default, but I have an alternative. I don't know if this is a ridiculous idea or not, but now that the .pyc bytecode files are kept in a separate __pycache__ directory, could we freeze that directory and leave the source files available for reading? (I'm not even sure if this suggestion makes sense, since I'm not really sure what "freezing" the stdlib entails. Is it documented anywhere?) -- Steve

On 5 February 2016 at 15:05, Steven D'Aprano <steve@pearwood.info> wrote:
> (I'm not even sure if this suggestion makes sense, since I'm not really sure what "freezing" the stdlib entails. Is it documented anywhere?)
It's not particularly well documented - most of the docs you'll find are about freeze utilities that don't explain how they work, or the FrozenImporter, which doesn't explain how to *create* a frozen module and link it into your Python executable. Your approach of thinking of a frozen module as a generated .pyc file that has been converted to a builtin module is a pretty good working model, though. (It isn't *entirely* accurate, but the discrepancies are sufficiently arcane that they aren't going to matter in any case that doesn't involve specifically poking around at the import related attributes). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
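To see which modules a given interpreter already treats as frozen, one can ask the frozen importer directly. This is a sketch; is_frozen is a helper written for the example, and which names come back as frozen depends on the Python version and build.

```python
from importlib.machinery import FrozenImporter


def is_frozen(name):
    """Return True if `name` is compiled into the interpreter as a frozen module."""
    # FrozenImporter.find_spec() returns None for anything that is not frozen.
    return FrozenImporter.find_spec(name) is not None


for name in ("_frozen_importlib", "zipimport", "json"):
    print(f"{name}: frozen={is_frozen(name)}")
```

The import system bootstrap (_frozen_importlib) is frozen in every build, since it has to be available before any file-based import can work.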

On 2/5/2016 9:37 AM, Alexander Walters wrote:
Hmm, the annotated Open Source Definition explicitly states "The program must include source code" -- how did I misinterpret that? Emile http://opensource.org/osd-annotated

On 2/5/2016 10:38 AM, Brett Cannon wrote:
On Fri, 5 Feb 2016 at 10:34 Emile van Sebille <emile@fenx.com> wrote:
Aah, 'must' is less restrictive in this context than I expected. When you combine the two halves the first part might be more accurately phrased as 'The program must make source code available' rather than 'must include' which I understood to mean 'ship with'. Emile

On Friday, February 5, 2016 11:57 AM, Emile van Sebille <emile@fenx.com> wrote:
First, step back and think of this in common sense terms: If being open source required any Python installation to have the .py source to the .pyc or .zip files in the stdlib, surely it would also require any Python installation to have the .c source to the interpreter too. But lots of people have Python without having the .c source. Also, the GPL isn't typical of all open source licenses, it's only typical of _copyleft_ licenses. Permissive licenses, like Python's, are very different. Copyleft licenses are designed to make sure that all derived works are also copylefted; permissive licenses are designed to permit derived works as widely as possible. As the Python license specifically says, "All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source." Meanwhile, the fact that someone has decided that the Python license qualifies under the Open Source Definition doesn't mean the OSD is the right way to understand it. Read the license itself, or one of the summaries at opensource.org or fsf.org. (And if you still can't figure something out, and it's important to your work, you almost certainly need to ask a lawyer.) So, if you think the first sentence of section 2 of the OSD contradicts the explanation in the rest of the paragraph--well, even if you're right, that doesn't affect Python's license at all. Finally, if you want to see what it takes to actually make all the terms unambiguous both to ordinary human beings and to legal codes, see the GPL FAQ sections on their definitions of "propagate" and "convey". It may take you lots of careful reading to understand it, but when you finally do, it's definitely unambiguous.

On Fri, Feb 5, 2016, at 10:33 AM, Emile van Sebille wrote:
Couple things.

First, the OSD is not authoritative. Python's license establishes the rules of its distribution: that Python's license is considered compatible with the OSD doesn't actually mean your reading of anything on the OSD page has any binding meaning.

Second, OSD's Rule 2 means that those who are distributing Python -- the PSF, originally -- must provide source code if they're distributing it under Python's license, but it doesn't actually mean it must be packaged with it in every download. In fact, it's not today. The standard library source is included in normal downloads, but the C source of Python isn't. But you can download it readily though, so that's fine. It's fully compliant with the OSD.

But! If Debian (pulling them out of a hat randomly) is distributing Python, they aren't the PSF, and notably are not bound by the OSD rules, only by Python's license terms. The PSF satisfied their requirements to the licensing terms when releasing Python, but now Debian has Python, and they are distributing it -- that's an entirely separate act, and you must look at them as a separate actor in terms of the license.

They don't have to distribute it in the same license. They must be ABLE to (as OSD's Rule 3 says), but they don't HAVE to. Some random person can take Python, rename it Snakey, and release it under almost any license they want and give no one the source code at all. Python has from the beginning allowed this: it's actually in quite a few closed source / proprietary products without ever advertising it and providing no source, entirely legally and ethically -- Python's gone out of its way to support this sort of use-case.

As it happens, Debian usually distributes something very close to the official release (sometimes they backport patches and such), and always does so under the same license as Python (AFAICT), but they don't *have* to.
GPL is copyleft and requires its derivative works to be GPL'd (or at least, no more restrictive than GPL) -- so in GPL, to distribute it you MUST distribute it under GPL-compatible terms. Python is a permissive license and allows anyone to do basically anything, INCLUDING produce closed source releases if someone wanted to, or just release modifications or modules that are available under different licenses. The OSD encompasses both ends of the spectrum: the GPL's mandate of source access and the OSD's mandate of the receiver to be able to distribute in the same terms they received (notably, NOT the same terms it was originally released under).

--
Stephen Hansen
m e @ i x o k a i . i o

Executive summary: There is no licensing issue because Python isn't copyleft. Stick to the pragmatic *technical* issue of how to reliably provide corresponding source to those who want to look at that source (just because that's how we do things in Python). Emile van Sebille writes:
> Except for that nasty licensing issue requiring source code.
CPython is not now and never has been copyleft. CPython is distributed by the PSF *as* open source with a license that *permits* redistribution of original source and derivatives (including executables), but legally need not *remain* open source downstream.

The remaining issue is the PSF's CLA which permits the PSF to relicense/sublicense under any open source license. However it's not clear to me that the PSF is required by the CLA to distribute source! It receives the code under very permissive licenses, and the CLA merely names the contributor's chosen license. I imagine those licenses determine whether the PSF must distribute source. If so, no, not even the PSF is bound (legally) to distribute Python source.

Of course if *you* want to you can GPL Python (I think that's now possible, at one time there was an issue with the CNRI license IIRC), and then licensees of *your* distribution (but not you!) are required to distribute source.

Of course our trust in the PSF is based on the moral principle of reciprocity: we contribute to the PSF's distribution as open source (according to the CLA) in large part because we expect to receive open source back. But if the PSF ever goes so wrong as to even think of taking advantage of that loophole, we are well and truly hosed anyway. (Among other things, that means a voting majority of the current PSF Board -- many of them core developers -- fell under a bus.) So don't worry about it.

On Sat, Feb 6, 2016 at 3:31 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
And even the GPL doesn't require you to distribute the source along with every copy of the binary. As long as the source is *available*, it's acceptable to distribute just the binary for convenience. For instance, on my Debian systems, I can say "apt-get install somepackage" to get just the binary, and then "apt-get source somepackage" if I want the corresponding source. IANAL, but I suspect it would be compliant if the same way of obtaining the C source code also gets you the unfrozen stdlib. So yeah, no licensing problem. ChrisA

Chris Angelico writes:
True (and it would apply to frozen Python as long as the source includes the build scripts such as setup.py used to "freeze" Python), but it can be complex (especially for commercial distribution). However, the technical problem remains. For example, you mention Debian. While Debian keeps its source and binary packages very close to "in sync" on the server, there are several gotchas. For example, Debian does not restrict itself to packaging patches, it sometimes breaks your security when it thinks it's smarter than Bruce. So ... is the corresponding source you're interested in the patched or unpatched source? Do you know which you get when you install the source package? Do you know how to get the other? Suppose for reasons of stability you've "pinned" the binary. Is the corresponding Debian source package still easily available? Did you think of that gotcha when you installed the source package, or did you just assume they were still in sync? I'm sure somebody with the "security mindset" (eg, Bruce) can think of many more.... It's not Python's responsibility to solve these gotchas, of course. Many (eg, do you want patched vs. unpatched) are use-case-dependent anyway. However, many of them do go away (and Python has fulfilled any imaginable responsibility) if we distribute source with the binaries, or arrange that binaries are built from source at installation.

On Sat, Feb 6, 2016 at 4:31 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Right, sure. The technical problems are still there. Although I'm fairly confident that Debian's binaries would correspond to Debian's source - but honestly, if I'm looking for sources for anything other than the kernel, I probably want to get the latest from source control, rather than using the somewhat older version shipped in the repos. As to availability, though, most of the big distros (including Debian) keep their sources around for a long time. ChrisA

On Feb 06, 2016, at 04:38 PM, Chris Angelico wrote:
Not to get too deep into what other projects do, but yes in Debian, you can always get the patched source that corresponds to the binary you've installed, usually in both version controlled form and otherwise. I'd expect this to be true of most if not all of the Linux distros. A more interesting question is how you can actually verify this equivalence, and there are folks across the ecosystem working on reproducible builds. The idea is that you should be able to take the source that *claims* to correspond to that binary, and using the established build tools, locally reproduce a bit-wise exact duplicate of the binary. I've applied and submitted several patches to various upstreams that help with this effort, such as being able to pass in "locked" datetimes instead of the package always using e.g. datetime.now(). Let's not dive down the rabbit hole too far into how you can trust your build tool chain, and every other layer down to the quantum. Cheers, -Barry
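As an illustration of the "locked" datetime idea, here is a minimal sketch of a build helper that prefers a pinned timestamp over the current time. The helper name build_timestamp is hypothetical, and the use of the SOURCE_DATE_EPOCH environment variable (the convention used by the reproducible-builds effort) is an assumption; the message above does not say which mechanism the patches used.

```python
import datetime
import os


def build_timestamp(environ=None):
    """Return the UTC timestamp a build should embed in its output.

    If SOURCE_DATE_EPOCH is set, use it so that repeated builds of the
    same source produce identical results; otherwise use the current time.
    """
    environ = os.environ if environ is None else environ
    locked = environ.get("SOURCE_DATE_EPOCH")
    if locked is not None:
        return datetime.datetime.fromtimestamp(int(locked), tz=datetime.timezone.utc)
    return datetime.datetime.now(tz=datetime.timezone.utc)


# A pinned build always reports the same moment, whenever it is run.
print(build_timestamp({"SOURCE_DATE_EPOCH": "1454716800"}).isoformat())
```

Passing the environment in as a parameter keeps the helper easy to test without touching the real process environment.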

On Mon, 1 Feb 2016 at 08:48 R. David Murray <rdmurray@bitdance.com> wrote:
Nope, it would probably need to be across all OSs to have consistent semantics.
This is what would need to be discussed in terms of how to handle this. For instance, we already do stuff in (I believe) site.py when we detect the build is in a checkout, so we could in that instance make sure the stdlib file directory takes precedence over any frozen code (hence why I wondered if the frozen importer on sys.meta_path should come after the sys.path importer). If we did that then we could make installing the stdlib files optional but still take precedence. It's all workable, it's just a question of if we want to. This is why I think we should get concrete benchmark numbers on Windows, Linux, and OS X to see if this is even worth considering as something we provide in our own binaries.
It at least wouldn't hurt anything.

On 2 February 2016 at 02:40, R. David Murray <rdmurray@bitdance.com> wrote:
While omitting Python source files does let us reduce base image sizes (quite significantly), the current perspective in Fedora and Project Atomic is that going bytecode-only (whether frozen or not) breaks too many things to be worthwhile. As one simple example, it means tracebacks no longer include source code lines, dramatically increasing the difficulty of debugging failures. As such, we're more likely to pursue minimisation efforts by splitting the standard library up into "stuff essential distro components use" and "the rest of the standard library that upstream defines" than by figuring out how to avoid shipping source files (I believe Debian already makes this distinction with the python-minimal vs python split). Zipping up the standard library doesn't break tracebacks though, so it's potentially worth exploring that option further. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
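The traceback point can be checked with a small experiment. The sketch below is only an illustration: the module names tb_demo_src and tb_demo_pyc and the helper traceback_text are made up. It imports the same function once from a .py file and once from a bytecode-only .pyc, then compares the formatted tracebacks.

```python
import importlib
import os
import py_compile
import sys
import tempfile
import traceback

SOURCE = "def fail():\n    raise ValueError('boom')\n"


def traceback_text(bytecode_only):
    """Import a throwaway module and return the traceback its fail() produces."""
    tmpdir = tempfile.mkdtemp()
    name = "tb_demo_pyc" if bytecode_only else "tb_demo_src"
    src = os.path.join(tmpdir, name + ".py")
    with open(src, "w") as f:
        f.write(SOURCE)
    if bytecode_only:
        # Compile next to the source, then delete the source file.
        py_compile.compile(src, cfile=os.path.join(tmpdir, name + ".pyc"), doraise=True)
        os.remove(src)
    sys.path.insert(0, tmpdir)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(name)
        try:
            module.fail()
        except ValueError:
            return traceback.format_exc()
    finally:
        sys.path.remove(tmpdir)


with_source = traceback_text(bytecode_only=False)
without_source = traceback_text(bytecode_only=True)
print("source line shown with .py:    ", "raise ValueError" in with_source)
print("source line shown with .pyc only:", "raise ValueError" in without_source)
```

Both tracebacks still name the file and line number; only the text of the failing line is lost when the source is missing.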

On Sat, Jan 30, 2016, 12:30 Sven R. Kunze <srkunze@mail.de> wrote:
It wouldn't be a requirement, just a notion
I personally think that startup time is not really a big issue; even when it comes to microbenchmarks.
You might not, but just about every command-line app does. -brett

Hi, If you want to make startup time faster for a broad range of applications, please consider adding a lazy import facility in the stdlib. I recently tried to write a lazy import mechanism using import hooks (to make it portable from 2.6 to 3.5), it seems nearly impossible to do so (or, at least, for an average Python programmer like me). This would be much more useful (for actual users, not for architecture astronauts) than refactoring the importlib APIs in each feature version... Thanks in advance Antoine.

Brett Cannon <brett <at> python.org> writes:
A lazy importer was added in Python 3.5 and it was not possible without the module spec refactoring.
Wow... Thank you, I didn't know about that. Now for the next question: how am I supposed to use it? The following documentation leaves me absolutely clueless: """This class only works with loaders that define exec_module() as control over what module type is used for the module is required. For those same reasons, the loader’s create_module() method will be ignored (i.e., the loader’s method should only return None). Finally, modules which substitute the object placed into sys.modules will not work as there is no way to properly replace the module references throughout the interpreter safely; ValueError is raised if such a substitution is detected.""" (reference: https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader) I want to import lazily the modules from package "foobar.*", but not other modules as other libraries may depend on import side effects. How do I do that? The quoted snippet doesn't really help. Regards Antoine.
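For readers with the same question, here is a minimal sketch of wiring up importlib.util.LazyLoader for one module by name. The helper name lazy_import is hypothetical, and colorsys is just a convenient stdlib module to try it on. This does not answer the "only foobar.*" case, which would presumably need a custom finder on sys.meta_path that wraps loaders for matching names only.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named " + repr(name))
    # Wrap the real loader; it must define exec_module(), as the docs say.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only arranges for deferred execution.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")
# The module body is executed here, on first attribute access.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```

Because the module is registered in sys.modules, a later plain "import colorsys" elsewhere gets the same lazily loaded object.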

On Jan 31, 2016, at 12:02 PM, Brett Cannon <brett@python.org> wrote:
A lazy importer was added in Python 3.5
Are there any docs on how to actually use the LazyLoader in 3.5? I can’t seem to find any but I don’t really know the import system that well. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

There are no example docs for it yet, but enough people have asked this week about how to set up a custom importer that I will write up a generic example case which will make sense for a lazy loader (need to file the issue before I forget). On Sun, 31 Jan 2016, 09:11 Donald Stufft <donald@stufft.io> wrote:

I have opened http://bugs.python.org/issue26252 to track writing the example (and before ppl go playing with the lazy loader, be aware of http://bugs.python.org/issue26186). On Sun, 31 Jan 2016 at 09:26 Brett Cannon <brett@python.org> wrote:

It doesn't currently end up on disk. Some tables are partially or completely stored on disk as Python source code (some are partially generated from simple rules), but others are generated by inverting those. That process takes time that could be avoided by storing the generated tables, and storing all of it in a format that doesn't require parsing, compiling and executing (such as a native array). Potentially it could be a win all around if we stopped including the (larger) source files, but that doesn't seem like a good idea for maintaining portability to other implementations. The main thought is making the compiler binary bigger to avoid generating encoding tables at startup.

Top-posted from my Windows Phone

-----Original Message-----
From: "francismb" <francismb@email.de>
Sent: 1/29/2016 13:56
To: "python-dev@python.org" <python-dev@python.org>
Subject: Re: [Python-Dev] More optimisation ideas

Hi,
Is it really an important trade-off? As far as I understand from your email, those modules are always being loaded and the final data created. Won't the space be there (in memory or on disk)? Thanks in advance! francis
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/steve.dower%40python.org

On 30 January 2016 at 03:48, Steve Dower <steve.dower@python.org> wrote:
When I last tried to profile startup on Windows (I haven't used Windows for some time now) it seemed that the time was totally dominated by file system access. Essentially the limiting factor was the inordinate number of stat calls and small file accesses. Although this was probably Python 2.x which may not import those particular modules and maybe it depends on virus scanner software etc. Things may have changed now but I concluded that substantive gains could only come from improving FS access. Perhaps something like zipping up the standard library would see a big improvement. -- Oscar
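To make the "zip up the library" idea concrete, the sketch below imports a module straight from a zip archive placed on sys.path, which is the mechanism (zipimport) a zipped stdlib would rely on. The archive name bundle.zip, the module name zipped_demo and the helper import_from_zip are all made up for the example; it says nothing about how much time this would actually save.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(module_name, source):
    """Write source into a fresh zip archive and import it from there."""
    archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(module_name + ".py", source)
    # A zip file on sys.path is handled by the zipimport machinery.
    sys.path.insert(0, archive)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(archive)


mod = import_from_zip("zipped_demo", "ANSWER = 42\n")
print(mod.ANSWER)
print(mod.__file__)  # a path that points inside the archive
```

The temporary directory is left behind on purpose to keep the sketch short.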

On 29.01.16 19:05, Steve Dower wrote:
$ ./python -m timeit -s "import codecs; from encodings.cp437 import decoding_table" -- "codecs.charmap_build(decoding_table)"
100000 loops, best of 3: 4.36 usec per loop

Getting rid of charmap_build() would save you at most 4.4 microseconds per encoding. 0.0005 seconds if you have imported *all* standard encodings! And how did you expect to store encoding_table in a more efficient way?
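The same measurement can be repeated from inside Python with the timeit module. This is a sketch of the method only; the number it prints depends on the machine and the Python version, and the variable names are arbitrary.

```python
import codecs
import timeit
from encodings.cp437 import decoding_table

LOOPS = 100000

# Time only the table construction; the imports above are the setup cost.
total_seconds = timeit.timeit(
    lambda: codecs.charmap_build(decoding_table), number=LOOPS
)
per_call_usec = total_seconds / LOOPS * 1e6
print("%.2f usec per charmap_build() call" % per_call_usec)
```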

On 30Jan2016 0645, Serhiy Storchaka wrote:
Just as happy to be proven wrong. Perhaps I misinterpreted my original profiling and then, embarrassingly, ran with the result for a long time without retesting.
And how did you expect to store encoding_table in a more efficient way?
There's nothing inefficient about its storage, but as it does not change it would be trivial to store it statically. Then "building" the map is simply obtaining a pointer into an already loaded memory page. Much faster than building it on load, but both are clearly insignificant compared to other factors. Cheers, Steve
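For anyone unfamiliar with the tables being discussed: the decoding table is a 256-character string indexed by byte value, and the encoding table is its inverse. The sketch below builds the inverse both ways, with codecs.charmap_build() as the encodings modules do and as a plain dict for comparison. The dict is only an illustration of "inverting", not how CPython stores it.

```python
import codecs
from encodings.cp437 import decoding_table

# What the encodings modules do at import time.
encoding_table = codecs.charmap_build(decoding_table)

# The same information as a plain dict: character -> byte value.
inverted = {char: byte for byte, char in enumerate(decoding_table)}

# Encode one non-ASCII character (e with acute accent) with the built table.
encoded, consumed = codecs.charmap_encode("\u00e9", "strict", encoding_table)
print(encoded, consumed)
print(inverted["\u00e9"] == encoded[0])
```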

On Sat, 30 Jan 2016 at 10:21 Serhiy Storchaka <storchaka@gmail.com> wrote:
Check the archives, but I did try freezing the entire stdlib and it didn't really make a difference in startup, so I don't know if this still holds true anymore. At this point I think all of our knowledge of what takes the most amount of time during startup is outdated and someone should try to really profile the whole thing to see where the hotspots are (e.g., is it stat calls from imports, is it actually some specific function, is it just so many little things adding up to a big thing, etc.).
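One way to get a first picture of where import time goes is the -X importtime option. Note the assumption: that option was added in Python 3.7, so it postdates this thread and is not what anyone here used. The helper name import_times is hypothetical. The sketch starts a fresh interpreter and lists the imports with the largest cumulative time.

```python
import subprocess
import sys


def import_times(top=5):
    """Return (cumulative_microseconds, module) pairs for a bare startup."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", "pass"],
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    rows = []
    prefix = "import time:"
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        try:
            cumulative = int(parts[1])
        except ValueError:
            continue  # the header row has column titles, not numbers
        rows.append((cumulative, parts[2].strip()))
    rows.sort(reverse=True)
    return rows[:top]


for microseconds, module in import_times():
    print("%8d us  %s" % (microseconds, module))
```

This measures time spent per import, not stat calls specifically, so it complements rather than replaces a native profiler or a system call tracer.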

Brett tried freezing the entire stdlib at one point (as we do for parts of importlib) and reported no significant improvement. Since that rules out code compilation as well as the OS calls, it'd seem the priority is to execute less code on startup. Details of that work were posted to python-dev about twelve months ago, IIRC. Maybe a little longer.

Top-posted from my Windows Phone

-----Original Message-----
From: "Serhiy Storchaka" <storchaka@gmail.com>
Sent: 1/30/2016 10:22
To: "python-dev@python.org" <python-dev@python.org>
Subject: Re: [Python-Dev] More optimisation ideas

On 30.01.16 18:31, Steve Dower wrote:
AFAIK the most time is spent in system calls like stat or open. Archiving the stdlib into the ZIP file and using zipimport can decrease Python startup time (perhaps there is an open issue about this).

On 30.01.2016 20:15, Steve Dower wrote:
Brett tried freezing the entire stdlib at one point (as we do for parts of importlib) and reported no significant improvement. Since that rules out code compilation as well as the OS calls, it'd seem the priority is to execute less code on startup.
Details of that work were posted to python-dev about twelve months ago, IIRC. Maybe a little longer.
Freezing the entire stdlib does improve the startup time, simply because it removes stat calls, which dominate the startup time at least on Unix. It also allows sharing the stdlib byte code in memory, since it gets stored in static C structs which the OS will happily mmap into multiple processes for you without any additional effort. Our eGenix PyRun does exactly that. Even though the original motivation is a different one, the gained improvement in startup time is a nice side effect: http://www.egenix.com/products/python/PyRun/ Aside: The encodings don't really make much difference here. The dictionaries aren't all that big, so generating them on the fly doesn't really create much overhead. The trade off in terms of maintainability/speed definitely leans toward maintainability. For the larger encoding tables we already have C implementations with appropriate data structures to make lookup speed vs. storage needs efficient. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 31 2016)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

So freezing the stdlib helps on UNIX and not on OS X (if my old testing is still accurate). I guess the next question is what it does on Windows and if we would want to ever consider freezing the stdlib as part of the build process (and if we would want to change the order of importers on sys.meta_path so frozen modules came after file-based ones). On Sun, 31 Jan 2016, 10:43 M.-A. Lemburg <mal@egenix.com> wrote:
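To see what "frozen" and "order of importers" mean on a given build, the sketch below lists the finders on sys.meta_path in the order they are consulted and asks the frozen importer whether it knows a module. The helper name is_frozen is hypothetical; importlib.machinery.FrozenImporter.find_spec() is the real API being called.

```python
import importlib.machinery
import sys


def is_frozen(name):
    """True if the frozen importer can supply a module with this name."""
    return importlib.machinery.FrozenImporter.find_spec(name) is not None


# Finders are tried in this order; an earlier finder wins over a later one.
for position, finder in enumerate(sys.meta_path):
    print(position, getattr(finder, "__name__", repr(finder)))

# The import system's own bootstrap module is always frozen.
print("_frozen_importlib frozen:", is_frozen("_frozen_importlib"))
```

On a stock CPython the frozen importer usually sits ahead of the path-based finder, which is why moving it later would be needed for files on disk to take precedence.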

On Sun, Jan 31, 2016 at 08:23:00PM +0000, Brett Cannon wrote:
I find that being able to easily open stdlib .py files in a text editor to read the source is extremely valuable. I've learned much more from reading the source than from (e.g.) StackOverflow. Likewise, it's often handy to do a grep over the stdlib. When you talk about freezing the stdlib, what exactly does that mean? - will the source files still be there? - how will this affect people writing patches for bugs? -- Steve

On Mon, 01 Feb 2016 14:12:27 +1100, Steven D'Aprano <steve@pearwood.info> wrote:
Well, Brett said it would be optional, though perhaps the above paragraph is asking about doing it in our Windows build. But the linux distros might also make use of the option if it exists, so the question is very meaningful. However, you'd have to ask the distro if the source would be shipped in the linux case, and I'd guess not in most cases. I don't know about anyone else, but on my own development systems it is not that unusual for me to *edit* the stdlib files (to add debug prints) while debugging my own programs. Freeze would definitely interfere with that. I could, of course, install a separate source build on my dev system, but I thought it worth mentioning as a factor. On the other hand, if the distros go the way Nick has (I think) been advocating, and have a separate 'system python for system scripts' that is independent of the one installed for user use, having the system-only python be frozen and sourceless would actually make sense on a couple of levels. --David

On Feb 01, 2016, at 11:40 AM, R. David Murray wrote:
It's very likely the .py files would still be shipped, but perhaps in a -dev package that isn't normally installed.
I do this too, though usually in a VM or chroot and not in my live system. A very common situation for me though is pdb stepping through my own code and landing in -or passing through- stdlib.
Yep, we've talked about it in Debian-land too, but never quite gotten around to doing anything. Certainly I'd like to see some consistency among Linux distros there (i.e. discussed on linux-sig@). But even with system scripts, I do need to step through them occasionally. If it were a matter of changing a shebang or invoking the script with a different Python (e.g. /usr/bin/python3s vs. /usr/bin/python3) to get the full unpacked source, that would be fine. Cheers, -Barry

" " == Barry Warsaw <barry@python.org> writes:
>> On Feb 01, 2016, at 11:40 AM, R. David Murray wrote:
>> I don't know about anyone else, but on my own development
>> systems it is not that unusual for me to *edit* the stdlib
>> files (to add debug prints) while debugging my own programs.
>> Freeze would definitely interfere with that. I could, of
>> course, install a separate source build on my dev system, but I
>> thought it worth mentioning as a factor.
[snip]
> But even with system scripts, I do need to step through them
> occasionally. If it were a matter of changing a shebang or
> invoking the script with a different Python
> (e.g. /usr/bin/python3s vs. /usr/bin/python3) to get the full
> unpacked source, that would be fine.

If the stdlib were to use implicit namespace packages ( https://www.python.org/dev/peps/pep-0420/ ) and the various loaders/importers as well, then python could do what I've done with an embedded python application for years. Freeze the stdlib (or put it in a zipfile or whatever is fast). Then arrange PYTHONPATH to first look on the filesystem and then look in the frozen/zipped storage. Normally the filesystem part is empty. So, modules are loaded from the frozen/zip area. But if you wanna override one of the frozen modules simply copy one or more .py files onto the file system. I've been doing this only with modules in the global scope. But implicit namespace packages seem to open the door for this with packages. Mike
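The override arrangement described above can be tried on a small scale. In this sketch an (initially empty) directory is placed on sys.path ahead of a zip archive; the module comes from the zip until a .py file of the same name is dropped into the directory. The names override, stdlib.zip, shadow_demo and demo are made up, and the archive stands in for whatever frozen or zipped storage a real build would use.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def demo():
    """Return where shadow_demo was loaded from, before and after an override."""
    root = tempfile.mkdtemp()
    override_dir = os.path.join(root, "override")
    os.mkdir(override_dir)
    archive = os.path.join(root, "stdlib.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("shadow_demo.py", "ORIGIN = 'zip'\n")

    # Filesystem first, zipped storage second.
    sys.path[:0] = [override_dir, archive]
    try:
        importlib.invalidate_caches()
        first = importlib.import_module("shadow_demo").ORIGIN

        # Drop an override onto the filesystem and import again from scratch.
        with open(os.path.join(override_dir, "shadow_demo.py"), "w") as f:
            f.write("ORIGIN = 'filesystem'\n")
        del sys.modules["shadow_demo"]
        importlib.invalidate_caches()
        second = importlib.import_module("shadow_demo").ORIGIN
    finally:
        sys.path.remove(override_dir)
        sys.path.remove(archive)
        sys.modules.pop("shadow_demo", None)
    return first, second


print(demo())
```

Every import still has to look in the override directory first, so the filesystem lookup is not avoided by this ordering.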

On Feb 01 2016, mike.romberg@comcast.net wrote:
Presumably that would eliminate the performance advantages of the frozen/zipped storage because now Python would still have to issue all the stat calls to first check for the existence of a .py file. Best, -Nikolaus (No Cc on replies please, I'm reading the list) -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

On Feb 1, 2016, at 09:59, mike.romberg@comcast.net wrote:
This is a great solution for experienced developers, but I think it would be pretty bad for novices or transplants from other languages (maybe even including Python 2). There are already multiple duplicate questions every month on StackOverflow from people asking "how do I find the source to stdlib module X". The canonical answer starts off by explaining how to import the module and use its __file__, which everyone is able to handle. If we have to instead explain how to work out the .py name from the qualified module name, how to work out the stdlib path from sys.path, and then how to find the source from those two things, with the caveat that it may not be installed at all on some platforms, and how to make sure what they're asking about really is a stdlib module, and how to make sure they aren't shadowing it with a module elsewhere on sys.path, that's a lot more complicated. Especially when you consider that some people on Windows and Mac are writing Python scripts without ever learning how to use the terminal or find their Python packages via Explorer/Finder.

And meanwhile, other people would be asking why their app runs slower on one machine than another, because they didn't expect that installing python-dev on top of python would slow down startup.

Finally, on Linux and Mac, the stdlib will usually be somewhere that's not user-writable--and we shouldn't expect users to have to mess with stuff in /usr/lib or /System/Library even if they do have sudo access. Of course we could put a "stdlib shadow" location on the sys.path and configure it for /usr/local/lib and /Library and/or for somewhere in ~, but that just makes the lookup procedure even more complicated--not to mention that we've just added three stat calls to remove one open, at which point the optimization has probably become a pessimization.

On 2/1/2016 3:39 PM, Andrew Barnert via Python-Dev wrote:
Perhaps even easier: start IDLE, hit Alt-M, type in module name as one would import it, click OK. If Python source is available, IDLE will open it in an editor window, with the path on the title bar.
If we have to instead explain how to work out the .py name
The window has the path on the title bar, so one can tell what was loaded. IDLE currently uses imp.find_module (this could be updated), with a backup of __import__(...).__file__, so it will load non-stdlib files that can be imported.
Finally, on Linux and Mac, the stdlib will usually be somewhere that's not user-writable
On Windows, this depends on the install location. Perhaps there should be an option for edit-save or view only to avoid accidental changes. -- Terry Jan Reedy

On Feb 1, 2016, at 19:44, Terry Reedy <tjreedy@udel.edu> wrote:
The point of this thread is the suggestion that the stdlib modules be frozen or stored in a zipfile, unless a user modifies things in some way to make the source accessible. So, if a user hasn't done that (which no novice will know how to do), there won't be a path to show in the title bar, so IDLE won't be any more help than the command line. (I suppose IDLE could grow a new feature to look up "associated source files" for a zipped stdlib or something, but that seems like a pretty big new feature.)
The problem is that, if the standard way for users to see stdlib sources is to copy them from somewhere else (like $install/src/Lib) into a stdlib directory (like $install/Lib), then that stdlib directory has to be writable--and on Mac and Linux, it's not.

On 2 February 2016 at 06:39, Andrew Barnert via Python-Dev <python-dev@python.org> wrote:
For folks that *do* know how to use the terminal:

    $ python3 -m inspect --details inspect
    Target: inspect
    Origin: /usr/lib64/python3.4/inspect.py
    Cached: /usr/lib64/python3.4/__pycache__/inspect.cpython-34.pyc
    Loader: <_frozen_importlib.SourceFileLoader object at 0x7f0d8d23d9b0>

(And if they just want to *read* the source code, then leaving out "--details" prints the full module source, and would work even if the standard library were in a zip archive)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
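Roughly the same details are available from inside Python through the module spec. The sketch below is not how "python -m inspect" is implemented; the helper name describe and the dictionary keys are made up to mirror the output shown above.

```python
import importlib.util


def describe(name):
    """Return target/origin/cached/loader details for an importable module."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named " + repr(name))
    return {
        "target": spec.name,
        "origin": spec.origin,    # usually the path of the .py file
        "cached": spec.cached,    # usually the matching .pyc, if any
        "loader": type(spec.loader).__name__,
    }


for key, value in describe("inspect").items():
    print("%s: %s" % (key.capitalize(), value))
```

The origin and cached values depend on how the module is installed; for a frozen or zipped module they will not be plain filesystem paths.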

On 04.02.2016 14:09, Nick Coghlan wrote:
I want to see and debug also core Python in PyCharm and this is not acceptable. If you want to make it opt-in, fine. But opt-out is a no-go. I have a side-by-side comparison as we use Java and Python in production. It's the *ease of access* that makes Python great compared to Java. @Andrew Even for experienced developers it just sucks and there are more important things to do. Best, Sven

On 2/4/2016 12:18 PM, Sven R. Kunze wrote:
This is completely inadequate as a replacement for loading source into an editor, even if just for reading. First, on Windows, the console defaults to 300 lines. Print more and only the last 300 lines remain. The max buffer size is 9999. But setting the buffer to that is obnoxious because the buffer is then padded with blank lines to make 9999 lines. The little rectangle that one grabs in the scrollbar is then scaled down to almost nothing, becoming hard to grab. Second is navigation. No Find, Find-next, or Find-all. Because of padding, moving to the unpadded 'bottom of file' is difficult. Third, for a repository version, I would have to type, without error, instead of 'python3', some version of, for instance, some suffix of 'F:/python/dev/35/PcBuild/<I forget>/python_d.exe'. "<I forget>" depends, I believe, on the build options.
I agree that removing stdlib python source files by default is a poor idea. The disk space saved is trivial. So, for me, would be nearly all of the time saving. Over recent versions, more and more source files have been linked to in the docs. Guido recently approved of linking the rest. Removing source contradicts this trend. Easily loading modules, including stdlib modules, into an IDLE Editor Window is a documented feature that goes back to the original commit in Aug 2000. We do not usually break stdlib features without acknowledgement, some discussion, and a positive decision to do so. Someone has already mentioned the degradation of tracebacks.

So why not just leave the source files alone in /Lib? As far as I can see, they would not hurt anything. At least on Windows, zip files are treated as directories and python35.zip comes before /Lib on sys.path. The Windows installer currently has an option, selected by default I believe, to run compileall. Add to compileall an option to compile all to python35.zip rather than __pycache__ and use that in the installer. Even if the zip is included in the installer, compileall-zip + source files would let adventurous people patch their stdlib files.

Editing a stdlib file, to see if a confirmed bug disappeared (it did), was how I made my first code contribution. If I had had to download and setup svn and maybe visual c to try a one line change, I would not have done it. -- Terry Jan Reedy
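compileall has no such option, but the standard library already has something close to the "compile everything into a zip" step in zipfile.PyZipFile. The sketch below is an illustration of that existing class, not the proposed installer feature; the helper name build_pyc_zip and the file names are made up.

```python
import os
import tempfile
import zipfile


def build_pyc_zip(source_dir, archive_path):
    """Compile the .py files in source_dir and store the .pyc files in a zip."""
    with zipfile.PyZipFile(archive_path, "w") as zf:
        # For a plain (non-package) directory, each module is compiled and
        # added at the top level of the archive as a .pyc file.
        zf.writepy(source_dir)
    with zipfile.ZipFile(archive_path) as zf:
        return zf.namelist()


source_dir = tempfile.mkdtemp()
with open(os.path.join(source_dir, "hello.py"), "w") as f:
    f.write("GREETING = 'hello'\n")

archive = os.path.join(tempfile.mkdtemp(), "compiled.zip")
names = build_pyc_zip(source_dir, archive)
print(names)
```

The source files stay where they were, so the zip can sit ahead of the source directory on sys.path while the .py files remain readable and editable.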

On Thu, Feb 04, 2016 at 07:58:30PM -0500, Terry Reedy wrote:
I agree with Terry. The inspect trick Nick describes above is a great feature to have, but it's not a substitute for opening the source in an editor, not even on OSes where the command line tools are more powerful than Windows' default tools. [...]
I too would be very reluctant to remove the source files from Python by default, but I have an alternative. I don't know if this is a ridiculous idea or not, but now that the .pyc bytecode files are kept in a separate __pycache__ directory, could we freeze that directory and leave the source files available for reading? (I'm not even sure if this suggestion makes sense, since I'm not really sure what "freezing" the stdlib entails. Is it documented anywhere?) -- Steve

On 5 February 2016 at 15:05, Steven D'Aprano <steve@pearwood.info> wrote:
(I'm not even sure if this suggestion makes sense, since I'm not really sure what "freezing" the stdlib entails. Is it documented anywhere?)
It's not particularly well documented - most of the docs you'll find are about freeze utilities that don't explain how they work, or the FrozenImporter, which doesn't explain how to *create* a frozen module and link it into your Python executable. Your approach of thinking of a frozen module as a generated .pyc file that has been converted to a builtin module is a pretty good working model, though. (It isn't *entirely* accurate, but the discrepancies are sufficiently arcane that they aren't going to matter in any case that doesn't involve specifically poking around at the import related attributes). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
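The "generated .pyc turned into a builtin" model can be imitated in pure Python: compile source to a code object, serialize it with marshal (the same serialization used inside .pyc files), and later execute the deserialized code in a fresh module. This is only a sketch of the working model described above; a real frozen module has the marshalled bytes embedded in a C array in the executable, and the name frozen_demo is made up.

```python
import marshal
import types

source = "GREETING = 'hello'\n\ndef greet(name):\n    return GREETING + ', ' + name\n"

# "Freeze" step: compile once and keep only the serialized code object.
blob = marshal.dumps(compile(source, "<frozen frozen_demo>", "exec"))

# "Import" step: no parsing or compiling, just unmarshal and execute.
module = types.ModuleType("frozen_demo")
exec(marshal.loads(blob), module.__dict__)

print(type(blob).__name__, len(blob) > 0)
print(module.greet("python-dev"))
```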

On 2/5/2016 9:37 AM, Alexander Walters wrote:
Hmm, the annotated Open Source Definition explicitly states "The program must include source code" -- how did I misinterpret that? Emile http://opensource.org/osd-annotated

On 2/5/2016 10:38 AM, Brett Cannon wrote:
On Fri, 5 Feb 2016 at 10:34 Emile van Sebille <emile@fenx.com <mailto:emile@fenx.com>> wrote:
Aah, 'must' is less restrictive in this context than I expected. When you combine the two halves the first part might be more accurately phrased as 'The program must make source code available' rather than 'must include' which I understood to mean 'ship with'. Emile

On Friday, February 5, 2016 11:57 AM, Emile van Sebille <emile@fenx.com> wrote:
First, step back and think of this in common sense terms: If being open source required any Python installation to have the .py source to the .pyc or .zip files in the stdlib, surely it would also require any Python installation to have the .c source to the interpreter too. But lots of people have Python without having the .c source. Also, the GPL isn't typical of all open source licenses, it's only typical of _copyleft_ licenses. Permissive licenses, like Python's, are very different. Copyleft licenses are designed to make sure that all derived works are also copylefted; permissive licenses are designed to permit derived works as widely as possible. As the Python license specifically says, "All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source." Meanwhile, the fact that someone has decided that the Python license qualifies under the Open Source Definition doesn't mean the OSD is the right way to understand it. Read the license itself, or one of the summaries at opensource.org or fsf.org. (And if you still can't figure something out, and it's important to your work, you almost certainly need to ask a lawyer.) So, if you think the first sentence of section 2 of the OSD contradicts the explanation in the rest of the paragraph--well, even if you're right, that doesn't affect Python's license at all. Finally, if you want to see what it takes to actually make all the terms unambiguous both to ordinary human beings and to legal codes, see the GPL FAQ sections on their definitions of "propagate" and "convey". It may take you lots of careful reading to understand it, but when you finally do, it's definitely unambiguous.

On Fri, Feb 5, 2016, at 10:33 AM, Emile van Sebille wrote:
Couple things. First, the OSD is not authoritative. Python's license establishes the rules of its distribution: that Python's license is considered compatible with the OSD doesn't actually mean your reading of anything on the OSD page as having any binding meaning. Second, OSD's Rule 2 means that those who are distributing Python -- the PSF, originally -- must provide source code if they're distributing it under Python's license, but it doesn't actually mean it must be packaged with it in every download. In fact, it's not today. The standard library source is included in normal downloads, but the C source of Python isn't. But you can download it readily though, so that's fine. It's fully compliant with the OSD. But! If Debian (pulling them out of a hat randomly) is distributing Python, they aren't the PSF, and notably are not bound by the OSD rules, only by Python's license terms. The PSF satisfied their requirements to the licensing terms when releasing Python, but now Debian has Python, and they are distributing it-- that's an entirely separate act, and you must look at them as a separate actor in terms of the license. They don't have to distribute it in the same license. They must be ABLE to (as OSD's Rule 3 says), but they don't HAVE to. Some random person can take Python, rename it Snakey, and release it under almost any license they want and give no one the source code at all. Python has from the beginning allowed this: it's actually in quite a few closed source / proprietary products without ever advertising it and providing no source, entirely legally and ethically -- Python's gone out of its way to support this sort of use-case. As it happens, Debian usually distributes something very close to the official release (sometimes they backport patches and such), and always does so under the same license as Python (AFAICT), but they don't *have* to.
GPL is copyleft and requires its derivative works to be GPL'd (or at least, no more restrictive than GPL)-- so in GPL, to distribute it you MUST distribute it under GPL-compatible terms. Python is a permissive license and allows anyone to do basically anything, INCLUDING produce closed source releases if someone wanted to, or just release modifications or modules that are available under different licenses. The OSD encompasses both ends of the spectrum: the GPL's mandate of source access and the OSD's mandate of the receiver to be able to distribute in the same terms they received (notably, NOT the same terms it was originally released under). -- Stephen Hansen m e @ i x o k a i . i o

Executive summary: There is no licensing issue because Python isn't copyleft. Stick to the pragmatic *technical* issue of how to reliably provide corresponding source to those who want to look at that source (just because that's how we do things in Python). Emile van Sebille writes:
Except for that nasty licensing issue requiring source code.
CPython is not now and never has been copyleft. CPython is distributed by the PSF *as* open source with a license that *permits* redistribution of original source and derivatives (including executables), but legally need not *remain* open source downstream. The remaining issue is the PSF's CLA which permits the PSF to relicense/sublicense under any open source license. However it's not clear to me that the PSF is required by the CLA to distribute source! It receives the code under very permissive licenses, and the CLA merely names the contributor's chosen license. I imagine those licenses determine whether the PSF must distribute source. If so, no, not even the PSF is bound (legally) to distribute Python source. Of course if *you* want to you can GPL Python (I think that's now possible, at one time there was an issue with the CNRI license IIRC), and then licensees of *your* distribution (but not you!) are required to distribute source. Of course our trust in the PSF is based on the moral principle of reciprocity: we contribute to the PSF's distribution as open source (according to the CLA) in large part because we expect to receive open source back. But if the PSF ever goes so wrong as to even think of taking advantage of that loophole, we are well and truly hosed anyway. (Among other things, that means a voting majority of the current PSF Board -- many of them core developers -- fell under a bus.) So don't worry about it.

On Sat, Feb 6, 2016 at 3:31 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
And even the GPL doesn't require you to distribute the source along with every copy of the binary. As long as the source is *available*, it's acceptable to distribute just the binary for convenience. For instance, on my Debian systems, I can say "apt-get install somepackage" to get just the binary, and then "apt-get source somepackage" if I want the corresponding source. IANAL, but I suspect it would be compliant if the same way of obtaining the C source code also gets you the unfrozen stdlib. So yeah, no licensing problem. ChrisA

Chris Angelico writes:
True (and it would apply to frozen Python as long as the source includes the build scripts such as setup.py used to "freeze" Python), but it can be complex (especially for commercial distribution). However, the technical problem remains. For example, you mention Debian. While Debian keeps its source and binary packages very close to "in sync" on the server, there are several gotchas. For example, Debian does not restrict itself to packaging patches, it sometimes breaks your security when it thinks it's smarter than Bruce. So ... is the corresponding source you're interested in the patched or unpatched source? Do you know which you get when you install the source package? Do you know how to get the other? Suppose for reasons of stability you've "pinned" the binary. Is the corresponding Debian source package still easily available? Did you think of that gotcha when you installed the source package, or did you just assume they were still in sync? I'm sure somebody with the "security mindset" (eg, Bruce) can think of many more.... It's not Python's responsibility to solve these gotchas, of course. Many (eg, do you want patched vs. unpatched) are use-case-dependent anyway. However, many of them do go away (and Python has fulfilled any imaginable responsibility) if we distribute source with the binaries, or arrange that binaries are built from source at installation.

On Sat, Feb 6, 2016 at 4:31 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Right, sure. The technical problems are still there. Although I'm fairly confident that Debian's binaries would correspond to Debian's source - but honestly, if I'm looking for sources for anything other than the kernel, I probably want to get the latest from source control, rather than using the somewhat older version shipped in the repos. As to availability, though, most of the big distros (including Debian) keep their sources around for a long time. ChrisA

On Feb 06, 2016, at 04:38 PM, Chris Angelico wrote:
Not to get too deep into what other projects do, but yes in Debian, you can always get the patched source that corresponds to the binary you've installed, usually in both version controlled form and otherwise. I'd expect this to be true of most if not all of the Linux distros. A more interesting question is how you can actually verify this equivalence, and there are folks across the ecosystem working on reproducible builds. The idea is that you should be able to take the source that *claims* to correspond to that binary, and using the established build tools, locally reproduce a bit-wise exact duplicate of the binary. I've applied and submitted several patches to various upstreams that help with this effort, such as being able to pass in "locked" datetimes instead of the package always using e.g. datetime.now(). Let's not dive down the rabbit hole too far into how you can trust your build tool chain, and every other layer down to the quantum. Cheers, -Barry
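As a rough illustration of the "locked" datetime idea, here is a minimal sketch. It assumes the SOURCE_DATE_EPOCH environment variable convention used by the reproducible-builds effort; the thread does not say which mechanism the patches actually used, and `build_timestamp` is a hypothetical helper name.

```python
import os
from datetime import datetime, timezone


def build_timestamp(environ=os.environ):
    """Return the timestamp to embed in build output.

    Assumption: the build honours SOURCE_DATE_EPOCH (seconds since the
    Unix epoch). When it is set, every rebuild embeds the same value;
    otherwise we fall back to the current time, which is not reproducible.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return datetime.now(tz=timezone.utc)


# Pinning the value makes the embedded timestamp identical on every run.
pinned = build_timestamp({"SOURCE_DATE_EPOCH": "1454716800"})
print(pinned.isoformat())
```

The point of the design is that the package never calls datetime.now() directly when a pinned value is available, so two builds of the same source can come out bit-for-bit identical.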

On Mon, 1 Feb 2016 at 08:48 R. David Murray <rdmurray@bitdance.com> wrote:
Nope, it would probably need to be across all OSs to have consistent semantics.
This is what would need to be discussed in terms of how to handle this. For instance, we already do stuff in (I believe) site.py when we detect the build is in a checkout, so we could in that instance make sure the stdlib file directory takes precedence over any frozen code (hence why I wondered if the frozen importer on sys.meta_path should come after the sys.path importer). If we did that then we could make installing the stdlib files optional but still take precedence. It's all workable, it's just a question of if we want to. This is why I think we should get concrete benchmark numbers on Windows, Linux, and OS X to see if this is even worth considering as something we provide in our own binaries.
It at least wouldn't hurt anything.
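To make the ordering question concrete, here is a small sketch of what "the frozen importer comes after the sys.path importer" would look like. It works on a plain list rather than touching the running interpreter, and `frozen_after_path` is a hypothetical helper, not something CPython provides.

```python
from importlib.machinery import BuiltinImporter, FrozenImporter, PathFinder


def frozen_after_path(meta_path):
    """Return a copy of meta_path with FrozenImporter moved after PathFinder.

    With this order, a module found on sys.path (for example stdlib files
    in a checkout) would win over a frozen copy of the same module.
    """
    finders = list(meta_path)
    if FrozenImporter in finders and PathFinder in finders:
        finders.remove(FrozenImporter)
        finders.insert(finders.index(PathFinder) + 1, FrozenImporter)
    return finders


# CPython's default order puts the frozen importer ahead of sys.path lookups.
default_order = [BuiltinImporter, FrozenImporter, PathFinder]
reordered = frozen_after_path(default_order)
print([finder.__name__ for finder in reordered])
```

Whether such a reordering is safe for a real interpreter (and what it costs at startup) is exactly what the benchmarks would have to show.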

On 2 February 2016 at 02:40, R. David Murray <rdmurray@bitdance.com> wrote:
While omitting Python source files does let us reduce base image sizes (quite significantly), the current perspective in Fedora and Project Atomic is that going bytecode-only (whether frozen or not) breaks too many things to be worthwhile. As one simple example, it means tracebacks no longer include source code lines, dramatically increasing the difficulty of debugging failures. As such, we're more likely to pursue minimisation efforts by splitting the standard library up into "stuff essential distro components use" and "the rest of the standard library that upstream defines" than by figuring out how to avoid shipping source files (I believe Debian already makes this distinction with the python-minimal vs python split). Zipping up the standard library doesn't break tracebacks though, so it's potentially worth exploring that option further. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
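A quick way to check the claim about zipped code and tracebacks is sketched below. The archive and module names (`demo_lib.zip`, `zipped_demo_mod`) are made up for the demonstration; the sketch only assumes the standard zipimport behaviour of importing .py files from a zip archive placed on sys.path.

```python
import importlib
import os
import sys
import tempfile
import traceback
import zipfile

SOURCE = "def fail():\n    raise RuntimeError('boom from inside the zip')\n"

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive containing one source module.
    archive = os.path.join(tmp, "demo_lib.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zipped_demo_mod.py", SOURCE)

    sys.path.insert(0, archive)
    try:
        module = importlib.import_module("zipped_demo_mod")
        try:
            module.fail()
        except RuntimeError:
            # Format the traceback while the archive still exists, so the
            # source line can be fetched through the zip loader.
            report = traceback.format_exc()
    finally:
        sys.path.remove(archive)
        sys.modules.pop("zipped_demo_mod", None)

print(report)
```

The printed traceback names a file inside the archive and still shows the offending source line, which is what a bytecode-only install loses.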

On Sat, Jan 30, 2016, 12:30 Sven R. Kunze <srkunze@mail.de> wrote:
It wouldn't be a requirement, just a notion.
I personally think that startup time is not really a big issue; even when it comes to microbenchmarks.
You might not, but just about every command-line app does. -brett

Hi, If you want to make startup time faster for a broad range of applications, please consider adding a lazy import facility in the stdlib. I recently tried to write a lazy import mechanism using import hooks (to make it portable from 2.6 to 3.5), and it seems nearly impossible to do so (or, at least, for an average Python programmer like me). This would be much more useful (for actual users, not for architecture astronauts) than refactoring the importlib APIs in each feature version... Thanks in advance Antoine.

Brett Cannon <brett <at> python.org> writes:
A lazy importer was added in Python 3.5 and it was not possible without the module spec refactoring.
Wow... Thank you, I didn't know about that. Now for the next question: how am I supposed to use it? The following documentation leaves me absolutely clueless: """This class only works with loaders that define exec_module() as control over what module type is used for the module is required. For those same reasons, the loader’s create_module() method will be ignored (i.e., the loader’s method should only return None). Finally, modules which substitute the object placed into sys.modules will not work as there is no way to properly replace the module references throughout the interpreter safely; ValueError is raised if such a substitution is detected.""" (reference: https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader) I want to import lazily the modules from package "foobar.*", but not other modules as other libraries may depend on import side effects. How do I do that? The quoted snippet doesn't really help. Regards Antoine.
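For readers with the same question, here is a minimal sketch of one way to drive importlib.util.LazyLoader by hand on Python 3.5 or later. `lazy_import` is a hypothetical helper name, and the stdlib module `colorsys` stands in for the "foobar" modules; this is not necessarily the example the documentation ended up with for this thread.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code only runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named {!r}".format(name))
    # Wrap the real loader; it must define exec_module(), as the docs require.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # only arms the lazy module, runs no module code
    return module


colorsys = lazy_import("colorsys")  # colorsys.py has not been executed yet
# The first attribute access below triggers the real import.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```

Because the helper is called per module name, it can be applied only to the modules you choose (say, those under "foobar") while every other import stays eager and keeps its import-time side effects.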

On Jan 31, 2016, at 12:02 PM, Brett Cannon <brett@python.org> wrote:
A lazy importer was added in Python 3.5
Are there any docs on how to actually use the LazyLoader in 3.5? I can’t seem to find any but I don’t really know the import system that well. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

There are no example docs for it yet, but enough people have asked this week about how to set up a custom importer that I will write up a generic example case which will make sense for a lazy loader (need to file the issue before I forget). On Sun, 31 Jan 2016, 09:11 Donald Stufft <donald@stufft.io> wrote:

I have opened http://bugs.python.org/issue26252 to track writing the example (and before ppl go playing with the lazy loader, be aware of http://bugs.python.org/issue26186). On Sun, 31 Jan 2016 at 09:26 Brett Cannon <brett@python.org> wrote:
participants (23)
- Alexander Walters
- Andrew Barnert
- Antoine Pitrou
- Barry Warsaw
- Brett Cannon
- Chris Angelico
- Donald Stufft
- Emile van Sebille
- Ethan Furman
- francismb
- M.-A. Lemburg
- mike.romberg@comcast.net
- Nick Coghlan
- Nikolaus Rath
- Oscar Benjamin
- R. David Murray
- Serhiy Storchaka
- Stephen Hansen
- Stephen J. Turnbull
- Steve Dower
- Steven D'Aprano
- Sven R. Kunze
- Terry Reedy