size of the installation of Python on mobile devices

Issue 26852 [1] proposes an enhancement to reduce the size of the Python installation by using a sourceless distribution. It seems that the discussion of this problem (the size of the Python installation on mobile devices) should be moved here. IMHO there is no need to debate the fact that a full install of Python on a mobile device today is a problem.

As a reference, here is a rough estimate of the sizes of the different installs:
* full install: 111M
* without the test suite: 53M
* without the test suite and as a sourceless distribution: 23M
* same as above but also without ensurepip: 14M

It would be useful to find the references to the previous discussions related to sourceless distributions, especially the reasons why their use is discouraged.

Xavier

[1] http://bugs.python.org/issue26852
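A breakdown like the one above can be reproduced with a short directory walk. This is only a sketch of how such figures might be measured, not how they were actually obtained; `tree_size` is a hypothetical helper, and the excluded names (`test`, `__pycache__`) mirror the categories listed.

```python
import os


def tree_size(root, exclude_dirs=(), exclude_suffixes=()):
    """Total size in bytes of the regular files under root.

    Directories whose *name* is in exclude_dirs are skipped entirely, and
    files ending with one of exclude_suffixes are not counted.
    """
    suffixes = tuple(exclude_suffixes)
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if suffixes and name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # don't count symlinked files twice
                total += os.path.getsize(path)
    return total
```

Applied to an installed stdlib directory, `exclude_dirs={"test"}` approximates the "without the test suite" row. A sourceless figure is better measured on an actual sourceless tree, because there the `.pyc` files sit next to where the modules were instead of in `__pycache__`.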

2016-07-28 18:30 GMT+02:00 Xavier de Gaye <xdegaye@gmail.com>:
FYI, size also matters on servers: it has become important for cloud and container images. See for example the "System Python" idea of Fedora, which comes with a truncated standard library: https://fedoraproject.org/wiki/Changes/System_Python
Extract:
"""
- system-python-libs - a subset of standard Python library considered essential to run "system tools" requiring Python
- system-python - /usr/libexec/system-python binary (interpreter) for the tools to be able to run, provides system-python(abi)
- python3-libs brings in the remaining subset of standard Python library
(...)
(...)
"""
I worked on embedded devices, and truncating the standard library is a common and *mandatory* practice. Victor

On 28.07.2016 19:00, Victor Stinner wrote:
You may want to have a look at what we did for eGenix PyRun: http://www.egenix.com/products/python/PyRun/ for how to create a minified version of the Python stdlib.
Sourceless distributions are not discouraged. They are essential for commercial use cases and I would strongly oppose removing the possibility to ship byte code only applications. Why do you think that this doesn't work anymore?
I worked on embedded devices, and truncating the standard library is a common and *mandatory* practice.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jul 28 2016)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

On 07/28/2016 08:07 PM, M.-A. Lemburg wrote:
This refers to this message [1] from the discussion in issue 26852. I would hardly be submitting a patch that adds an option to build a sourceless distribution if I thought sourceless distributions were discouraged; there are better ways to waste time :)

The drawbacks of sourceless distributions:
* Quote from PEP 3147: "In practice, it is well known that pyc files are not compatible across Python major releases. A reading of import.c [3] in the Python source code proves that within recent memory, every new CPython major release has bumped the pyc magic number."
* The test suite cannot be run on a sourceless distribution, as many tests use the linecache module.

It would be useful to know the other reasons why a sourceless distribution is discouraged, so we might fix them. And if they cannot be fixed, then issue 26852 can be closed.

[1] http://bugs.python.org/issue26852#msg271535
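The magic-number point is why a sourceless stdlib has to be byte-compiled by the exact interpreter it ships with. The check below is a minimal sketch (the helper name is hypothetical) of the comparison the import system effectively performs, using `importlib.util.MAGIC_NUMBER`.

```python
import importlib.util


def pyc_matches_interpreter(pyc_path):
    """Return True if pyc_path starts with the running interpreter's magic number.

    The first 4 bytes of every .pyc file identify the bytecode format. A
    mismatch means the file was produced by a different CPython version and
    is rejected at import time; with no .py to fall back on, the import fails.
    """
    with open(pyc_path, "rb") as f:
        return f.read(4) == importlib.util.MAGIC_NUMBER
```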

On 29 July 2016 at 19:51, Xavier de Gaye <xdegaye@gmail.com> wrote:
"It currently makes debugging a pain" is the reason why we (Fedora & Red Hat) chose not to go for a sourceless distribution of the standard library as a way of reducing the default image size for the base Fedora Cloud and Atomic images. However, that's technically a solvable problem - given a traceback without source contents, we (python-dev) could provide a way to regenerate it against a full distribution with source code (similar to the way folks are able to map stack dumps in other compiled languages back to the relevant pieces of source code). It's also only an argument for folks to take into account when they consider "sourceless or not?" for their particular use case, rather than an argument against making sourceless distribution for size optimisation purposes easier in general. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

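A minimal sketch of Nick's "regenerate the traceback against a full distribution" idea. It assumes that the sourceless system records each frame as a `(filename, lineno, function)` triple and that a full source tree of the *same* release is available elsewhere. `render_with_source` and the prefix mapping are hypothetical, not an existing python-dev tool.

```python
import linecache


def render_with_source(frames, path_map):
    """Format frames captured on a sourceless system, filling in source lines.

    frames   -- iterable of (filename, lineno, function_name) triples, with
                filenames as recorded in the code objects on the device
    path_map -- {device_prefix: source_tree_prefix} used to relocate filenames
    """
    out = []
    for filename, lineno, name in frames:
        for device_prefix, source_prefix in path_map.items():
            if filename.startswith(device_prefix):
                filename = source_prefix + filename[len(device_prefix):]
                break
        out.append('  File "%s", line %d, in %s' % (filename, lineno, name))
        # linecache returns '' when the file or the line is unavailable.
        line = linecache.getline(filename, lineno).strip()
        if line:
            out.append("    " + line)
    return "\n".join(out)
```

On the device, `traceback.extract_tb(exc.__traceback__)` yields frame summaries whose `filename`, `lineno` and `name` attributes provide these triples even when no source is present.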
2016-07-29 13:13 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
In the embedded world, you have hard limits. For example, the firmware (bootloader, kernel, OS, application: everything) must not exceed 128 MB. It takes time to reduce the firmware size by 1 MB. So the obvious solution of embedding only .pyc files, and not .py files, is not really optional: it is simply mandatory. In production, you usually just cannot run a debugger, for practical reasons. For example, you cannot open a TCP connection to the device; only the device can open connections to a known server. Tracebacks are not logged either, so as not to waste limited storage. So .py files are just useless. Victor

On 29.07.2016 11:51, Xavier de Gaye wrote:
As I read it, David was referring to sourceless distribution of the Python stdlib itself. That's a different use case than the one I meant, which focuses on shipping application code without source code.
This argument is only valid if you are distributing application code without also shipping a Python interpreter to go with it.
* The test suite cannot be run on a sourceless distribution as many tests use the linecache module.
That's true, but in production, you are typically not interested in the test suite anyway. The same is true for the ensurepip package, since this comes with wheel files living inside the package directory structure. BTW: For distutils compilation, you also need the include files, but again, depending on your use case, you may not be interested in supporting this.
There are certainly a lot of use cases for wanting to have a Python distribution which is small, but in most cases, you need to do more than just remove the .py files to address these, so I'm not sure whether we need an option in configure for this. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jul 29 2016)

On 07/29/2016 01:44 PM, M.-A. Lemburg wrote:
No, a sourceless distribution does not "just remove the .py files", it also does not install the __pycache__ files. The gain in size is quite significant, more than 50%. Right, there are more steps to a small Python than a configure option to build a sourceless distribution, but obviously this first step is the one that would provide the biggest gain in size. Xavier
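For illustration, the layout Xavier describes (legacy `.pyc` files next to where the modules were, with no `.py` files and no `__pycache__`) can be approximated after installation with the standard `compileall` module. The function below is a hypothetical post-install sketch, not the implementation proposed in issue 26852.

```python
import compileall
import os
import shutil


def make_sourceless(root):
    """Convert the library tree under root, in place, to a sourceless layout."""
    # legacy=True writes foo.pyc next to foo.py instead of
    # __pycache__/foo.cpython-XY.pyc, so the bytecode is importable
    # once the source is gone.
    if not compileall.compile_dir(root, quiet=1, legacy=True):
        raise RuntimeError("byte-compilation failed under %s" % root)
    for dirpath, dirnames, filenames in os.walk(root):
        if "__pycache__" in dirnames:
            # PEP 3147 caches are redundant with the legacy .pyc files.
            shutil.rmtree(os.path.join(dirpath, "__pycache__"))
            dirnames.remove("__pycache__")
        for name in filenames:
            if name.endswith(".py"):
                os.remove(os.path.join(dirpath, name))
```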

Xavier de Gaye writes:
It would be useful to know the other reasons why a sourceless distribution is discouraged,
I wouldn't say it's discouraged, except for the ethical reason that Python is open source, and we love the idea that our users can hack Python for their own needs. That, of course, is balanced by the fact that sourceless distribution is exactly such a derivative. But I don't see why maintaining a sourceless distribution is an appropriate use of core developer time. On the one hand, it seems to be something that can be done separately from the core, and there are already many examples. On the other hand, there doesn't seem to be demand for a "generic" sourceless distribution -- removing source is the *easy* part. In fact there are many use cases that *need* variants, so it seems to me that it would be likely to be a real hairball of optional features. For example, as Terry mentioned, not all applications need the unicodedata module, and it's *big*, but many applications I can imagine doing myself (spam-checking and other validation of textual data) would include it. Steve

On 30 July 2016 at 17:25, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
While you're not wrong about the complexity of the problem at hand, there are actually a few different problem domains pushing towards offering better mainline support for stripped-down, special-purpose builds of the CPython runtime:
- mobile devices
- embedded devices
- self-contained binaries
- Linux container images (& even unikernels)
These are all cases where "How do we make the core Python installation smaller?" is a reasonable question to ask (to either reduce storage requirements or to reduce network usage), and "use MicroPython instead of CPython" isn't always going to be the right answer (e.g. if you need CPython C API support). These are also all cases where the ability to automate builds and updates encourages pushing more complexity into the build process in pursuit of simplification and minimisation of the runtime components. If we say "pursue this problem outside the core", then odds are pretty good that those communities will all settle on different approaches, and we'll end up with four (or more) redundant sets of tooling and techniques for installation minimisation. By contrast, if we say "pursue improved solutions to this problem as part of CPython, but you'll need to keep these other potential use cases in mind", we increase the chances for "one obvious way" to be formulated, even if people have different motives for caring about the problem. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 07/30/2016 09:25 AM, Stephen J. Turnbull wrote:
How *big*, exactly? The unicodedata shared library is 843K on archlinux, whereas a sourceless distribution saves many tens of megabytes and a distribution without the test suite saves about 60M. IMHO only three options should be implemented, if the Python community is really serious about porting Python to mobile devices:
* --enable-sourceless-distribution
* --disable-test-suite
* --disable-distutils-builds
The last option is to get rid of the config (LIBPL) directory that is used to build extension modules, as one cannot expect most mobile devices to have a proper build environment anyway. The result would be a distribution on the order of 10M, ten times smaller than the current size. Then we can look into compression: its impact on the memory used by the python process and on performance on mobile devices. The remaining savings (unicodedata, turtledemo, tkinter, unittest, ...) should be left to packagers.
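To see what the three proposed options would act on in an existing installation, the relevant directories can be located with `sysconfig`. The option names are only proposals from this thread and the function name is hypothetical. `LIBPL` is a real build-configuration variable, but it may be `None` on builds that have no Makefile-based config directory (for example, Windows installs).

```python
import os
import sysconfig


def size_reduction_targets():
    """Map each *proposed* configure option to the path it would leave out,
    as found in the running installation (None if not applicable)."""
    stdlib = sysconfig.get_paths()["stdlib"]
    return {
        # .py files (and __pycache__ directories) under the stdlib
        "--enable-sourceless-distribution": stdlib,
        # the regression-test package
        "--disable-test-suite": os.path.join(stdlib, "test"),
        # config directory with the Makefile and related files used to
        # build extension modules; may be None where no such dir exists
        "--disable-distutils-builds": sysconfig.get_config_var("LIBPL"),
    }


for option, path in size_reduction_targets().items():
    print(option, path)
```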

On 30.07.2016 14:18, Xavier de Gaye wrote:
IMO, this should all be left to packagers. They will have to patch the code base anyway to get things working in such an environment, so there isn't much point in adding some bits to the configure script which everyone uses, while leaving out other important parts. This doesn't simplify the life of packagers much, but does add maintenance overhead for the core developers (since these new options would have to be tested as well before release). BTW: By not having these options in configure, you also gain a lot more freedom as a packager, since you can make the build behave the way you would like it to, instead of going through the whole change-request discussion process that updates to these options would entail going forward. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jul 30 2016)

On 30 July 2016 at 22:29, M.-A. Lemburg <mal@egenix.com> wrote:
Encouraging that divergence isn't free, though - it just pushes the testing burden outward by creating a larger compatibility testing matrix for folks wanting to support sourceless environments. And, to be clear, I'm not suggesting we start making official sourceless releases of CPython, nor am I suggesting we make "sourceless builds are working" a gating criterion for versioned releases - instead, I'm suggesting we allow interested folks like Xavier to add the basic knobs and dials to the standard distribution, with the awareness that they might break sometimes and require patching after other folks make changes. Since the assumption is that folks interested in this will be building from source anyway, it shouldn't be a problem to run directly off a maintenance branch when they need to, rather than using a particular release tag. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 07/30/2016 04:42 PM, Nick Coghlan wrote:
Then I propose that we vote to allow issue 26852 [1] to add the '--enable-sourceless-distribution' option to configure. I am +1 for the reasons given in my previous posts. Xavier [1] http://bugs.python.org/issue26852

Xavier is a core developer. He is free to dedicate his time to supporting sourceless distribution :-) Victor

Victor Stinner writes:
Xavier is a core developer. He is free to dedicate his time to supporting sourceless distribution :-)
So are we all, core or not. But on Nick's terms (he even envisions releases with the "sourceless" build broken), I don't think adding to core is fair to Xavier's (and others') efforts in this direction. It would also set an unfortunate precedent. Steve

On 1 August 2016 at 15:46, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
How would it be any different from our efforts to support other platforms outside the primary set of Windows, Mac OS X, Linux, and *BSD? Things *definitely* break from time to time on those other less common setups, and when they do, folks submit patches to fix them after they notice (usually because they went to rebase their own work on a newer version and discovered things didn't work as expected). I'd see this working the same way - we wouldn't go out of our way to break sourceless builds, but if they did break, it would be on the folks that care about them to submit patches to resolve the problem. The gain for folks that care would be getting a green light to pursue more robust support for that model in the standard library itself, such as clearly marking test cases that require linecache to be working (as we already do for tests that require docstrings or that test CPython implementation details rather than Python language features), and perhaps even eventually developing a mechanism along the lines of JavaScript sourcemaps that would allow linecache to keep working, even when running on a sourceless build of the standard library.
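Marking such tests could look roughly like the sketch below, modelled on `test.support.requires_docstrings`, which is a `unittest.skipUnless` decorator. `source_available` and `requires_source` are hypothetical names, not existing `test.support` APIs.

```python
import linecache
import unittest


def source_available(code):
    """True if linecache can find the source line where *code* starts.

    On a sourceless install the file named by code.co_filename does not
    exist, so linecache returns '' and tracebacks show no source lines.
    """
    return bool(linecache.getline(code.co_filename, code.co_firstlineno))


# Hypothetical convention: probe once, using this function's own code object.
requires_source = unittest.skipUnless(
    source_available(source_available.__code__),
    "test requires .py source files (linecache)")
```

A test that inspects source lines in tracebacks would then be decorated with `@requires_source` and skipped, rather than failing, on a sourceless build.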
It would also set an unfortunate precedent.
What precedent do you mean? That ./configure may contain options that aren't 100% reliable? That's already the case - I can assure you that we *do not* consistently test all of the options reported by "./configure --help", since what matters is that the options people are *actually using* keep working in the context where they're using them, rather than all of the options working in every possible environment. Or do you mean the precedent that we're OK with folks shipping the standard library sans source code? *That* precedent was set when Guido chose to use a permissive licensing model for the language definition and runtime - while shipping without source code is a bad idea in educational contexts, and other situations where having the code on hand for inspection by end users is beneficial, Python is used in plenty of scenarios where those considerations don't apply. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan writes:
This isn't platform support in the same sense, for one. This is a variant distribution mode. A better analogy would be the common packaging support we provide for RPMs, debs, and all the whatnots associated with different Linux and *BSD distros. AFAIK there is no such support; rather, we leave it up to the individual distros, which mostly ignore our own packaging technologies (except where implied by directory structure). (OK, there's probably stuff in Tools that can be applied to that, but all I see that's obvious is a *distribution-specific* facility, Misc/RPM, which is regularly used.) Second, such platforms typically provide or lack features that *require* changes to the compilation of the interpreter or stdlib modules. By contrast, Xavier's core list can be provided by script AFAICS.
I understand the functional similarity. I don't think the incentives are the same. If Python crashes on your platform, you *must* fix Python, and you've fixed it for everybody using the same platform. In sourceless, if the build process leaves pyc turds or lacks something you actually need, you may change your specialized build scripts that do all the other stuff you need, and maybe even stop using Python's sourceless support (eg, in the missing requirements case). It's not yet clear that the same fixes will be appropriate for all sourceless distros.
How does that green light depend on patches to the Makefile?
Why not wait until such features are in development, and at the same time declare sourceless a supported distribution mode?
It appears that you're arguing that this feature won't be used so it doesn't matter if it works.<wink multiplier=1.0 /> Nothing is 100% reliable; that's a red herring. The precedent I refer to is adding code of uncertain value and completeness that will require maintenance by somebody, on speculation that it will be used. I suggest that this patch is premature optimization. It's more efficient to not add __pycache__ directories in the first place, but it's not necessary to do it that way; it can be done by a separate script. I believe that is true of all of the three options Xavier proposed -- a bash script would probably be fewer than 10 pipelines. If these were done by a script that could be jointly designed and maintained in a separate project by the sourceless distributors, it would be a lot more effective in discovering the common features that all of them use in the same way, and even more so for important features that only a subset use. Once there are features that *must* be done in the core build process (eg, linecache sourcemap), then it would be timely to move the common features there. IOW, I don't see why the usual arguments for "put it on PyPI first" don't apply here, with the usual strength.
Or do you mean the precedent that we're OK with folks shipping the standard library sans source code?
Of course *I* don't mean that. If an economist fails to understand the implications of licensing, he should change careers.

On 08/02/2016 11:09 AM, Stephen J. Turnbull wrote:
You are mistaken. There is already at least one feature of the Python build system that fits your analogy exactly: it is called MULTIARCH. It applies to all Unix systems, whether or not they support multiarch (AFAIK archlinux, OS X and Android do not). It is not a minor change to the Python build system, adding about 140 lines to configure.ac plus the add_multiarch_paths() function in setup.py. Xavier

On 2 August 2016 at 19:09, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
We have plenty of knobs & dials in the configure script primarily for use by *nix redistributors in their package build scripts (e.g. the various "--with-system-*" flags, as well as the Debian-specific MULTIARCH support that Xavier mentioned). The fact that the existence of these *doesn't* readily spring to mind for folks not actively involved in Linux distro maintenance is a point in favour of why I think what Xavier is suggesting is reasonable: we have long experience with configure flags where the only folks that typically care that those flags exist are the folks that need them, and when someone else inadvertently breaks those flags, the folks that care make the updates needed to restore the functionality. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan writes:
I don't, and never did, think it's unreasonable -- sourceless makes a lot of technical sense to me in several environments, and I don't see any "ethical" reason to discourage sourceless in those environments either. Based on the evidence presented so far, I think adding to core is premature and the feature would benefit from cooperation among interested parties in a development environment that can move far faster than core will. But I concede I have no experience with this kind of option in Python to match yours or Xavier's.

On 2 August 2016 at 22:37, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Ah, that's likely the key difference in our assumptions then: CPython maintenance branches already move plenty fast enough for any use case where you aren't waiting for us to make official releases :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 1 August 2016 at 07:46, "Stephen J. Turnbull" <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Sorry, I don't have the bandwidth to follow this discussion closely. I consider mobile platforms (smartphones, tablets, and all these new funny and shiny embedded devices) important, and Python would lose a huge "market" if we decided that such platforms are not good enough for our language... I consider it important, but I'm not interested enough to spend much time on it. For example, I don't know configure and the Makefile well, whereas many changes are needed in these files to enhance cross-compilation and Android support. Sorry, but I don't think that discussing whether mobile platforms are important is worth it. Check the numbers: more mobile devices are sold than computers. Compare the markets for smartphones and desktop computers, and watch the fall of the desktop computer market... Can we now focus on fixing concrete technical issues? Xavier already explained that "sourceless" distributions are already used in the wild; it's not something new. It will not open the gates of hell (of closed-source software). People who want to protect their IP (hide their code) already patch and compile CPython... Victor

On 7/28/16, 9:30 AM, "Python-ideas on behalf of Xavier de Gaye" <python-ideas-bounces+kevin-lists=theolliviers.com@python.org on behalf of xdegaye@gmail.com> wrote:
Is the reason you're doing this as a configure flag rather than some tool that can run on a full Python distro because this is a cross-compile target? I ask because as Victor points out, this concern is really not unique to Android or mobile development. There are numerous use cases for wanting a sourceless distribution on just about any mobile, desktop or server platform. The abundance of tools out there that try to solve this problem is a testament to how common the need is. If Python could spit out a sourceless version of itself, complete with site-packages, that alone would go a long way towards consolidating the packaging logic into one place, and would likely solve the mobile and embedded problem as well.
I suspect the concerns against it are the complexity of implementation, but after having done this for many years, I rather want something that is less 'smart' than the existing packaging tools I've used try to be. What I'd love personally is the ability to simply take a full Python install - system distro, venv or cross-compile target - and just package the whole thing up minus sources. I can be careful to make sure I have only what I need in it before building the release. At most I'd like to be able to specify a whitelist / blacklist file to include / exclude specific things. Thanks, Kevin
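Something close to "package the whole thing up minus sources, with an exclude list" can be sketched with the standard library's `zipfile.PyZipFile`. Its `writepy()` stores compiled `.pyc` files rather than sources and accepts a `filterfunc` (Python 3.4+). The function name and the glob-on-basename exclusion scheme are assumptions made for illustration.

```python
import fnmatch
import os
import zipfile


def build_sourceless_zip(package_dir, archive, exclude=()):
    """Write the byte-compiled contents of package_dir into archive.

    Modules are stored as .pyc (writepy falls back to storing the .py only
    if compilation fails). Any file or subpackage whose base name matches
    one of the glob patterns in *exclude* is skipped.
    """
    def keep(path):
        name = os.path.basename(path)
        return not any(fnmatch.fnmatch(name, pattern) for pattern in exclude)

    with zipfile.PyZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writepy(package_dir, filterfunc=keep)
        return zf.namelist()
```

The resulting archive can be put on `sys.path` and imported from through zipimport, with no `.py` files shipped.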

Serhiy Storchaka wrote on 29.07.2016 at 14:50:
Speaking of compression, I just saw that the ticket about compressing .pyc files is still open, so I guess that nothing has been done in that regard. It still surprises me that they aren't. They are classical write-once, read-many kind of files, so not compressing them seems counter-intuitive. https://bugs.python.org/issue22789 Obviously, compressing everything into an archive instead of compressing each file separately is bound to provide even larger gains, but that doesn't deal with the fact that some things usually get compiled and installed later than others. Stefan
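The difference between compressing each `.pyc` separately and compressing them together can be illustrated with `zlib` on marshalled code objects (the payload of a `.pyc` after its header). This is a toy measurement, not a reproduction of the proposal in issue 22789, and `compressed_sizes` is a hypothetical helper.

```python
import marshal
import zlib


def compressed_sizes(blobs, level=9):
    """Return (sum of individually compressed sizes, size compressed as one stream)."""
    per_file = sum(len(zlib.compress(blob, level)) for blob in blobs)
    together = len(zlib.compress(b"".join(blobs), level))
    return per_file, together


# Toy "modules": marshalled code objects, as stored in .pyc files after the header.
blobs = [
    marshal.dumps(compile("def f%d(x):\n    return x + %d\n" % (i, i), "m%d.py" % i, "exec"))
    for i in range(50)
]
raw = sum(len(b) for b in blobs)
per_file, together = compressed_sizes(blobs)
print(raw, per_file, together)
```

Note that a ZIP archive, as used by zipimport, still compresses each member separately, so it corresponds to the per-file figure rather than the combined one.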

2016-07-28 18:30 GMT+02:00 Xavier de Gaye <xdegaye@gmail.com>:
FYI size does also matter on server: it became an important for cloud and container images. See for example the "System Python" idea of Fedora which comes with a truncated standard library: https://fedoraproject.org/wiki/Changes/System_Python Extract: """ - system-python-libs - a subset of standard Python library considered essential to run "system tools" requiring Python - system-python - /usr/libexec/system-python binary (interpreter) for the tools to be able to run, provides system-python(abi) - python3-libs brings in the remaining subset of standard Python library (...) (...) """
I worked on embeded devices, and truncating the stanard library is a common and *mandatory* practice. Victor

On 28.07.2016 19:00, Victor Stinner wrote:
You may want to have a look at what we did for eGenix PyRun: http://www.egenix.com/products/python/PyRun/ for how to create a minified version of the Python stdlib.
Sourceless distributions are not discouraged. They are essential for commercial use cases and I would strongly oppose removing the possibility to ship byte code only applications. Why do you think that this doesn't work anymore ?
I worked on embeded devices, and truncating the stanard library is a common and *mandatory* practice.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jul 28 2016)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

On 07/28/2016 08:07 PM, M.-A. Lemburg wrote:
This refers to this message [1] from the discussion in issue 26852. I would not think that sourceless distributions are discouraged if I am submitting a patch that adds an option to build a sourceless distribution, there are better ways to waste time :) The drawbacks of sourceless distributions: * Quote from PEP 3147 "In practice, it is well known that pyc files are not compatible across Python major releases. A reading of import.c [3] in the Python source code proves that within recent memory, every new CPython major release has bumped the pyc magic number.". * The test suite cannot be run on a sourceless distribution as many tests use the linecache module. It would be useful to know the other reasons why a sourceless distribution is discouraged, so we might fix them. And if they cannot be fixed then issue 26852 can be closed. [1] http://bugs.python.org/issue26852#msg271535

On 29 July 2016 at 19:51, Xavier de Gaye <xdegaye@gmail.com> wrote:
"It currently makes debugging a pain" is the reason why we (Fedora & Red Hat) chose not to go for a sourceless distribution of the standard library as a way of reducing the default image size for the base Fedora Cloud and Atomic images. However, that's technically a solvable problem - given a traceback without source contents, we (python-dev) could provide a way to regenerate it against a full distribution with source code (similar to the way folks are able to map stack dumps in other compiled languages back to the relevant pieces of source code). It's also only an argument for folks to take into account when they consider "sourceless or not?" for their particular use case, rather than an argument against making sourceless distribution for size optimisation purposes easier in general. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2016-07-29 13:13 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
In the embeded world, you have hard limit. For example, the firmware (boot load, kernel, OS, application: everything) must be exceed 128 MB. It takes time to reduce the firmware size by 1 MB. So such obvious solution of not embeding .py files but only .pyc is not really an option, but just mandatory. On the production, usually you just cannot run a debugger just for practical reasons. For example, you cannot open a TCP connection to the device. Only the device can open connections to a known server. Traceback are not logged neither to not waste limited storage. So .py files are just useless. Victor

On 29.07.2016 11:51, Xavier de Gaye wrote:
As I read it, David was referring to sourceless distribution of the Python stdlib itself. That's a different use case than the one I meant, which focuses on shipping application code without source code.
This argument is only valid if you are distributing application code without also shipping a Python interpreter to go with it.
* The test suite cannot be run on a sourceless distribution as many tests use the linecache module.
That's true, but in production, you are typically not interested in the test suite anyway. The same is true for the ensurepip package, since this comes with wheel files living inside the package directory structure. BTW: For distutils compilation, you also need the include files, but again, depending on your use case, you may not be interested in supporting this.
There are certainly a lot of use cases for wanting to have a Python distribution which is small, but in most cases, you need to do more than just remove the .py files to address these, so I'm not sure whether we need an option in configure for this. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jul 29 2016)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

On 07/29/2016 01:44 PM, M.-A. Lemburg wrote:
No, a sourceless distribution does not "just remove the .py files", it also does not install the __pycache__ files. The gain in size is quite significant, more than 50%. Right, there are more steps to a small Python than a configure option to build a sourceless distribution, but obviously this first step is the one that would provide the biggest gain in size. Xavier

Xavier de Gaye writes:
It would be useful to know the other reasons why a sourceless distribution is discouraged,
I wouldn't say it's discouraged, except for the ethical reason that Python is open source, and we love the idea that our users can hack Python for their own needs. That, of course, is balanced by the fact that sourceless distribution is exactly such a derivative. But I don't see why maintaining a sourceless distribution is an appropriate use of core developer time. On the one hand, it seems to be something that can be done separately from the core, and there are already many examples. On the other hand, there doesn't seem to be demand for a "generic" sourceless distribution -- removing source is the *easy* part. In fact there are many use cases that *need* variants, so it seems to me that it would be likely to be a real hairball of optional features. For example, as Terry mentioned not all applications need the unicodedata module, and it's *big*, but many applications I can imagine doing myself (spam-checking and other validation of textual data) would include it. Steve

On 30 July 2016 at 17:25, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
While you're not wrong about the complexity of the problem at hand, there are actually a few different problem domains pushing towards offering better mainline support for stripped down special purpose builds of the CPython runtime: - mobile devices - embedded devices - self-contained binaries - Linux container images (& even unikernels) These are all cases where "How do we make the core Python installation smaller?" is a reasonable question to ask (to either reduce storage requirements or to reduce network usage), and "use MicroPython instead of CPython" isn't always going to be the right answer (e.g. if you need CPython C API support). These are also all cases where the ability to automate builds and updates encourages pushing more complexity into the build process in pursuit of simplification and minimisation of the runtime components. If we say "pursue this problem outside the core", then odds are pretty good that those communities will all settle on different approaches, and we'll end up with four (or more) redundant sets of tooling and techniques for installation minimisation. By contrast, if we say "pursue improved solutions to this problem as part of CPython, but you'll need to keep these other potential use cases in mind", we increase the changes for "one obvious way" to be formulated, even if people have different motives for caring about the problem. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 07/30/2016 09:25 AM, Stephen J. Turnbull wrote:
How *big* exactly? The size of the unicodedata shared library is 843K on archlinux, whereas a sourceless distribution saves many tens of megabytes and a distribution without the test suite saves about 60M. IMHO only three options should be implemented, if the Python community really thinks seriously about porting Python to mobile devices:
* --enable-sourceless-distribution
* --disable-test-suite
* --disable-distutils-builds
The last option is to get rid of the config (LIBPL) directory that is used to build extension modules, as one cannot expect most mobile devices to have a proper build environment anyway. The result would be a distribution on the order of 10M, ten times smaller than the current size. Then we can look into compression, the impact of compression on the memory used by the python process, and the impact of compression on performance on mobile devices. The remaining savings (unicodedata, turtledemo, tkinter, unittest, ...) should be left to packagers.
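Xavier's size figures can be checked against any installed interpreter. The following is a rough sketch, not part of any proposed patch: `estimate_savings()` is a hypothetical helper that walks a standard library directory and totals file sizes into the buckets that `--disable-test-suite` and a sourceless build would remove. The set of test directory names is an assumed heuristic, and `py_sources` counts only `.py` files, so it overstates the net saving of a sourceless build on an install that does not already ship byte code.

```python
import os
from pathlib import Path

# Assumed heuristic for "test suite" directories; not an official list.
TEST_DIR_NAMES = {"test", "tests", "idle_test"}


def estimate_savings(stdlib_dir) -> dict:
    """Total file sizes (bytes) under stdlib_dir, with the parts that
    --disable-test-suite and a sourceless build would drop broken out."""
    root = Path(stdlib_dir)
    totals = {"total": 0, "test_suite": 0, "py_sources": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        # Directory components below root, e.g. ("unittest", "test").
        rel_dirs = Path(dirpath).relative_to(root).parts
        in_tests = any(part in TEST_DIR_NAMES for part in rel_dirs)
        for name in filenames:
            path = Path(dirpath, name)
            if path.is_symlink():
                continue  # avoid double-counting linked files
            size = path.stat().st_size
            totals["total"] += size
            if in_tests:
                totals["test_suite"] += size
            elif name.endswith(".py"):
                totals["py_sources"] += size  # non-test sources only
    return totals
```

For example, `estimate_savings(sysconfig.get_paths()["stdlib"])` gives the breakdown for the running interpreter; dividing by `2**20` converts the byte counts to MiB.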

On 30.07.2016 14:18, Xavier de Gaye wrote:
IMO, this should all be left to packagers. They will have to patch the code base anyway to get things working in such an environment, so there isn't much point in adding some bits to the configure script which everyone uses, while leaving out other important parts. This doesn't simplify the life of packagers much, but does add maintenance overhead for the core developers (since these new options would have to be tested as well before release). BTW: By not having these options in configure, you also gain a lot more freedom as a packager, since you can have the build behave as you would like it to instead of going through the whole change request discussion process that updates/changes to these options would entail going forward. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jul 30 2016)

On 30 July 2016 at 22:29, M.-A. Lemburg <mal@egenix.com> wrote:
Encouraging that divergence isn't free, though - it just pushes the testing burden outward by creating a larger compatibility testing matrix for folks wanting to support sourceless environments. And, to be clear, I'm not suggesting we start making official sourceless releases of CPython, nor am I suggesting we make "sourceless builds are working" a gating criterion for versioned releases - instead, I'm suggesting we allow interested folks like Xavier to add the basic knobs and dials to the standard distribution, with the awareness that they might break sometimes and require patching after other folks make changes. Since the assumption is that folks interested in this will be building from source anyway, it shouldn't be a problem to run directly off a maintenance branch when they need to, rather than using a particular release tag. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 07/30/2016 04:42 PM, Nick Coghlan wrote:
Then I propose that we vote to allow issue 26852 [1] to add the '--enable-sourceless-distribution' option to configure. I am +1 for the reasons given in my previous posts. Xavier [1] http://bugs.python.org/issue26852

Xavier is a core developer. He is free to dedicate his time to supporting sourceless distribution :-) Victor

Victor Stinner writes:
Xavier is a core developer. He is free to dedicate his time to supporting sourceless distribution :-)
So are we all, core or not. But on Nick's terms (he even envisions releases with the "sourceless" build broken), I don't think adding to core is fair to Xavier's (and others') efforts in this direction. It would also set an unfortunate precedent. Steve

On 1 August 2016 at 15:46, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
How would it be any different from our efforts to support other platforms outside the primary set of Windows, Mac OS X, Linux, and *BSD? Things *definitely* break from time-to-time on those other less common setups, and when they do, folks submit patches to fix them after they notice (usually because they went to rebase their own work on a newer version and discovered things didn't work as expected). I'd see this working the same way - we wouldn't go out of our way to break sourceless builds, but if they did break, it would be on the folks that care about them to submit patches to resolve the problem. The gain for folks that care would be getting a green light to pursue more robust support for that model in the standard library itself, such as clearly marking test cases that require linecache to be working (as we already do for tests that require docstrings or that test CPython implementation details rather than Python language features), and perhaps even eventually developing a mechanism along the lines of JavaScript sourcemaps that would allow linecache to keep working, even when running on a sourceless build of the standard library.
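Nick's idea of marking tests that need readable source could follow the existing pattern of `test.support.requires_docstrings`. A minimal sketch, assuming a hypothetical `requires_source()` decorator that is not part of `test.support`: it asks `linecache` for the module's source and skips the test when nothing comes back, which is what happens on a sourceless install where a module's `__file__` points at a `.pyc`.

```python
import importlib
import linecache
import unittest


def source_available(module_name: str) -> bool:
    """True if linecache can read the module's Python source file."""
    module = importlib.import_module(module_name)
    filename = getattr(module, "__file__", None)
    # Built-in modules have no __file__; sourceless modules report a .pyc path.
    if not filename or not filename.endswith(".py"):
        return False
    linecache.checkcache(filename)
    # Note: an empty source file also counts as "not available" here.
    return bool(linecache.getlines(filename))


def requires_source(module_name: str):
    """Hypothetical decorator: skip a test unless module_name's source is readable.

    The check runs once, when the decorator is applied at import time."""
    return unittest.skipUnless(
        source_available(module_name),
        f"requires source code for {module_name!r} (sourceless build?)",
    )
```

Because the check happens when the test module is imported, a sourceless build would report these tests as skipped rather than failed.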
It would also set an unfortunate precedent.
What precedent do you mean? That ./configure may contain options that aren't 100% reliable? That's already the case - I can assure you that we *do not* consistently test all of the options reported by "./configure --help", since what matters is that the options people are *actually using* keep working in the context where they're using them, rather than all of the options working in every possible environment. Or do you mean the precedent that we're OK with folks shipping the standard library sans source code? *That* precedent was set when Guido chose to use a permissive licensing model for the language definition and runtime - while shipping without source code is a bad idea in educational contexts, and other situations where having the code on hand for inspection by end users is beneficial, Python is used in plenty of scenarios where those considerations don't apply. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan writes:
This isn't platform support in the same sense, for one. This is a variant distribution mode. A better analogy would be the common packaging support we provide for RPMs, debs, and all the whatnots associated with different Linux and *BSD distros. AFAIK there is no such support; rather, we leave it up to the individual distros, which mostly ignore our own packaging technologies (except where implied by directory structure). (OK, there's probably stuff in Tools that can be applied to that, but all I see that's obvious is a *distribution-specific* facility, Misc/RPM, which is regularly used.) Second, such platforms typically provide or lack features that *require* changes to the compilation of the interpreter or stdlib modules. By contrast, Xavier's core list can be provided by script AFAICS.
I understand the functional similarity. I don't think the incentives are the same. If Python crashes on your platform, you *must* fix Python, and you've fixed it for everybody using the same platform. In sourceless, if the build process leaves pyc turds or lacks something you actually need, you may change your specialized build scripts that do all the other stuff you need, and maybe even stop using Python's sourceless support (eg, in the missing requirements case). It's not yet clear that the same fixes will be appropriate for all sourceless distros.
How does that green light depend on patches to the Makefile?
Why not wait until such features are in development, and at the same time declare sourceless a supported distribution mode?
It appears that you're arguing that this feature won't be used so it doesn't matter if it works.<wink multiplier=1.0 /> Nothing is 100% reliable; that's a red herring. The precedent I refer to is adding code of uncertain value and completeness, that will require maintenance by somebody, on speculation that it will be used. I suggest that this patch is premature optimization. It's more efficient to not add __pycache__ directories in the first place, but it's not necessary to do it that way; it can be done by a separate script. I believe that is true of all three of the options Xavier proposed -- a bash script would probably be fewer than 10 pipelines. If these were done by a script that could be jointly designed and maintained in a separate project by the sourceless distributors, it would be a lot more effective in discovering the common features that all of them use in the same way, and even more so for important features that only a subset use. Once there are features that *must* be done in the core build process (eg, linecache sourcemap), then it would be timely to move the common features there. IOW, I don't see why the usual arguments for "put it on PyPI first" don't apply here, with the usual strength.
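Stephen's claim that Xavier's three options could be a post-install script can be illustrated in Python rather than bash. This is a minimal sketch under stated assumptions: `strip_install()` is a hypothetical helper, the test directory names are a heuristic, and the config directory is passed in explicitly (on a POSIX build it is the directory `sysconfig.get_config_var('LIBPL')` reports). `compileall`'s `legacy=True` writes `mod.pyc` next to `mod.py` instead of into `__pycache__/`, which matters because the import system ignores `__pycache__` entries once the source file is gone (PEP 3147).

```python
import compileall
import os
import shutil
from pathlib import Path

# Assumed heuristic for "test suite" directories; not an official list.
TEST_DIR_NAMES = {"test", "tests", "idle_test"}


def strip_install(stdlib_dir, *, drop_sources=True, drop_tests=True, config_dir=None):
    """Hypothetical post-install step approximating the three proposed options."""
    root = Path(stdlib_dir)

    if drop_tests:  # roughly --disable-test-suite
        # Collect first, then delete, so os.walk is not disturbed mid-iteration.
        test_dirs = [Path(dirpath) / name
                     for dirpath, dirnames, _ in os.walk(root)
                     for name in dirnames if name in TEST_DIR_NAMES]
        for path in sorted(test_dirs, key=lambda p: len(p.parts)):  # parents first
            if path.exists():  # may already be gone with a removed parent
                shutil.rmtree(path)

    if drop_sources:  # roughly --enable-sourceless-distribution
        # legacy=True writes pkg/mod.pyc beside pkg/mod.py, the only place
        # the import system looks for bytecode once the source is gone.
        if not compileall.compile_dir(str(root), quiet=1, legacy=True):
            raise RuntimeError("byte-compilation failed; sources left in place")
        for py_file in list(root.rglob("*.py")):
            py_file.unlink()
        for cache_dir in [p for p in root.rglob("__pycache__") if p.is_dir()]:
            shutil.rmtree(cache_dir)

    if config_dir is not None:  # roughly --disable-distutils-builds
        shutil.rmtree(config_dir, ignore_errors=True)
```

Sources are deleted only after byte-compilation reports success, so a failed run leaves the tree usable; whether such a script lives in the core or in a separate project is exactly the point under debate.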
Or do you mean the precedent that we're OK with folks shipping the standard library sans source code?
Of course *I* don't mean that. If an economist fails to understand the implications of licensing, he should change careers.

On 08/02/2016 11:09 AM, Stephen J. Turnbull wrote:
You are mistaken. There is already at least one feature of the Python build system that fits your analogy exactly: it is called MULTIARCH. It applies to all unix systems whether or not they support multiarch (AFAIK archlinux, OS X, and Android do not support multiarch). It is not a minor change in the Python build system, adding about 140 lines to configure.ac and the add_multiarch_paths() function in setup.py. Xavier

On 2 August 2016 at 19:09, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
We have plenty of knobs & dials in the configure script primarily for use by *nix redistributors in their package build scripts (e.g. the various "--with-system-*" flags, as well as the Debian-specific MULTIARCH support that Xavier mentioned). The fact that the existence of these *doesn't* readily spring to mind for folks not actively involved in Linux distro maintenance is a point in favour of why I think what Xavier is suggesting is reasonable: we have long experience with configure flags where the only folks that typically care that those flags exist are the folks that need them, and when someone else inadvertently breaks those flags, the folks that care make the updates needed to restore the functionality. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan writes:
I don't, and never did, think it's unreasonable -- sourceless makes a lot of technical sense to me in several environments, and I don't see any "ethical" reason to discourage sourceless in those environments either. Based on the evidence presented so far, I think adding to core is premature and the feature would benefit from cooperation among interested parties in a development environment that can move far faster than core will. But I concede I have no experience with this kind of option in Python to match yours or Xavier's.

On 2 August 2016 at 22:37, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Ah, that's likely the key difference in our assumptions then: CPython maintenance branches already move plenty fast enough for any use case where you aren't waiting for us to make official releases :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 1 August 2016 07:46, "Stephen J. Turnbull" <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Sorry, I don't have the bandwidth to follow this discussion closely. I consider that mobile platforms (smartphones, tablets, and all these new funny and shiny embedded devices) are important and Python would lose a huge "market" if we decide that such platforms are not good enough for our language... I consider that it's important but I'm not interested enough to spend much time on it. For example, I don't know configure and the Makefile well, whereas many changes are needed in these files to enhance cross compilation and Android support. Sorry, but I don't think that discussing whether mobile platforms are important is worth it. Check the numbers: more mobile devices are sold than computers. Compare the markets for smartphones and desktop computers and watch the fall of the desktop computer market... Can we now focus on fixing concrete technical issues? Xavier already explained that "sourceless" distributions are already used in the wild; it's not something new. It will not open a gate to the hell (of closed-source software). People who want to protect their IP (hide their code) already patch and compile CPython... Victor

On 7/28/16, 9:30 AM, Xavier de Gaye <xdegaye@gmail.com> wrote:
Is the reason you're doing this as a configure flag rather than some tool that can run on a full Python distro because this is a cross-compile target? I ask because as Victor points out, this concern is really not unique to Android or mobile development. There are numerous use cases for wanting a sourceless distribution on just about any mobile, desktop or server platform. The abundance of tools out there that try to solve this problem is a testament to how common the need is. If Python could spit out a sourceless version of itself, complete with site-packages, that alone would go a long way towards consolidating the packaging logic into one place, and would likely solve the mobile and embedded problem as well.
I suspect the concerns against it are the complexity of implementation, but after having done this for many years, I would rather have something that is less 'smart' than the existing packaging tools I've used try to be. What I'd love personally is the ability to simply take a full Python install - system distro, venv or cross-compile target - and just package the whole thing up minus sources. I can be careful to make sure I have only what I need in it before building the release. At most I'd like to be able to specify a whitelist / blacklist file to include / exclude specific things. Thanks, Kevin
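Kevin's "package the whole thing up minus sources, with an exclude list" can be sketched with `py_compile` and `zipfile`, since `zipimport` can load modules from an archive that contains only `.pyc` files. `build_sourceless_zip()` and its glob-based `exclude` parameter are hypothetical, not an existing tool. Note that `fnmatch`'s `*` also matches `/`, so a pattern like `"*/tests/*"` excludes test packages at any depth.

```python
import fnmatch
import os
import py_compile
import tempfile
import zipfile
from pathlib import Path


def build_sourceless_zip(src_dir, zip_path, exclude=()):
    """Hypothetical helper: compile every .py under src_dir into zip_path as
    .pyc only, skipping relative '/'-separated paths matching a glob in exclude."""
    src = Path(src_dir)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for py in sorted(src.rglob("*.py")):
                rel = py.relative_to(src).as_posix()
                if any(fnmatch.fnmatch(rel, pattern) for pattern in exclude):
                    continue
                # Legacy layout (pkg/mod.pyc, no __pycache__/), which zipimport
                # can load when no source is present in the archive.
                arcname = rel[:-3] + ".pyc"
                cfile = os.path.join(tmp, *arcname.split("/"))
                # dfile is the filename recorded in the code objects (tracebacks).
                py_compile.compile(str(py), cfile=cfile, dfile=rel, doraise=True)
                zf.write(cfile, arcname)
```

Adding the resulting archive to `sys.path` (or `PYTHONPATH`) is then enough to import from it; tracebacks still name the original relative paths, but cannot show source lines.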

Serhiy Storchaka wrote on 29.07.2016 at 14:50:
Speaking of compression, I just saw that the ticket about compressing .pyc files is still open, so I guess that nothing has been done in that regard. It still surprises me that they aren't. They are classical write-once, read-many kind of files, so not compressing them seems counter-intuitive. https://bugs.python.org/issue22789 Obviously, compressing everything into an archive instead of compressing each file separately is bound to provide even larger gains, but that doesn't deal with the fact that some things usually get compiled and installed later than others. Stefan
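Stefan's comparison between compressing each file separately and compressing everything together can be measured directly. A rough sketch: `compression_report()` is hypothetical, works on marshalled code objects (the body of a `.pyc`, without its 16-byte header), and uses `zlib`. The single-stream figure corresponds to a "solid" archive such as `.tar.gz`; ZIP, the format `zipimport` reads, compresses each member separately, so it behaves more like the per-file figure.

```python
import marshal
import zlib


def compression_report(sources: dict) -> dict:
    """Byte counts for the marshalled code of each module in `sources`
    (name -> source text): uncompressed, compressed one module at a time,
    and compressed as one concatenated stream."""
    blobs = [marshal.dumps(compile(text, name, "exec"))
             for name, text in sources.items()]
    return {
        "raw": sum(len(blob) for blob in blobs),
        "per_file": sum(len(zlib.compress(blob, 9)) for blob in blobs),
        "single_stream": len(zlib.compress(b"".join(blobs), 9)),
    }
```

The single-stream gain comes from repeated names and bytecode patterns across modules, at the cost of random access: reading one module means decompressing everything before it.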
participants (11)
- Guido van Rossum
- Kevin Ollivier
- M.-A. Lemburg
- Nick Coghlan
- Paul Moore
- Serhiy Storchaka
- Stefan Behnel
- Stephen J. Turnbull
- Terry Reedy
- Victor Stinner
- Xavier de Gaye