
Ideally, we'd have a functional package manager: one that can delete binaries when disk space is running low and recompile them from source when a binary is needed again. It could store Python 2 as a series of diffs against Python 3.

On Thu, Jan 2, 2020, 2:22 AM Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:

James Lu writes:
First, that's not what package managers do. Package managers manage dependencies, not disk space.

Second, my "where I get almost all my work done" Python 3.7 installation is only 571MB; I need about 20GB free to update Xcode, so disk space is not high on my list of reasons for screwing with a Python installation. Much more practical to search for old .isos and .dmgs to delete. If I really thought a few hundred MB would help, I'd just delete the whole thing and hope nothing depended on it until I could resolve the issue requiring more space than I have.

If you really want to save disk space measured in MB, I guess you'd probably want LRU semantics, and possibly a blacklist of modules that are only used interactively and at most once a day or so, so the time to import doesn't really matter. It would be easy enough to write a script to do the cleaning (the stat module can get you the access times), and Python will take care of the recompilation since it only ever executes compiled code. Speaking practically, though, you might want to compile with -O and delete all the sources if space is such a problem. Teach help() to call out to docs.python.org and/or fetch the sources with pip.
It could store Python 2 as a series of diffs against Python 3.
Even if you used a binary diff such as xdelta or bsdiff, I doubt this would save space. Feel free to try it and tell me I'm wrong, though.

Steve
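The cleanup script described above could look roughly like the following. This is only a sketch under stated assumptions: the names `stale_bytecode` and `clean` and the 30-day threshold are made up for illustration, it touches only `.pyc` files (which Python regenerates as long as the sources are still present), and it trusts `st_atime`, which is only meaningful if the filesystem actually records access times.

```python
import time
from pathlib import Path


def stale_bytecode(root, max_age_days=30.0, now=None):
    """Yield (path, size) for .pyc files not accessed in max_age_days.

    Hypothetical helper: relies on st_atime being kept up to date.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for pyc in Path(root).rglob("*.pyc"):
        st = pyc.stat()
        if st.st_atime < cutoff:
            yield pyc, st.st_size


def clean(root, max_age_days=30.0, dry_run=True):
    """Delete stale .pyc files and return the number of bytes reclaimed.

    With dry_run=True (the default) nothing is deleted; the return value
    is what would have been reclaimed.
    """
    reclaimed = 0
    # Materialize the list first so deletion does not disturb the walk.
    for pyc, size in list(stale_bytecode(root, max_age_days)):
        if not dry_run:
            pyc.unlink()
        reclaimed += size
    return reclaimed
```

Running `clean(some_lib_dir)` first with the default `dry_run=True` shows how little there is to gain before anything is removed.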

On Sat, Jan 4, 2020 at 1:16 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Be aware that access times aren't always accurate, since they depend on the system performing a write every time you request a read. That's an inefficiency that can easily be disabled (perhaps selectively - the "relatime" mount option in Linux means to update the access time only if the file's been changed since it was last accessed), at the price of breaking the stats of something like you propose.

But honestly, if you're worried about a MB here and a MB there, you probably will do better to zip up the stdlib than to fiddle around with deleting stale modules.

ChrisA
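For anyone unsure what "zip up the stdlib" relies on: Python's import system can load pure-Python modules directly from a zip archive listed on `sys.path`. The sketch below demonstrates the mechanism with a single throwaway module; the names `bundle.zip` and `zipped_demo` are invented for the example, and it does not repackage a real standard library (extension modules, for one, cannot be imported from a zip).

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Build a small archive containing one hypothetical module.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "bundle.zip"
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("zipped_demo.py", "def greet():\n    return 'hello from zip'\n")

# Putting the archive on sys.path lets the normal import machinery find it.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()
zipped_demo = importlib.import_module("zipped_demo")

print(zipped_demo.greet())
print(zipped_demo.__file__)  # a path that points inside bundle.zip
```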

Where’s the initial email you’re replying to here? I don’t have it in my inbox, and it isn’t on the Mailman archive either, and since you snipped it down to a single line I have no idea what that snippet was referring to. Meanwhile:
I’m not sure why you want to store Python 2 as diffs against Python 3, but that is exactly what a version control system does, not what a package manager does; it’s easy if you use the right tool. For example:

* Create an empty git repo in ~/python
* Create a branch called 2.7
* Build Python 2.7 and install it to that directory, and also install any site-packages you want
* Commit
* Create a new branch off master called 3.8
* Build Python 3.8 and install it to that directory, and also install any site-packages you want
* Commit

Git (and its various plugins for diffing files) takes care of what’s a diff against what and so on; you just have to know there are at least two useful leaves on the graph of diffs and they can be accessed with the names 2.7 and 3.8.

Now you have to check out the appropriate branch every time you want to run python or python2, or activate a non-isolated virtual environment built from either, etc. Which seems like a huge pain, but it seems like you’re asking for that pain.

Would this actually save any space? I don’t know. I think 2.7 and 3.8 have so little binary code in common (bad enough for just the executable; things like stdlib .pyc files end up with different names in different locations that git probably isn’t smart enough to link up in the first place) that you might well waste more on repo overhead than you’d gain in deduplicating. (If you wanted 3.8.0 and 3.8.1 and 3.8.2, on the other hand, that would probably save a lot more, but how often do you want an earlier bug fix build of the same minor version?) But it’s simple enough to do that, rather than guessing, you can just try it.
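A cheaper first experiment than building the repository is to measure how many bytes the two install trees share as identical files. The sketch below does that with content hashes; `file_digests` and `shared_bytes` are hypothetical names, and the two directories are whatever install prefixes you want to compare. Git stores identical file contents once regardless of path, so this approximates whole-file deduplication only. It does not model the delta compression git applies to similar-but-different files in pack files, so treat the result as a rough lower bound.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map content hash -> size in bytes for every regular file under root."""
    digests = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            data = path.read_bytes()  # fine for a sketch; streams would scale better
            digests[hashlib.sha256(data).hexdigest()] = len(data)
    return digests


def shared_bytes(tree_a, tree_b):
    """Bytes of file content that appear, byte for byte, in both trees."""
    a = file_digests(tree_a)
    b = file_digests(tree_b)
    return sum(size for digest, size in a.items() if digest in b)
```

Comparing `shared_bytes(prefix_27, prefix_38)` with the total size of either tree gives a quick sense of whether the branch-per-version setup is worth attempting.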

participants (4)
- Andrew Barnert
- Chris Angelico
- James Lu
- Stephen J. Turnbull