CACHEDIR.TAG for __pycache__ on Linux

Hello,

I have a small suggestion about the generation of the `__pycache__` directory. Would it be possible to also generate a `CACHEDIR.TAG` file in this directory when running Python on Linux? The file is (informally?) specified at <https://bford.info/cachedir/>. This would allow `__pycache__` to be excluded automatically by various Linux tools such as `tar` (using the `--exclude-caches-all` flag).

Cheers,
Leon
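For anyone who wants this behavior today, here is a minimal user-side sketch. CPython does not do this. It assumes a hypothetical helper `tag_pycache_dirs` that walks a project tree and writes a `CACHEDIR.TAG` into each existing `__pycache__`. The first line is the signature the spec requires; the comment lines are only informational.

```python
from pathlib import Path

# The spec requires the tag file to begin with exactly this signature line.
CACHEDIR_TAG_CONTENT = (
    "Signature: 8a477f597d28d172789f06886806bc55\n"
    "# This file is a cache directory tag.\n"
    "# For information about cache directory tags, see:\n"
    "#\thttps://bford.info/cachedir/\n"
)


def tag_pycache_dirs(root):
    """Hypothetical helper: add a CACHEDIR.TAG to every __pycache__ under root.

    Returns the list of tag files that were newly written; existing tags
    are left alone.
    """
    written = []
    for cache_dir in Path(root).rglob("__pycache__"):
        if not cache_dir.is_dir():
            continue
        tag = cache_dir / "CACHEDIR.TAG"
        if not tag.exists():
            tag.write_text(CACHEDIR_TAG_CONTENT, encoding="ascii")
            written.append(tag)
    return written
```

After running `tag_pycache_dirs("project")`, a command such as `tar --exclude-caches-all -czf project.tar.gz project` should skip the tagged directories. Note that the tag disappears whenever a `__pycache__` directory is deleted and recreated, which is one argument for having the interpreter write it.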

Hi Leon (and Bryan),

Here are some comments on your interesting suggestion.

Bryan Ford's CACHEDIR.TAG proposal is supported by GNU tar. See:
https://www.gnu.org/software/tar/manual/html_node/exclude.html
By the way, I've copied Bryan.

There's a cachedir_tag module on PyPI, by Alex Willmer:
https://pypi.org/project/cachedir-tag/

This search shows some other systems that use, or are considering using, CACHEDIR.TAG:
https://www.google.com/search?q=CACHEDIR.TAG
Other than tar, I don't see any Linux / GNU tools that use CACHEDIR.TAG.

Further, in 2004 Bryan Ford opened a Mozilla request to use CACHEDIR.TAG. It was closed in 2014 as WONTFIX. The discussion there is useful:
https://bugzilla.mozilla.org/show_bug.cgi?id=252179

The change requested is small. However, it implies an endorsement of Bryan Ford's elegant and well-presented idea, which is a much larger matter. Rather than simply saying NO, perhaps a discussion of ALL the problems involved in __pycache__ would be better.

Here's one that interests me. How do we efficiently and elegantly support __pycache__ on read-only file systems? A leading example is Ubuntu's Snap:
https://en.wikipedia.org/wiki/Snap_(package_manager)

For example, on my Ubuntu PC there's a read-only folder that contains __pycache__ folders:
/snap/core/11187/usr/lib/python3/dist-packages/
Suppose we want to use the version 11187 dist-packages with multiple versions of Python. We're stymied, unless we know the Python versions before creating (or mounting) the read-only file system.

By the way, on my Ubuntu PC all the /snap folders are in fact mounted SquashFS images. These images are stored in:
/var/lib/snapd/snaps/
Thus, there's little reason to archive or back up the /snap folder. It's /var/lib/snapd/snaps that needs backing up.

Finally, Bryan Ford has a focus on systems security. Ubuntu's snap architecture seems to reduce the attack surface, although there's more work to be done:
https://snapcraft.io/blog/where-eagles-snap-snap-security-overview
https://en.wikipedia.org/wiki/Snap_(package_manager)#Configurable_sandbox

I hope this helps.
-- Jonathan
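To make the multi-version problem concrete: under PEP 3147 the name of each compiled file embeds the interpreter's cache tag. A `.pyc` written by one Python version is therefore not used by another. The sketch below asks the running interpreter which file it would look for. It assumes a hypothetical helper `expected_pyc` and a hypothetical module path. Separately, Python 3.8 added `sys.pycache_prefix` (set via `PYTHONPYCACHEPREFIX` or `-X pycache_prefix=PATH`), which redirects bytecode into a separate writable tree. That is one possible way around read-only file systems.

```python
import importlib.util
import sys


def expected_pyc(source_path):
    """Hypothetical helper: return this interpreter's cache tag and the .pyc
    path it would use for source_path (the source file need not exist).

    If sys.pycache_prefix is set, the path points into that separate tree
    instead of a sibling __pycache__ directory.
    """
    tag = sys.implementation.cache_tag  # e.g. "cpython-312" on CPython 3.12
    return tag, importlib.util.cache_from_source(source_path)


if __name__ == "__main__":
    # Hypothetical module inside the read-only snap from the example above.
    src = "/snap/core/11187/usr/lib/python3/dist-packages/example/module.py"
    tag, pyc = expected_pyc(src)
    print(tag)
    # The filename embeds the tag, so another Python version looks for a different file.
    print(pyc)
```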

Interesting stuff, thanks for sharing. I wasn't sure whether any other tools supported it, but I assumed they would... Thanks for setting that straight.

I was tarring up some code earlier to move it onto a cluster, and I decided to experiment with those flags. When I saw that the `.mypy_cache` folder was excluded but not the `__pycache__` folder, I decided to submit the suggestion.

Leon
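Leon's archiving step can also be approximated with Python's `tarfile` module. Here is a rough sketch, assuming hypothetical helpers `is_cache_dir` and `archive_without_caches`. It skips any directory whose `CACHEDIR.TAG` begins with the spec's signature, roughly like tar's `--exclude-caches-all`, which drops the tagged directory entirely. Only directories that already carry a tag are skipped, such as the `.mypy_cache` that was excluded in Leon's run. An untagged `__pycache__` is still archived.

```python
import os
import tarfile

# The 43-byte signature a valid CACHEDIR.TAG must start with.
SIGNATURE = b"Signature: 8a477f597d28d172789f06886806bc55"


def is_cache_dir(path):
    """True if path contains a CACHEDIR.TAG that starts with the signature."""
    tag = os.path.join(path, "CACHEDIR.TAG")
    try:
        with open(tag, "rb") as f:
            return f.read(len(SIGNATURE)) == SIGNATURE
    except OSError:
        return False


def archive_without_caches(root, out_path):
    """Rough analogue of `tar --exclude-caches-all -czf out_path root`."""
    with tarfile.open(out_path, "w:gz") as tar:
        if is_cache_dir(root):
            return  # the whole tree is a cache: write an empty archive
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune tagged directories in place so os.walk never descends into them.
            dirnames[:] = [
                d for d in dirnames
                if not is_cache_dir(os.path.join(dirpath, d))
            ]
            tar.add(dirpath, recursive=False)
            for name in filenames:
                tar.add(os.path.join(dirpath, name), recursive=False)
```

Usage might look like `archive_without_caches("project", "project.tar.gz")`. Pruning `dirnames` in place is what stops `os.walk` from visiting the excluded subtrees.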

participants (2)
- adigitoleo (Leon)
- Jonathan Fine