On Fri, May 7, 2021 at 6:39 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, May 07, 2021 at 06:02:51PM -0700, Chris Jerdonek wrote:
To know what compression methods might be effective, I’m wondering if it could be useful to see separate histograms of, say, the start column number and width over the code base. Or for people that really want to dig in, maybe access to the set of all pairs could help. (E.g. maybe a histogram of pairs could also reveal something.)
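As a rough illustration of the kind of histogram meant here, the sketch below walks the AST of each file and counts start columns, widths, and (start, width) pairs. It is only an approximation: the proposal attaches positions to bytecode instructions, while this uses the `col_offset` and `end_col_offset` attributes of AST nodes (UTF-8 byte offsets, available since Python 3.8). The function names `column_histograms` and `scan_tree` are made up for this example.

```python
import ast
from collections import Counter
from pathlib import Path


def column_histograms(source):
    """Return (starts, widths, pairs) Counters for one source string."""
    starts, widths, pairs = Counter(), Counter(), Counter()
    for node in ast.walk(ast.parse(source)):
        start = getattr(node, "col_offset", None)
        end = getattr(node, "end_col_offset", None)
        # Skip nodes that carry no position, and nodes spanning several
        # lines, where a single-line "width" is not meaningful.
        if start is None or end is None or node.lineno != node.end_lineno:
            continue
        width = end - start
        starts[start] += 1
        widths[width] += 1
        pairs[(start, width)] += 1
    return starts, widths, pairs


def scan_tree(root):
    """Accumulate the three histograms over every *.py file under root."""
    totals = (Counter(), Counter(), Counter())
    for path in Path(root).rglob("*.py"):
        try:
            result = column_histograms(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            continue  # unparsable or non-UTF-8 file: ignore it
        for total, part in zip(totals, result):
            total.update(part)
    return totals
```

Calling `scan_tree` on a checkout and looking at `most_common(20)` for each of the three counters would show, for instance, how often both values fit in a single byte.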
I think this is over-analysing. Do we need to micro-optimize the compression algorithm? Let's make the choice simple: live with the size increase, or swap to LZ4 compression as Antoine suggested. Analysis paralysis is a real risk here.
If there are implementations which cannot support either (MicroPython?) they should be free to continue doing things the old way. In other words, "fine grained error messages" should be a quality of implementation feature rather than a language guarantee.
I understand that the plan is to make this feature optional in any case, to allow third-party tools to catch up.
If people really want to do that histogram analysis so that they can optimize the choice of compression algorithm, of course they are free to do so. But the PEP authors should not feel that they are obliged to do so, and we should avoid the temptation to bikeshed over compressors.
I'm not sure why you're sounding so negative. Pablo asked for ideas in his first message to the list:
On Fri, May 7, 2021 at 2:53 PM Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
Does anyone see a better way to encode this information **without complicating a lot the implementation**?
Maybe a large gain can be made with a simple tweak to how the pair is encoded, but there's no way to know without seeing the distribution. Also, my reply wasn't about the pyc files on disk but about their representation in memory, which Pablo later said may be the main concern. So it's not compression algorithms like LZ4 so much as a method of encoding.
--Chris
(For what it's worth, I like this proposed feature, I don't care about a 20-25% increase in pyc file size, but if this leads to adding LZ4 compression to the stdlib, I like it even more :-)
-- Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6H2XSRMA...
Code of Conduct: http://python.org/psf/codeofconduct/