[Python-Dev] versioned .so files for Python 3.2

John Arbash Meinel john.arbash.meinel at gmail.com
Wed Jul 7 23:56:23 CEST 2010


Scott Dial wrote:
> On 6/30/2010 2:53 PM, Barry Warsaw wrote:
>> It might be amazing, but it's still a significant overhead.  As I've
>> described, multiply that by all the py files in all the distro packages
>> containing Python source code, and then still try to fit it on a CDROM.
> 
> I decided to prove to myself that it was not a significant issue to have
> parallel directory structures in a .tar.bz2, and I was surprised to find
> it much worse at that than I had imagined. For example,
> 
> # cd /usr/lib/python2.6/site-packages
> # tar --exclude="*.pyc" --exclude="*.pyo" \
>       -cjf mercurial.tar.bz2 mercurial
> # du -h mercurial.tar.bz2
> 640K    mercurial.tar.bz2
> 
> # cp -a mercurial mercurial2
> # tar --exclude="*.pyc" --exclude="*.pyo" \
>       -cjf mercurial2.tar.bz2 mercurial mercurial2
> # du -h mercurial2.tar.bz2
> 1.3M    mercurial2.tar.bz2
> 

I believe the standard (and largest) block size for .bz2 is 900kB, and I
*think* that is measured on the uncompressed data. I do know that bz2 can
chain blocks, though, since it can compress all-NUL input extremely well
(multiple GB down to kB, IIRC).

There was a question as to whether LZMA would do better here. I'm using
7zip, but .xz should perform similarly.

$ du -sh mercurial*
2.6M    mercurial
2.6M    mercurial2

366K mercurial.tar.bz2
734K mercurial2.tar.bz2

303K mercurial.7z
310K mercurial2.7z

So LZMA with the 'normal' compression has a big enough window to find
almost all of the redundancy: 310kB is only about 2% more than 303kB.
bz2 clearly does not have one, since 734kB is actually slightly more
than 2x 366kB (732kB).
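
For anyone who wants to reproduce the effect without a mercurial
checkout, here is a rough sketch using Python's bz2 and lzma modules (the
latter is only in the stdlib from 3.3 on). It is not the measurement
above: the payload is synthetic, seeded pseudo-random bytes, picked so
that the only redundancy available is the second copy itself. The names
make_payload and doubling_ratio, and the 2 MiB size, are my own
assumptions for illustration. The size just needs to exceed one bz2 block
and fit inside the LZMA dictionary (8 MiB at the default preset, as far
as I know).

```python
import bz2
import lzma
import random

# Assumption: 2 MiB, i.e. larger than bzip2's 900 kB maximum block but
# well inside the dictionary used by lzma's default preset.
PAYLOAD_SIZE = 2 * 1024 * 1024


def make_payload(size, seed=0):
    """Return `size` deterministic pseudo-random (incompressible) bytes."""
    rng = random.Random(seed)
    return rng.getrandbits(size * 8).to_bytes(size, "little")


def doubling_ratio(compress, payload):
    """Compressed size of two copies divided by compressed size of one."""
    single = len(compress(payload))
    doubled = len(compress(payload + payload))
    return doubled / single


payload = make_payload(PAYLOAD_SIZE)

# bz2.compress defaults to compresslevel=9 (900 kB blocks);
# lzma.compress defaults to the .xz container at preset 6.
bz2_ratio = doubling_ratio(bz2.compress, payload)
lzma_ratio = doubling_ratio(lzma.compress, payload)

print("bz2  doubled/single: %.2f" % bz2_ratio)
print("lzma doubled/single: %.2f" % lzma_ratio)
```

A ratio near 2 means the compressor never noticed the second copy, and a
ratio near 1 means it found it. Real source trees compress far better
than random bytes, so the absolute sizes will not resemble the mercurial
numbers. Only the ratios are meant to be comparable.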

John
=:->

