[Python-Dev] Question on bz2 codec. Is this a bug?

Wed Sep 29 23:05:38 CEST 2010

Chris Bergstresser wrote:
> Hi all --
> 
>    I looked through the bug tracker, but I didn't see this listed.  I
> was trying to use the bz2 codec, but it seems like it's not very
> useful in the current form (and I'm not sure if it's getting added
> back to py3k, so maybe this is a moot point).  It looks like the codec
> writes every piece of data fed to it as a separate compressed block.
> This results in compressed files which are significantly larger than
> the uncompressed files, if you're writing a lot of small bursts of
> data.  It also leads to interesting oddities like this:
> 
>     import codecs
> 
>     with codecs.open('text.bz2', 'w', 'bz2') as f:
>         for x in xrange(20):
>             f.write('This is data %i\n' % x)
> 
>     with codecs.open('text.bz2', 'r', 'bz2') as f:
>         print f.read()
> 
> This prints "This is data 0" and exits, because the codec won't read
> beyond the first compressed block.
> 
> My question is, is this known, intended behavior?  Should I open a bug
> report?  Is it going away in py3k, so there's no real point in fixing
> it?
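
To illustrate the two effects described above, here is a minimal
sketch. It assumes Python 3 and uses the bz2 module directly rather
than the 'bz2' codec, so it only mimics the "one compressed block per
write" behaviour; the names pieces, per_piece and whole are made up
for the example.

```python
import bz2

# The same 20 small writes as in the example above, as bytes.
pieces = [("This is data %i\n" % x).encode("ascii") for x in range(20)]
raw_size = sum(len(p) for p in pieces)

# Mimic the reported behaviour: every write becomes its own bz2 stream.
per_piece = b"".join(bz2.compress(p) for p in pieces)

# For comparison: the same bytes compressed as a single stream.
whole = bz2.compress(b"".join(pieces))

# Each tiny stream carries its own header and trailer, so the
# per-write output ends up larger than the uncompressed input.
print(raw_size, len(per_piece), len(whole))

# A single-stream decompressor stops at the end of the first stream;
# everything after it is left untouched in unused_data.
d = bz2.BZ2Decompressor()
first = d.decompress(per_piece)
print(repr(first), d.eof, len(d.unused_data) > 0)
```

The second print shows only the first line being recovered, which
matches the "reads one block and stops" symptom.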

The codec is scheduled to be added back to Python3.

However, its main use is in working on whole chunks of
data rather than the line-by-line approach you're after.
Line-by-line processing is provided by the codec's incremental
encoders/decoders, but these are currently not used by
codecs.open(), and I'm not sure whether the io library, which
you could use via the regular open(), uses them.
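
As a rough sketch of what incremental encoding buys you: a single
compressor object is fed many small pieces and flushed once at the
end, so the result is one stream. This assumes Python 3 and uses
bz2.BZ2Compressor directly, not the codec's own incremental encoder
class; compress_incrementally is a hypothetical helper name.

```python
import bz2


def compress_incrementally(pieces):
    """Feed many small byte strings to one compressor, flush once."""
    comp = bz2.BZ2Compressor()
    out = []
    for piece in pieces:
        # compress() may return b"" while data is buffered internally.
        out.append(comp.compress(piece))
    # flush() finishes the stream; the compressor cannot be reused after.
    out.append(comp.flush())
    return b"".join(out)


pieces = [("This is data %i\n" % x).encode("ascii") for x in range(20)]
data = compress_incrementally(pieces)

# The result is a single stream that round-trips to the full input.
restored = bz2.decompress(data)
print(restored == b"".join(pieces))
```

Because all the state lives in one compressor object, the small
writes share one header and one set of tables instead of paying
that cost per write.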

-- 
Marc-Andre Lemburg
eGenix.com
