[Python-Dev] Question on bz2 codec. Is this a bug?
M.-A. Lemburg
mal at egenix.com
Wed Sep 29 23:05:38 CEST 2010
Chris Bergstresser wrote:
> Hi all --
>
> I looked through the bug tracker, but I didn't see this listed. I
> was trying to use the bz2 codec, but it seems like it's not very
> useful in the current form (and I'm not sure if it's getting added
> back to py3k, so maybe this is a moot point). It looks like the codec
> writes every piece of data fed to it as a separate compressed block.
> This results in compressed files which are significantly larger than
> the uncompressed files, if you're writing a lot of small bursts of
> data. It also leads to interesting oddities like this:
>
> import codecs
>
> with codecs.open('text.bz2', 'w', 'bz2') as f:
>     for x in xrange(20):
>         f.write('This is data %i\n' % x)
>
> with codecs.open('text.bz2', 'r', 'bz2') as f:
>     print f.read()
>
> This prints "This is data 0" and exits, because the codec won't read
> beyond the first compressed block.
>
> My question is, is this known, intended behavior? Should I open a bug
> report? Is it going away in py3k, so there's no real point in fixing
> it?
The codec is scheduled to be added back to Python 3.
However, its main use is in working on whole chunks of
data rather than the line-by-line approach you're after.
Line-by-line use is what the codec's incremental
encoder/decoders are for, but codecs.open() currently does
not use them. I'm not sure whether the io lib uses them; if
it does, they would be available via the regular open().
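
As a rough illustration of the incremental approach, here is a minimal
sketch. It assumes Python 3, where the bytes-to-bytes codec is registered
as 'bz2_codec'; the helper name compress_incrementally is made up for
this example and is not part of the codecs module. It feeds the same 20
small writes through one incremental encoder, so they end up in a single
compressed stream, and compares the size with compressing each write on
its own.

```python
import bz2
import codecs


def compress_incrementally(chunks):
    """Compress an iterable of bytes chunks into one bz2 stream.

    Hypothetical helper for illustration only.
    """
    # Assumption: Python 3, where the codec is named 'bz2_codec'.
    encoder = codecs.getincrementalencoder('bz2_codec')()
    pieces = [encoder.encode(chunk) for chunk in chunks]
    # final=True flushes the compressor and terminates the stream.
    pieces.append(encoder.encode(b'', final=True))
    return b''.join(pieces)


lines = [('This is data %i\n' % x).encode('ascii') for x in range(20)]

# One stream shared by all writes.
single_stream = compress_incrementally(lines)

# For comparison: every write compressed as its own independent stream.
per_write = b''.join(bz2.compress(line) for line in lines)

# The single stream decompresses back to all 20 lines.
restored = bz2.decompress(single_stream)
```

With such small writes, per_write comes out much larger than
single_stream, since each independent stream carries its own header
and trailer.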
--
Marc-Andre Lemburg
eGenix.com