[issue23688] unnecessary copying of memoryview in gzip.GzipFile.write ?

Tue Mar 17 15:26:04 CET 2015

New submission from Wolfgang Maier:

I thought I'd go back to work on a test patch for issue21560 today, but now I'm puzzled by the explicit handling of memoryviews in gzip.GzipFile.write.
The method is defined as:

    def write(self,data):
        self._check_closed()
        if self.mode != WRITE:
            import errno
            raise OSError(errno.EBADF, "write() on read-only GzipFile object")

        if self.fileobj is None:
            raise ValueError("write() on closed GzipFile object")

        # Convert data type if called by io.BufferedWriter.
        if isinstance(data, memoryview):
            data = data.tobytes()

        if len(data) > 0:
            self.size = self.size + len(data)
            self.crc = zlib.crc32(data, self.crc) & 0xffffffff
            self.fileobj.write( self.compress.compress(data) )
            self.offset += len(data)

        return len(data)

So for some reason, when it gets passed data as a memoryview, it first copies the content to a bytes object, and I do not understand why.
zlib.crc32 and zlib.compress seem to be able to deal with memoryviews, so the only special casing that seems required here is in determining the byte length of the data, which I guess needs to use memoryview.nbytes. I've prepared a patch (overlapping the one for issue21560) that avoids copying the data and seems to work fine.

Did I miss something about the importance of the tobytes conversion?
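
To illustrate the two points above, here is a minimal sketch, not the patch itself. It assumes a memoryview over a multi-byte array as sample data, and byte_length is a hypothetical helper name introduced only for this example. It shows that len() counts items while nbytes counts bytes, and that zlib.crc32 and a zlib compression object accept the view without a tobytes() copy:

```python
import array
import zlib

# Hypothetical sample data: four unsigned ints, so itemsize is larger than 1.
values = array.array('I', [1, 2, 3, 4])
view = memoryview(values)

def byte_length(data):
    """Hypothetical helper: size of bytes-like data in bytes."""
    if isinstance(data, memoryview):
        # len() on a memoryview returns the number of items, not bytes.
        return data.nbytes
    return len(data)

item_count = len(view)
byte_count = byte_length(view)

# zlib.crc32 accepts the view directly; compare against the copied bytes.
crc_from_view = zlib.crc32(view) & 0xffffffff
crc_from_copy = zlib.crc32(view.tobytes()) & 0xffffffff

# A compression object, as used by GzipFile, also accepts the view.
compressor = zlib.compressobj()
compressed = compressor.compress(view) + compressor.flush()
roundtrip = zlib.decompress(compressed)
```

For plain bytes or a memoryview with one-byte items, len() and nbytes agree, so the difference only matters for views with a larger item size.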

----------
components: Library (Lib)
messages: 238294
nosy: wolma
priority: normal
severity: normal
status: open
title: unnecessary copying of memoryview in gzip.GzipFile.write ?
type: resource usage
versions: Python 3.5

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23688>
_______________________________________