[New-bugs-announce] [issue43785] bz2 performance issue.

Inada Naoki report at bugs.python.org
Fri Apr 9 01:42:16 EDT 2021


New submission from Inada Naoki <songofacandy at gmail.com>:

The issue was originally reported here:
https://discuss.python.org/t/non-optimal-bz2-reading-speed/6869

1. Only BZ2File uses RLock()

lzma and gzip don't use RLock(). The lock adds significant performance overhead.
When I removed `with self._lock:`, decompression speed improved from about 148k lines/sec to 200k lines/sec.
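
A rough way to reproduce this kind of lines/sec measurement is sketched below. It is only an illustration, not the attached dec.py: the helper name `lines_per_second` and the in-memory sample data are hypothetical, and absolute numbers will differ by machine and input.

```python
import bz2
import io
import time


def lines_per_second(opener, payload):
    """Iterate over a compressed payload; return (line_count, lines/sec)."""
    start = time.perf_counter()
    with opener(io.BytesIO(payload)) as f:
        # Plain iteration, so this exercises the class's __iter__ path.
        count = sum(1 for _ in f)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small inputs.
    return count, count / max(elapsed, 1e-9)


# Hypothetical sample: 100,000 short lines compressed in memory.
sample = bz2.compress(b"".join(b"line %d\n" % i for i in range(100_000)))
count, rate = lines_per_second(bz2.BZ2File, sample)
print(f"{count} lines, {rate:,.0f} lines/sec")
```

Passing `gzip.GzipFile` or `lzma.LZMAFile` objects through an equivalent opener (with matching compressed data) would allow a like-for-like comparison across the three modules.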


2. The default __iter__ calls `readline()` for each iteration.

BZ2File.readline() is implemented in Python, so it is somewhat slower than a C implementation.

When I added this `__iter__()` to BZ2File, decompression speed improved from about 148k lines/sec (or 200k lines/sec) to 500k lines/sec.

    def __iter__(self):
        self._check_can_read()
        return iter(self._buffer)

If this __iter__ method is safe, it can be added to gzip and lzma too.
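
To try the idea without patching the standard library, the method can be placed on a subclass and checked against the default iteration. This is a minimal sketch, assuming CPython's current bz2 internals: `_buffer` and `_check_can_read()` are private attributes of BZ2File and may change, and the names `FastIterBZ2File` and `read_lines` are hypothetical.

```python
import bz2
import io


class FastIterBZ2File(bz2.BZ2File):
    """BZ2File variant whose iteration is delegated to the internal buffer."""

    def __iter__(self):
        # Relies on private attributes of BZ2File (an assumption about
        # CPython's implementation, not a public API).
        self._check_can_read()
        return iter(self._buffer)


def read_lines(cls, payload):
    """Return all lines produced by iterating over cls(payload)."""
    with cls(io.BytesIO(payload)) as f:
        return list(f)


payload = bz2.compress(b"".join(b"line %d\n" % i for i in range(1000)))
default_lines = read_lines(bz2.BZ2File, payload)
fast_lines = read_lines(FastIterBZ2File, payload)
print(default_lines == fast_lines)
```

Note that iterating this way bypasses the per-call lock, so it says nothing about whether the approach is safe when one file object is shared between threads.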

----------
components: Library (Lib)
files: dec.py
messages: 390588
nosy: methane
priority: normal
severity: normal
status: open
title: bz2 performance issue.
versions: Python 3.10
Added file: https://bugs.python.org/file49948/dec.py

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43785>
_______________________________________

