Why does lzma hang for a very long time when run in parallel using Python's multiprocessing module?
cantor cantor
cantormath2 at gmail.com
Fri Oct 25 12:21:07 EDT 2013
When trying to run lzma in parallel (see the code below), it hangs for a
very long time. The non-parallel version of the same code using the built-in
map() works fine, as shown further down.
Python 3.3.2 [GCC 4.6.3] on linux
import lzma
from functools import partial
import multiprocessing
def run_lzma(data, c):
    return c.compress(data)

def split_len(seq, length):
    # split the string into chunks of the given length and encode each to bytes
    return [str.encode(seq[i:i+length]) for i in range(0, len(seq), length)]

def lzma_mp(sequence, threads=3):
    lzc = lzma.LZMACompressor()
    blocksize = int(round(len(sequence)/threads))
    strings = split_len(sequence, blocksize)
    # bind the compressor to run_lzma so pool.map only has to pass the chunks
    lzc_partial = partial(run_lzma, c=lzc)
    pool = multiprocessing.Pool()
    lzc_pool = list(pool.map(lzc_partial, strings))   # this is the parallel call that hangs
    pool.close()
    pool.join()
    out_flush = lzc.flush()
    return b"".join(lzc_pool + [out_flush])
sequence = 'AAAAAJKDDDDDDDDDDDDDDDDDDDDDDDDDDDDGJFKSHFKLHALWEHAIHWEOIAHIOAHIOWEHIOHEIOFEAFEASFEAFWEWWWWWWWWWWWWWWWWWWWWWWWWWWWWWEWFQWEWQWQGEWQFEWFDWEWEGEFGWEG'
lzma_mp(sequence,threads=3)
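For comparison, here is a minimal sketch of the same split-and-pool pattern, but with each worker compressing its chunk independently via the module-level lzma.compress() instead of sharing one LZMACompressor across processes. The helper names here are just for illustration, and note that this produces one separate .xz stream per chunk, so the joined output is not byte-for-byte equivalent to the single-compressor result:

import lzma
import multiprocessing

def compress_chunk(chunk):
    # each worker produces a complete, independent .xz stream for its chunk
    return lzma.compress(chunk)

def lzma_mp_independent(sequence, threads=3):
    blocksize = int(round(len(sequence)/threads))
    chunks = [str.encode(sequence[i:i+blocksize])
              for i in range(0, len(sequence), blocksize)]
    pool = multiprocessing.Pool(threads)
    compressed = pool.map(compress_chunk, chunks)
    pool.close()
    pool.join()
    return compressed   # a list of independent .xz streams, one per chunk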
When using lzma with the built-in map() function, it works fine.
threads=3
blocksize = int(round(len(sequence)/threads))
strings = split_len(sequence, blocksize)
lzc = lzma.LZMACompressor()
out = list(map(lzc.compress,strings))
out_flush = lzc.flush()
result = b"".join(out + [out_flush])
lzma.compress(str.encode(sequence))
# should be True: chunked compression matches compressing the whole sequence at once
lzma.compress(str.encode(sequence)) == result
map() with the partial function works fine as well.
lzc = lzma.LZMACompressor()
lzc_partial = partial(run_lzma,c=lzc)
out = list(map(lzc_partial,strings))
out_flush = lzc.flush()
result = b"".join(out + [out_flush])