[Python-ideas] Run length encoding

David Mertz mertz at gnosis.cx
Sat Jun 10 23:29:29 EDT 2017


If what you really want is sparse matrices, you should use those:
https://docs.scipy.org/doc/scipy/reference/sparse.html.

Or maybe from the experimental Dask offshoot that I contributed a few lines
to: https://github.com/mrocklin/sparse.

Either of those will be about two orders of magnitude faster than working
with Python lists for numeric data.

The reason, I think, that there's no RLE module in Python (standard
library, there's probably something on PyPI) is that it's so easy to roll
your own with the building blocks in itertools.  The `zipfile` and `gzip`
modules are written in hundreds or thousands of lines of C code, and more
importantly they are for dealing with *files* mostly, not generic
sequences... that's the domain of `itertools`, but `itertools` is kept to a
bare minimum collection of building blocks from which 1-10 line functions
can be built efficiently.

On Sat, Jun 10, 2017 at 8:14 PM, Neal Fultz <nfultz at gmail.com> wrote:

> Thanks, that's cool.   Maybe the root problem is that the docs aren't
> using the right words when I google. Run-length-encoding is particularly
> relevant for spare matrices, but there's probably a library for those as
> well.  On the data science side of things, there's a few hundred R packages
> that use it there[1].
>
> Can you explicate the guiding principle a bit? I'm perplexed that python
> would come with zip and gzip but not rle.
>
> [1] : https://github.com/search?l=R&q=user%3Acran+rle&type=Code&
> utf8=%E2%9C%93
>
> On Sat, Jun 10, 2017 at 7:59 PM, David Mertz <mertz at gnosis.cx> wrote:
>
>> Here's a one-line version:
>>
>> from itertools import groupby
>> rle_encode = lambda it: (
>>     (l[0],len(l)) for g in groupby(it) for l in [list(g[1])])
>>
>> Since "not every one line function needs to be in the standard library"
>> is a guiding principle of Python, and even moreso of `itertools`, probably
>> this is a recipe in the documentation at most.  Or maybe it would have a
>> home in `more_itertools`.
>>
>>
>> On Sat, Jun 10, 2017 at 7:20 PM, Neal Fultz <nfultz at gmail.com> wrote:
>>
>>> Hello python-ideas,
>>>
>>> I am very new to this, but on a different  forum and after a couple
>>> conversations, I really wished Python came with run-length encoding
>>> built-in; after all, it ships with zip, which is much more complicated :)
>>>
>>> The general idea is to be able to go back and forth between two
>>> representations of a sequence:
>>>
>>> [1,1,1,1,2,3,4,4,3,3,3]
>>>
>>> and
>>>
>>> [(1, 4), (2, 1), (3, 1), (4, 2), (3, 3)]
>>>
>>> where the first element is the data element, and the second is how many
>>> times it is repeated.
>>>
>>> I wrote an encoder/decoder in about 20 lines (
>>> https://github.com/nfultz/rle.py/blob/master/rle.py ) and would like to
>>> offer it for the next version; I think it might fit in nicely in the
>>> itertools module, for example. I am curious about your thoughts.
>>>
>>> Best,
>>>
>>> -Neal
>>>
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>>
>>
>>
>> --
>> Keeping medicines from the bloodstreams of the sick; food
>> from the bellies of the hungry; books from the hands of the
>> uneducated; technology from the underdeveloped; and putting
>> advocates of freedom in prisons.  Intellectual property is
>> to the 21st century what the slave trade was to the 16th.
>>
>
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20170610/420f9373/attachment-0001.html>


More information about the Python-ideas mailing list