If the consensus is "Let's add ten lines to the recipes" I'm all aboard, ignore the rest: 

If I could have googled a good answer I would have stopped there. I won't argue the necessity or obviousness of itertools.groupby, just its name:
  * I myself am a false negative: I wanted the RLE behavior
      * and couldn't find it easily
          * so we should update the docs
  * other people have been false positives: they wanted a SQL-type group by, but got burned
      * hence the warnings in the docs.
          * If you explicitly say "by run", some extra group of them will then know what that means vs. the current wording.

I would definitely also support adding helper functions, though. I think this is a very common use case, which turns up in math/optimization applied to geology, biology, ... , and also fax machines: https://en.wikipedia.org/wiki/Run-length_encoding

Also, if someone rewrote zip in pure Python, would many people actually notice a slowdown vs. network latency, disk IO, etc.? RLE is a building block just like bisect.

:) Anyway, I'm not claiming my implementation is some huge gift, but let's at least add a recipe or documentation so people can find y'all's way later without reinventing the wheel.  
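
A recipe along those lines could be as short as the sketch below. The names `run_length_encode` and `run_length_decode` are hypothetical, chosen for illustration; they are not part of itertools, and this is not necessarily the implementation referred to above.

```python
from itertools import groupby


def run_length_encode(iterable):
    """Yield (value, run_length) for each run of equal consecutive items."""
    for value, run in groupby(iterable):
        # `run` is a lazy iterator over one run; count it without building a list.
        yield value, sum(1 for _ in run)


def run_length_decode(pairs):
    """Inverse of run_length_encode: expand (value, count) pairs."""
    for value, count in pairs:
        for _ in range(count):
            yield value


encoded = list(run_length_encode("AAABBCAA"))
print(encoded)  # [('A', 3), ('B', 2), ('C', 1), ('A', 2)]
print("".join(run_length_decode(encoded)))  # AAABBCAA
```

Both helpers are generators, so they stay lazy in the same way groupby itself does.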

On Sat, Jun 10, 2017 at 10:19 PM, David Mertz <mertz@gnosis.cx> wrote:
If you understand what iterators do, the fact that itertools.groupby collects contiguous elements is both obvious and necessary.  Iterators might be infinitely long... you cannot ask for every "A" that might eventually occur in an infinite sequence of letters.
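
The point about infinite iterators can be illustrated with a small sketch. The input stream here (`cycle("AAABB")`) is made up for the example; the only assumption is that we want the first few runs of an endless sequence.

```python
from itertools import cycle, groupby, islice

# An endless stream: A A A B B A A A B B ...
letters = cycle("AAABB")

# groupby is lazy, so taking the first four runs terminates even though
# the input never ends. Each run is consumed before the next one is requested.
first_runs = [(key, len(list(run))) for key, run in islice(groupby(letters), 4)]
print(first_runs)  # [('A', 3), ('B', 2), ('A', 3), ('B', 2)]
```

Collecting every "A" in the stream into a single group, by contrast, would never finish.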

On Sat, Jun 10, 2017 at 10:08 PM, Neal Fultz <nfultz@gmail.com> wrote:
Agreed to a degree about providing it as code, but it may also be worth mentioning that zlib itself implements RLE [1], and if there was ever a desire to go "python all the way down" you need an RLE somewhere anyway :)

That said, I'll be pretty happy with anything that replaces an hour of google/coding/testing/(hour later find out I'm an idiot from a random listserv) with 1 minute of googling.  Again, my issue isn't that it was difficult to code, but it *was* hard to make the research-y jump from googling for "run length encoding python", where I knew *exactly* what algorithm I wanted, to  "itertools.groupby" which appears to be more general purpose and needs a little tweaking.  Adjusting the docs/recipes would probably solve that problem.

 -- To me this is roughly on the same level as googling for 'binary search python' and not having bisect show up.

However, the fact that  `itertools.groupby` doesn't group over elements that are not contiguous is a bit surprising to me coming from SQL/pandas/R land (that is probably a large part of my disconnect here). This is actually explicitly called out in the current docs, but I wonder how many people search for one thing and find the other:

 I googled for RLE and the solution was actually groupby, but probably a lot of other people who wanted a SQL group-by accidentally got an RLE and have to work around that... Then again, I don't know if you all can easily change names of functions at this point.
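
To make the difference concrete, here is a minimal sketch contrasting grouping by run with SQL-style grouping. The sample data and variable names are invented for illustration.

```python
from collections import defaultdict
from itertools import groupby

data = ["a", "a", "b", "a", "b", "b"]

# Grouping by run: equal keys that are not adjacent stay in separate groups.
by_run = [(key, list(group)) for key, group in groupby(data)]
print(by_run)  # [('a', ['a', 'a']), ('b', ['b']), ('a', ['a']), ('b', ['b', 'b'])]

# SQL-style grouping, option 1: sort first so that equal keys become adjacent.
by_sorted = [(key, list(group)) for key, group in groupby(sorted(data))]
print(by_sorted)  # [('a', ['a', 'a', 'a']), ('b', ['b', 'b', 'b'])]

# SQL-style grouping, option 2: collect into a dict. No sort is needed,
# but the input must be finite and the keys hashable.
by_key = defaultdict(list)
for item in data:
    by_key[item].append(item)
print(dict(by_key))  # {'a': ['a', 'a', 'a'], 'b': ['b', 'b', 'b']}
```

Someone expecting the SQL behavior and calling groupby on unsorted data gets the first result, which is the surprise described above.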


On Sat, Jun 10, 2017 at 9:39 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
In my experience, RLE isn't something you often find on its own.
Usually it's used as part of some compression scheme that also
has ways of encoding verbatim runs of data and maybe other

So I'm skeptical that it can be usefully provided as a library
function. It seems more like a design pattern than something
you can capture in a library.


Python-ideas mailing list
Code of Conduct: http://python.org/psf/codeofconduct/


Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.