On 11 June 2017 at 13:27, Joshua Morton <joshua.morton13@gmail.com> wrote:
> David: You're absolutely right, s/2/3 in my prior post!
>
> Neal: As for why zip is included and rle is not (at first I thought you meant the zip function, not the zip compression scheme), zip is (or was), I believe, used as part of Python's packaging infrastructure; hopefully someone else can correct me if that's untrue.
There are a variety of reasons why things end up in the standard library:

- they're part of how the language works (e.g. contextlib, types, operator, numbers, pickle, abc, copy, sys, importlib, zipimport)
- they're modeling core computing concepts (e.g. math, cmath, decimal, fractions, re, encodings)
- they're widely useful, and harder to get right than they may first appear (e.g. collections, itertools, secrets, statistics, struct, shelve)
- they're widely useful (or at least once were), and just plain hard to get right (e.g. ssl, http, json, xml, xmlrpc, zip, bz2)
- they provide cross-platform abstractions of operating system level interfaces (e.g. os, io, shutil, pathlib)
- they're helpful in enabling ecosystem level interoperability between tools (e.g. logging, unittest, asyncio)
- they're part of the way *nix systems work (e.g. grp, pwd)
- they're useful in working on other parts of the standard library (e.g. unittest.mock, enum)

"zip" and the other compression libraries check a couple of those boxes: they're broadly useful *and* they're needed in other parts of the standard library (e.g. lots of network protocols include compression support, we support importing from zip archives, and we support generating them through distutils, shutil, and zipapp).

Run-length encoding, on the other hand, is one of those things where the algorithm is pretty simple, and you're almost always going to get the best results by creating an implementation tailored specifically to your use case, rather than working through a general purpose abstraction like the iterator protocol. Even when that isn't the case, implementing your own utility function is still going to be competitive time-wise with finding and using a standard implementation.

I suspect the main long term value of offering a recommended implementation as an itertools recipe wouldn't be in using it directly, but rather in making it easier for people to:

- find the recipe if they already know the phrase "run length encoding"
- test their own implementations that are more tailored to their specific use case

The one itertools building block I do sometimes wish we had is a `counted_len` helper that:

- used `len(obj)` if the given object implemented `__len__`
- fell back to `sum(1 for __ in obj)` otherwise

Otherwise you have to choose between:

- use `len(obj)`, and only support sequences
- use `len(list(obj))` and potentially make a pointless copy
- use `sum(1 for __ in obj)` and ignore the possible O(1) fast path
- write your own `counted_len` helper:

    def counted_len(iterable):
        try:
            return len(iterable)
        except TypeError:
            pass
        return sum(1 for __ in iter(iterable))

If there was an itertools.counted_len function, then the obvious option would be "use itertools.counted_len". Such a function would also become the obvious way to consume an iterator when you don't care about the individual results - you'd just process them all and get the count of how many iterations happened.

Given such a helper, the recipe for run-length encoding would then be:

    def run_length_encode(iterable):
        return ((k, counted_len(g)) for k, g in groupby(iterable))

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
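As a runnable sketch of the pieces above (the `rle` name, the import, and the test string are illustrative additions, not from the post), here is a hand-rolled encoder of the kind the "tailored implementation" paragraph describes, alongside the groupby-based recipe, showing both yield the same (value, count) pairs:

    from itertools import groupby

    def counted_len(iterable):
        # Prefer the O(1) __len__ fast path when the object supports it
        try:
            return len(iterable)
        except TypeError:
            pass
        # Otherwise exhaust the iterator, counting as we go
        return sum(1 for __ in iter(iterable))

    def run_length_encode(iterable):
        # The groupby-based recipe from the post: each group is consumed
        # (and counted) before groupby advances to the next run
        return ((k, counted_len(g)) for k, g in groupby(iterable))

    def rle(iterable):
        # A hypothetical hand-rolled encoder tailored to the simple
        # "count adjacent duplicates" case, with no itertools dependency
        it = iter(iterable)
        try:
            current = next(it)
        except StopIteration:
            return  # empty input: yield nothing
        count = 1
        for item in it:
            if item == current:
                count += 1
            else:
                yield current, count
                current, count = item, 1
        yield current, count

    print(list(run_length_encode("aaabbc")))  # [('a', 3), ('b', 2), ('c', 1)]
    print(list(rle("aaabbc")))                # [('a', 3), ('b', 2), ('c', 1)]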