[Tutor] Chunking list/array data?
Sarah Hembree
sarah123ed at gmail.com
Sun Aug 25 23:03:54 EDT 2019
Thank you for your feedback. We'd not thought to use generators. Now we are
looking at how this might best be done with numpy/numba. Do you have any
thoughts in that arena?
--- We not only inherit the Earth from our Ancestors, we borrow it from our
Children. Aspire to grace.
On Thu, 22 Aug 2019 at 06:00, Cameron Simpson <cs at cskk.id.au> wrote:
> On 21Aug2019 21:26, Sarah Hembree <sarah123ed at gmail.com> wrote:
> >How do you chunk data? We came up with the below snippet. It works (with
> >integer list data) for our needs, but it seems so clunky.
> >
> >    def _chunks(lst: list, size: int) -> list:
> >        return [lst[x:x+size] for x in range(0, len(lst), size)]
> >
> >What do you do? Also, what about doing this lazily so as to keep memory
> >drag at a minimum?
>
> This looks pretty good to me. But as you say, it constructs the complete
> list of chunks and returns them all. For many chunks that is both slow
> and memory hungry.
>
> If you want to conserve memory and return chunks in a lazy manner you
> can rewrite this as a generator. A first cut might look like this:
>
>     from typing import Iterator
>
>     def _chunks(lst: list, size: int) -> Iterator[list]:
>         for x in range(0, len(lst), size):
>             yield lst[x:x+size]
>
> which makes _chunks() a generator function: it returns an iterator
> which yields each chunk one at a time - the body of the function is kept
> "running", but stalled. When you iterate over the return value of
> _chunks(), Python runs that stalled function until it yields a value,
> then stalls it again and hands you that value.
>
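A minimal sketch of the stall-and-resume behaviour described above, assuming
a small integer list. The names chunks_traced and trace are made up for
illustration; trace only records when the function body actually runs.

```python
from typing import Iterator, List

trace: List[str] = []  # hypothetical log, for illustration only


def chunks_traced(lst: List[int], size: int) -> Iterator[List[int]]:
    """Yield successive slices of lst, each at most `size` items long."""
    trace.append("started")
    for x in range(0, len(lst), size):
        trace.append(f"yielding from index {x}")
        yield lst[x:x + size]


it = chunks_traced(list(range(7)), 3)
ran_before_iterating = list(trace)   # snapshot: the body has not run yet
first = next(it)                     # runs the body up to the first yield
ran_after_first = list(trace)        # snapshot: one yield has happened
rest = list(it)                      # resumes until the generator is exhausted
```

Calling chunks_traced() on its own runs none of the body; each next() call
advances it only as far as the following yield.
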
> Modern Python has a thing called a "generator expression". Your original
> function uses a "list comprehension": it constructs a list of values and
> returns that list. In many cases, particularly for very long lists, that
> can be both slow and memory hungry. You can rewrite such a thing like
> this:
>
>     from typing import Iterator
>
>     def _chunks(lst: list, size: int) -> Iterator[list]:
>         return ( lst[x:x+size] for x in range(0, len(lst), size) )
>
> Replacing the square brackets with parentheses turns this into a
> generator expression. It returns an iterator instead of a list, which
> functions like the generator function I sketched, and generates the
> chunks lazily.
>
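A small sketch of what the generator-expression version hands back, again
assuming integer list data. The name chunks_lazy is made up for illustration.
It also shows one practical difference from the list version: the iterator
can only be consumed once.

```python
import types


def chunks_lazy(lst, size):
    """Return a generator expression producing slices of at most `size` items."""
    return (lst[x:x + size] for x in range(0, len(lst), size))


data = list(range(10))
gen = chunks_lazy(data, 4)

is_generator = isinstance(gen, types.GeneratorType)  # a generator, not a list
first_pass = list(gen)    # consumes every chunk
second_pass = list(gen)   # the generator is now exhausted
```

If the chunks are needed more than once, either call chunks_lazy() again or
keep the list-returning version.
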
> Cheers,
> Cameron Simpson <cs at cskk.id.au>
>