An itertools.interleave / itertools.join function

Hi, I like using itertools for building long strings without paying the cost of intermediate strings (by eventually calling str.join on the whole iterator). However, one missing feature is an iterator that mimics the behavior of str.join: an iterator that returns the items of an iterable, separated by the separator. I suggest the name "interleave" or "join" (whichever is the clearest / least ambiguous).

    def interleave(sep, iterable):
        """Make an iterator that returns elements from an iterable,
        separated by the separator.
        """
        notfirst = False
        for i in iterable:
            if notfirst:
                yield sep
            else:
                notfirst = True
            yield i

One could imagine a more elaborate implementation that takes several iterables, and would be equivalent to

    chain_zip_interleave = lambda sep, *iterables: itertools.chain.from_iterable(
        interleave((sep,), zip(*iterables)))

But that may be seriously overkill, and I have a hard time describing it.
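A quick usage sketch (not an existing itertools API; it assumes the interleave and chain_zip_interleave definitions above, plus "import itertools"):

    import itertools

    # interleave: separator between items, with no leading or trailing separator
    print(''.join(interleave(', ', ('spam', 'eggs', 'ham'))))
    # -> spam, eggs, ham

    # chain_zip_interleave: zip several iterables, put the separator between
    # the zipped "rows", then flatten everything into one stream
    print(''.join(chain_zip_interleave(', ', 'abc', '123')))
    # -> a1, b2, c3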

On Wed, Dec 9, 2020 at 11:08 AM Evpok Padding <evpok.padding@gmail.com> wrote:
Worth noting that implementation puts a terminal sep on the end of the result, which isn't what OP seems to want.

--- Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

I understand the concept, but no real use case comes to my mind, and certainly I personally have not needed this. Itertools is kept extremely minimal by design, containing only a small set of orthogonal primitives. The recipes in its docs contain others, as does more-itertools. This feels like something that belongs, perhaps, in more-itertools.

On Wed, Dec 9, 2020 at 4:17 AM <aurelien.lambert.89@gmail.com> wrote:
-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

I agree that itertools shouldn't be a collection of vaguely useful functions. I proposed this one because I have needed it many times, and have often used str.join instead, which comes at a greater cost than iterating over pieces of string. I didn't know about the more-itertools library (which already has an interleave function that does something different); I guess it makes more sense there.
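If I remember the more-itertools API correctly (worth double-checking its docs), its interleave round-robins several iterables, while its intersperse looks like the closest match for what I am proposing:

    # Sketch from memory of the more-itertools API -- treat the names as assumptions.
    from more_itertools import interleave, intersperse

    # more_itertools.interleave: round-robin across iterables (a different thing)
    print(list(interleave('abc', '123')))
    # -> ['a', '1', 'b', '2', 'c', '3']

    # more_itertools.intersperse: a separator between items (what I proposed)
    print(''.join(intersperse(', ', ['spam', 'eggs', 'ham'])))
    # -> spam, eggs, ham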

On Wed, Dec 9, 2020 at 7:54 PM <aurelien.lambert.89@gmail.com> wrote:
I agree that itertools shouldn't be a collection of vaguely useful functions. I proposed this one because I have needed it many times,
Could you give an actual use case? I haven't been able to imagine one. -CHB
--
Christopher Barker, PhD

Python Language Consulting
 - Teaching
 - Scientific Software Development
 - Desktop GUI and Web Development
 - wxPython, numpy, scipy, Cython

Here is an example of how I use it to build an arbitrarily long SQL request without having to pay for long intermediate strings, both in computation and in memory.

    from itertools import chain  # , join


    def join(sep, iterable):
        notfirst = False
        for i in iterable:
            if notfirst:
                yield sep
            else:
                notfirst = True
            yield i


    table = 'mytable'
    columns = ('id', 'v1', 'v2')
    values = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

    request = ''.join(chain(
        ('INSERT INTO ', table, '('),
        join(', ', columns),
        (') VALUES (',),
        chain.from_iterable(join(('), (',),
                                 (join(', ', ('%s' for v in value)) for value in values))),
        (') ON DUPLICATE KEY UPDATE ',),
        chain.from_iterable(join((', '),
                                 ((c, '=VALUES(', c, ')') for c in columns))),
    ))
    args = list(chain.from_iterable(values))
    print(request)

which prints:

    INSERT INTO mytable(id, v1, v2) VALUES (%s, %s, %s), (%s, %s, %s), (%s, %s, %s) ON DUPLICATE KEY UPDATE id=VALUES(id), v1=VALUES(v1), v2=VALUES(v2)

I have often had such cases, but ended up using the more costly str.join.
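The request/args pair then goes to the database driver in the usual parameterized way; roughly like this (a sketch only -- "conn" is an assumed, already-open DB-API connection for a driver that uses '%s' placeholders, e.g. MySQL-style; it is not part of the code above):

    # Sketch: 'conn' is an assumed, already-open DB-API connection whose driver
    # uses '%s' placeholders (MySQL-style). Not from the original example.
    cur = conn.cursor()
    cur.execute(request, args)
    conn.commit()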

On Fri, Dec 11, 2020 at 2:46 PM <aurelien.lambert.89@gmail.com> wrote:
Is it really more costly? With strings the size of SQL queries (keeping in mind that these strings (correctly) contain no actual data, just placeholders), I doubt you'll see any significant performance hit from this. Also, I would be VERY surprised if the cost of in-memory string manipulation exceeds the cost of an actual database transaction.

But more importantly: Is it any more readable? What you have there is pretty opaque. Is the str.join version worse than that? If not, I'd just stick with str.join.

ChrisA

On Thu, Dec 10, 2020 at 8:03 PM Chris Angelico <rosuav@gmail.com> wrote:
<snip>
I had the same thought -- expecting that it would take some pretty darn big intermediate strings to be any faster at all. I was not quite right. If you replace the "join" iterator with plain str.join(), it does run a touch slower:

    def join_iterable(table, columns, values):
        request = ''.join(chain(
            ('INSERT INTO ', table, '('),
            join(', ', columns),
            (') VALUES (',),
            chain.from_iterable(join(('), (',),
                                     (join(', ', ('%s' for v in value)) for value in values))),
            (') ON DUPLICATE KEY UPDATE ',),
            chain.from_iterable(join((', '),
                                     ((c, '=VALUES(', c, ')') for c in columns))),
        ))
        return request


    def join_str(table, columns, values):
        request = ''.join(chain(
            ('INSERT INTO ', table, '('),
            ', '.join(columns),
            (') VALUES (',),
            chain.from_iterable('), ('.join(', '.join('%s' for v in value) for value in values)),
            (') ON DUPLICATE KEY UPDATE ',),
            chain.from_iterable(', '.join(f"{c}=VALUES({c})" for c in columns)),
        ))
        return request

    In [31]: %timeit join_iterable(table, columns, values)
    8.65 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    In [32]: %timeit join_str(table, columns, values)
    13 µs ± 38.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

But if you get rid of the whole pile of nested iterators and chain, and make it a simple str.join, with each bit using plain str.join(), it's faster still:

    def join_str_simple(table, columns, values):
        request = "".join([
            "INSERT INTO ",
            f"{table}({', '.join(columns)})",
            " VALUES (",
            "), (".join(', '.join('%s' for v in value) for value in values),
            ") ON DUPLICATE KEY UPDATE ",
            ", ".join(f"{c}=VALUES({c})" for c in columns),
        ])
        return request

    In [33]: %timeit join_str_simple(table, columns, values)
    5.09 µs ± 26.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

And I would argue more readable. (though the double comprehensions are not great ...) Granted, you may lose that advantage if you had a lot more values -- but I expect it wouldn't be a real issue until you had hundreds or more.
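For anyone who wants to probe where the iterator version starts to win for real, a rough sketch of scaling the test up (the sizes here are arbitrary, I haven't benchmarked this variant carefully, and it relies on the definitions above):

    import timeit

    # Rough scaling sketch -- relies on table, columns, join_iterable and
    # join_str_simple defined above; the value counts are arbitrary guesses.
    for n in (3, 100, 1000, 10000):
        big_values = [(i, i + 1, i + 2) for i in range(n)]
        t_iter = timeit.timeit(lambda: join_iterable(table, columns, big_values), number=200)
        t_simple = timeit.timeit(lambda: join_str_simple(table, columns, big_values), number=200)
        print(f"n={n:>6}: iterator {t_iter:.4f}s  simple str.join {t_simple:.4f}s")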
well, yeah -- these are all very fast.
But more importantly: Is it any more readable? What you have there is pretty opaque. Is the str.join version worse than that?
you be the judge :-) Code enclosed.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
 - Teaching
 - Scientific Software Development
 - Desktop GUI and Web Development
 - wxPython, numpy, scipy, Cython

participants (6)
 - aurelien.lambert.89@gmail.com
 - Chris Angelico
 - Christopher Barker
 - David Mertz
 - Evpok Padding
 - Ricky Teachey