An itertools.interleave / itertools.join function
Hi

I like using itertools for creating long strings without paying the cost of intermediate strings (by eventually calling str.join on the whole iterator). However, one missing feature is an iterator that mimics the behavior of str.join: an iterator that returns the items of an iterable, separated by the separator. I suggest the name "interleave" or "join" (whichever is the clearest / least ambiguous).

def interleave(sep, iterable):
    """Make an iterator that returns elements from an iterable, separated by the separator."""
    notfirst = False
    for i in iterable:
        if notfirst:
            yield sep
        else:
            notfirst = True
        yield i

I could imagine a more elaborate implementation that can take several iterables, and would be equivalent to

chain_zip_interleave = lambda sep, *iterables: itertools.chain.from_iterable(interleave((sep,), zip(*iterables)))

But that may be serious overkill, and I have a hard time describing it.
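For illustration, here is a small, self-contained sketch of how the proposed interleave and the more elaborate multi-iterable variant (called chain_zip_interleave here purely for illustration) would behave:

import itertools

def interleave(sep, iterable):
    # Yield the items of `iterable` with `sep` between consecutive items
    # (no leading or trailing separator).
    notfirst = False
    for i in iterable:
        if notfirst:
            yield sep
        else:
            notfirst = True
        yield i

def chain_zip_interleave(sep, *iterables):
    # Walk several iterables in lock-step, yielding one "row" of items at a
    # time with `sep` between consecutive rows, then flatten everything.
    return itertools.chain.from_iterable(interleave((sep,), zip(*iterables)))

print(''.join(interleave(', ', ['a', 'b', 'c'])))        # a, b, c
print(''.join(chain_zip_interleave('-', 'abc', 'xyz')))  # ax-by-cz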
Hey,

You can always do `itertools.chain.from_iterable(zip(iterable, itertools.repeat(sep)))` but I agree that it is verbose.

Cheers,
E

On Wed, 9 Dec 2020 at 04:16, <aurelien.lambert.89@gmail.com> wrote:
Hi
I like using itertools for creating long strings without paying the cost of intermediate strings (by eventually calling str.join on the whole iterator). However, one missing feature is an iterator that mimics the behavior of str.join: an iterator that returns the items of an iterable, separated by the separator. I suggest the name "interleave" or "join" (whichever is the clearest / least ambiguous).
def interleave(sep, iterable):
    """Make an iterator that returns elements from an iterable, separated by the separator."""
    notfirst = False
    for i in iterable:
        if notfirst:
            yield sep
        else:
            notfirst = True
        yield i
I could imagine a more elaborate implementation that can take several iterables, and would be equivalent to

chain_zip_interleave = lambda sep, *iterables: itertools.chain.from_iterable(interleave((sep,), zip(*iterables)))

But that may be serious overkill, and I have a hard time describing it.
On Wed, Dec 9, 2020 at 11:08 AM Evpok Padding <evpok.padding@gmail.com> wrote:
Hey,
You can always do `itertools.chain.from_iterable(zip(iterable, itertools.repeat(sep)))` but I agree that it is verbose.
Cheers,
E
Worth noting that implementation puts a terminal sep on the end of the result, which isn't what the OP seems to want.

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
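As a minimal sketch (not an existing itertools function), the same zip/repeat idea can avoid that trailing separator by putting the separator first and then skipping the leading one:

from itertools import chain, islice, repeat

def interleave(sep, iterable):
    # Pair every element with a leading separator, flatten, then drop the
    # separator that ends up before the first element.
    return islice(chain.from_iterable(zip(repeat(sep), iterable)), 1, None)

print(''.join(interleave(', ', ['a', 'b', 'c'])))  # a, b, c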
I understand the concept, but no real use case comes to my mind, and I personally have certainly not needed this.

Itertools is kept extremely minimal by design, containing only a small set of orthogonal primitives. The recipes in its docs contain others, as does more-itertools. This feels like something that belongs, perhaps, in more-itertools.

On Wed, Dec 9, 2020 at 4:17 AM <aurelien.lambert.89@gmail.com> wrote:
Hi
I like using itertools for creating long strings without paying the cost of intermediate strings (by eventually calling str.join on the whole iterator). However, one missing feature is an iterator that mimics the behavior of str.join: an iterator that returns the items of an iterable, separated by the separator. I suggest the name "interleave" or "join" (whichever is the clearest / least ambiguous).
def interleave(sep, iterable):
    """Make an iterator that returns elements from an iterable, separated by the separator."""
    notfirst = False
    for i in iterable:
        if notfirst:
            yield sep
        else:
            notfirst = True
        yield i
I could imagine a more elaborate implementation that can take several iterables, and would be equivalent to

chain_zip_interleave = lambda sep, *iterables: itertools.chain.from_iterable(interleave((sep,), zip(*iterables)))

But that may be serious overkill, and I have a hard time describing it.
--
The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
I agree that itertools shouldn't be a collection of vaguely useful functions. I proposed this one because I have needed it many times; I have often used str.join instead, which comes at a greater cost than iterating over pieces of string. I didn't know about the more-itertools library (which already has an interleave function that does something different); I guess this makes more sense there.
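For reference, a quick sketch with more-itertools (assuming the third-party package is installed); if I remember correctly, its intersperse() does exactly what is proposed here, placing the separator between elements with no trailing separator:

from more_itertools import intersperse

print(''.join(intersperse(', ', ['a', 'b', 'c'])))  # a, b, c
print(list(intersperse(0, [1, 2, 3])))              # [1, 0, 2, 0, 3]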
On Wed, Dec 9, 2020 at 7:54 PM <aurelien.lambert.89@gmail.com> wrote:
I agree that itertools shouldn't be a collection of vaguely useful functions. I proposed this one because I have needed it many times;
Could you give an actual use case? I haven't been able to imagine one. -CHB
I have often used str.join instead, which comes at a greater cost than iterating over pieces of string.
I didn't know about the more-itertools library (which already has an interleave function that does something different); I guess this makes more sense there.
--
Christopher Barker, PhD

Python Language Consulting
 - Teaching
 - Scientific Software Development
 - Desktop GUI and Web Development
 - wxPython, numpy, scipy, Cython
Here is an example of how I use it to build an arbitrarily long SQL request without having to pay for long intermediate strings, both in computation and memory.

from itertools import chain  #, join

def join(sep, iterable):
    notfirst = False
    for i in iterable:
        if notfirst:
            yield sep
        else:
            notfirst = True
        yield i

table = 'mytable'
columns = ('id', 'v1', 'v2')
values = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

request = ''.join(chain(
    ('INSERT INTO ', table, '('),
    join(', ', columns),
    (') VALUES (',),
    chain.from_iterable(join(('), (',), (join(', ', ('%s' for v in value)) for value in values))),
    (') ON DUPLICATE KEY UPDATE ',),
    chain.from_iterable(join((', '), ((c, '=VALUES(', c, ')') for c in columns))),
))
args = list(chain.from_iterable(values))

print(request)
> INSERT INTO mytable(id, v1, v2) VALUES (%s, %s, %s), (%s, %s, %s), (%s, %s, %s) ON DUPLICATE KEY UPDATE id=VALUES(id), v1=VALUES(v1), v2=VALUES(v2)

I often had such cases, but ended up using the more costly str.join.
On Fri, Dec 11, 2020 at 2:46 PM <aurelien.lambert.89@gmail.com> wrote:
Here is an example of how I use it to build an arbitrarily long SQL request without having to pay for long intermediate strings, both in computation and memory.

from itertools import chain  #, join

def join(sep, iterable):
    notfirst = False
    for i in iterable:
        if notfirst:
            yield sep
        else:
            notfirst = True
        yield i

table = 'mytable'
columns = ('id', 'v1', 'v2')
values = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

request = ''.join(chain(
    ('INSERT INTO ', table, '('),
    join(', ', columns),
    (') VALUES (',),
    chain.from_iterable(join(('), (',), (join(', ', ('%s' for v in value)) for value in values))),
    (') ON DUPLICATE KEY UPDATE ',),
    chain.from_iterable(join((', '), ((c, '=VALUES(', c, ')') for c in columns))),
))
args = list(chain.from_iterable(values))

print(request)
> INSERT INTO mytable(id, v1, v2) VALUES (%s, %s, %s), (%s, %s, %s), (%s, %s, %s) ON DUPLICATE KEY UPDATE id=VALUES(id), v1=VALUES(v1), v2=VALUES(v2)

I often had such cases, but ended up using the more costly str.join.
Is it really more costly? With strings the size of SQL queries (keeping in mind that these strings (correctly) contain no actual data, just placeholders), I doubt you'll see any significant performance hit from this.

Also, I would be VERY surprised if the cost of in-memory string manipulation exceeds the cost of an actual database transaction.

But more importantly: Is it any more readable? What you have there is pretty opaque. Is the str.join version worse than that? If not, I'd just stick with str.join.

ChrisA
On Thu, Dec 10, 2020 at 8:03 PM Chris Angelico <rosuav@gmail.com> wrote:
Here is an example of how I use it to build an arbitrarily long SQL request without having to pay for long intermediate strings, both in computation and memory.
<snip>
I often had such cases, but ended up using the more costly str.join.
Is it really more costly? With strings the size of SQL queries (keeping in mind that these strings (correctly) contain no actual data, just placeholders), I doubt you'll see any significant performance hit from this.
I had the same thought -- expecting that it would take some pretty darn big intermediate strings to be any faster at all. I was not quite right. If you replace the "join" iterator with plain str.join(), it does run a touch slower:

def join_iterable(table, columns, values):
    request = ''.join(chain(
        ('INSERT INTO ', table, '('),
        join(', ', columns),
        (') VALUES (',),
        chain.from_iterable(join(('), (',), (join(', ', ('%s' for v in value)) for value in values))),
        (') ON DUPLICATE KEY UPDATE ',),
        chain.from_iterable(join((', '), ((c, '=VALUES(', c, ')') for c in columns))),
    ))
    return request

def join_str(table, columns, values):
    request = ''.join(chain(
        ('INSERT INTO ', table, '('),
        ', '.join(columns),
        (') VALUES (',),
        chain.from_iterable('), ('.join(', '.join('%s' for v in value) for value in values)),
        (') ON DUPLICATE KEY UPDATE ',),
        chain.from_iterable(', '.join(f"{c}=VALUES({c})" for c in columns)),
    ))
    return request

In [31]: %timeit join_iterable(table, columns, values)
8.65 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [32]: %timeit join_str(table, columns, values)
13 µs ± 38.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

But if you get rid of the whole pile of nested iterators and chain, and make it a simple str.join, with each bit using plain str.join(), it's faster still:

def join_str_simple(table, columns, values):
    request = "".join(["INSERT INTO ",
                       f"{table}({', '.join(columns)})",
                       " VALUES (",
                       "), (".join(', '.join('%s' for v in value) for value in values),
                       ") ON DUPLICATE KEY UPDATE ",
                       ", ".join(f"{c}=VALUES({c})" for c in columns),
                       ])
    return request

In [33]: %timeit join_str_simple(table, columns, values)
5.09 µs ± 26.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

And I would argue more readable (though the double comprehensions are not great ...). Granted, you may lose that advantage if you had a lot more values -- but I expect it wouldn't be a real issue until you had hundreds or more.
Also, I would be VERY surprised if the cost of in-memory string manipulation exceeds the cost of an actual database transaction.
well, yeah -- these are all very fast.
But more importantly: Is it any more readable? What you have there is pretty opaque. Is the str.join version worse than that?
you be the judge :-) Code enclosed.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
 - Teaching
 - Scientific Software Development
 - Desktop GUI and Web Development
 - wxPython, numpy, scipy, Cython
participants (6):
- aurelien.lambert.89@gmail.com
- Chris Angelico
- Christopher Barker
- David Mertz
- Evpok Padding
- Ricky Teachey