join() could add separators at start or end
Having had my last proposal shot down in flames, up I bob with another. 😁

It seems to me that it would be useful to be able to make the str.join() function put separators, not only between the items of its operand, but also optionally at the beginning or end. E.g.
    '|'.join(('Spam', 'Ham', 'Eggs'))
returns
    'Spam|Ham|Eggs'
but it might be useful to make it return one of
    '|Spam|Ham|Eggs'
    'Spam|Ham|Eggs|'
    '|Spam|Ham|Eggs|'

Again, I suggest that this apply to byte strings as well as strings.

Going through the 3.8.3 stdlib I have found
    24 examples where the separator needs to be added at the beginning
    52 where the separator needs to be added at the end
    4 where the separator needs to be added at both ends
I list these examples below. Apologies if there are any mistakes.

While guessing is no substitute for measurement, it seems plausible that using this feature where appropriate would increase runtime performance by avoiding 1 (or 2) calls of str.__add__. This is perhaps more relevant when the separator is not a short constant string, as in this example:
    Lib\email\_header_value_parser.py:2854: return policy.linesep.join(lines) + policy.linesep

Note also this example:
    Lib\site-packages\setuptools\command\build_ext.py:221: pkg = '.'.join(ext._full_name.split('.')[:-1] + [''])
where the author has used the unintuitive device of appending an empty string to a list to force join() to add an extra final dot, thereby avoiding 1 call of str.__add__ at the cost of 1 call of list.append.

What I cannot decide is what the best API would be. str.join() currently takes only 1 parameter, so it would be possible to add an extra parameter or two.
One scheme would be to have an atEnds parameter which could take values such as
    0=default behaviour
    1=add sep at start
    2=add sep at end
    3=add sep at both ends
or
    's'=add sep at start
    'e'=add sep at end
    'b'=add sep at both ends
    (some) other=default behaviour

Another would be to have 2 parameters, atStart and atEnd, which would both default to False or 0. E.g.
    '|'.join(('Spam', 'Ham', 'Eggs'), 1) == '|Spam|Ham|Eggs'
    '|'.join(('Spam', 'Ham', 'Eggs'), 0, 1) == 'Spam|Ham|Eggs|'

Neither scheme results in particularly transparent usage, though no worse than
    s.splitlines(True) # What on earth is this parameter???

Corner case: What if join() is passed an empty sequence? This is debatable, but I think it should return the separator if requested to add it at the beginning or end, and double it up if both are requested. This would preserve identities such as
    sep.join(seq, <PleaseAddSeparatorAtBeginning>) == sep + sep.join(seq)

Best wishes
Rob Cliffe

EXAMPLES WHERE SEPARATOR ADDED AT START:
Lib\http\server.py:933: splitpath = ('/' + '/'.join(head_parts), tail_part)
Lib\site-packages\numpy\ctypeslib.py:333: name += "_"+"_".join(flags)
Lib\site-packages\numpy\testing\_private\utils.py:842: err_msg += '\n' + '\n'.join(remarks)
Lib\site-packages\pip\_vendor\pyparsing\core.py:2092-2095: out = [ "\n" + "\n".join(comments) if comments else "", pyparsing_test.with_line_numbers(t) if with_line_numbers else t, ]
Lib\site-packages\pip\_vendor\requests\status_codes.py:121-125: __doc__ = ( __doc__ + "\n" + "\n".join(doc(code) for code in sorted(_codes)) if __doc__ is not None else None )
Lib\site-packages\reportlab\lib\utils.py:1093: self._writeln(' '+' '.join(A.__self__))
Lib\site-packages\reportlab\platypus\flowables.py:708: L = "\n"+"\n".join(L)
Lib\site-packages\twisted\mail\smtp.py:1647: r.append(c + b' ' + b' '.join(v))
Lib\site-packages\twisted\protocols\ftp.py:1203: return (PWD_REPLY, '/' + '/'.join(self.workingDirectory))
Lib\site-packages\twisted\runner\procmon.py:424-426: return ('<' + self.__class__.__name__ + ' ' + ' '.join(l) + '>')
Lib\site-packages\twisted\web\rewrite.py:34: request.path = '/'+'/'.join(request.prepath+request.postpath)
Lib\site-packages\twisted\web\rewrite.py:51: request.path = '/'+'/'.join(request.prepath+request.postpath)
Lib\site-packages\twisted\web\twcgi.py:78: scriptName = b"/" + b"/".join(request.prepath)
Lib\site-packages\twisted\web\twcgi.py:95: env["PATH_INFO"] = "/" + "/".join(pp)
Lib\site-packages\twisted\web\vhost.py:115: request.path = b'/' + b'/'.join(request.postpath)
Lib\site-packages\twisted\web\wsgi.py:283: scriptName = b'/' + b'/'.join(request.prepath)
Lib\site-packages\twisted\web\wsgi.py:288: pathInfo = b'/' + b'/'.join(request.postpath)
Lib\site-packages\twisted\web\test\test_wsgi.py:272: uri = '/' + '/'.join([urlquote(seg, safe) for seg in requestSegments])
Lib\site-packages\wx\py\magic.py:55: command = 'sx("'+aliasDict[c[0]]+' '+' '.join(c[1:])+'")'
Lib\site-packages\zope\interface\exceptions.py:257-260: return '\n ' + '\n '.join( x._str_details.strip() if isinstance(x, _TargetInvalid) else str(x) for x in self.exceptions )
Lib\smtplib.py:537 and 545: optionlist = ' ' + ' '.join(options)
Lib\unittest\case.py:1094-1096: diffMsg = '\n' + '\n'.join( difflib.ndiff(pprint.pformat(seq1).splitlines(), pprint.pformat(seq2).splitlines()))
Lib\unittest\case.py:1207-1209: diff = ('\n' + '\n'.join(difflib.ndiff( pprint.pformat(d1).splitlines(), pprint.pformat(d2).splitlines())))

SEPARATOR ADDED AT END:
Lib\distutils\command\config.py:303: body = "\n".join(body) + "\n"
Lib\email\contentmanager.py:145: def embedded_body(lines): return linesep.join(lines) + linesep
Lib\email\contentmanager.py:146: def normal_body(lines): return b'\n'.join(lines) + b'\n'
Lib\email\policy.py:215: return name + ': ' + self.linesep.join(lines) + self.linesep
Lib\email\_header_value_parser.py:2854: return policy.linesep.join(lines) + policy.linesep
Lib\site-packages\numpy\distutils\command\config.py:346: body = '\n'.join(body) + "\n"
Lib\site-packages\numpy\distutils\command\config.py:407: body = '\n'.join(body) + "\n"
Lib\site-packages\oauthlib\oauth2\rfc6749\tokens.py:158: base_string = '\n'.join(base) + '\n'
Lib\site-packages\PIL\ImageCms.py:770: return "\r\n\r\n".join(arr) + "\r\n\r\n"
Lib\site-packages\pip\_internal\operations\freeze.py:254: return "\n".join(list(self.comments) + [str(req)]) + "\n"
Lib\site-packages\pip\_internal\operations\install\legacy.py:54: f.write("\n".join(new_lines) + "\n")
Lib\site-packages\pip\_vendor\pyparsing\testing.py:323-331: return ( header1 + header2 + "\n".join( "{:{}d}:{}{}".format(i, lineno_width, line, eol_mark) for i, line in enumerate(s_lines, start=start_line) ) + "\n" )
Lib\site-packages\pycparser\c_generator.py:117: if n.storage: s += ' '.join(n.storage) + ' '
Lib\site-packages\pycparser\c_generator.py:366: if n.funcspec: s = ' '.join(n.funcspec) + ' '
Lib\site-packages\pycparser\c_generator.py:367: if n.storage: s += ' '.join(n.storage) + ' '
Lib\site-packages\pycparser\c_generator.py:382: if n.quals: s += ' '.join(n.quals) + ' '
Lib\site-packages\pycparser\c_generator.py:397: nstr += ' '.join(modifier.dim_quals) + ' '
Lib\site-packages\pycparser\c_generator.py:417: return ' '.join(n.names) + ' '
Lib\site-packages\pythonwin\pywin\framework\scriptutils.py:109: return ".".join(modBits) + "." + fname, newPathReturn
Lib\site-packages\reportlab\pdfbase\pdfdoc.py:1118: code = '\n'.join(code)+'\n'
Lib\site-packages\reportlab\pdfbase\pdfutils.py:102: f.write('\r\n'.join(code)+'\r\n')
Lib\site-packages\reportlab\pdfbase\_can_cmap_data.py:54: src = '\n'.join(buf) + '\n'
Lib\site-packages\reportlab\pdfgen\pdfimages.py:203: content = '\n'.join(self.imageData[3:-1]) + '\n'
Lib\site-packages\setuptools\command\build_ext.py:221: pkg = '.'.join(ext._full_name.split('.')[:-1] + [''])
Lib\site-packages\setuptools\command\easy_install.py:1056: f.write('\n'.join(locals()[name]) + '\n')
Lib\site-packages\setuptools\command\easy_install.py:1606: data = '\n'.join(lines) + '\n'
Lib\site-packages\setuptools\command\egg_info.py:672: cmd.write_file("top-level names", filename, '\n'.join(sorted(pkgs)) + '\n')
Lib\site-packages\setuptools\command\egg_info.py:683: value = '\n'.join(value) + '\n'
Lib\site-packages\setuptools\_distutils\command\config.py:303: body = "\n".join(body) + "\n"
Lib\site-packages\twisted\conch\manhole.py:360-362: return (b'\n'.join(self.interpreter.buffer) + b'\n' + b''.join(self.lineBuffer))
Lib\site-packages\twisted\conch\client\knownhosts.py:547-549: hostsFileObj.write( b"\n".join([entry.toString() for entry in self._added]) + b"\n")
Lib\site-packages\twisted\conch\ssh\keys.py:1340: return b'\n'.join(lines) + b'\n'
Lib\site-packages\twisted\conch\test\test_conch.py:556: expectedResult = '\n'.join(['line #%02d' % (i,) for i in range(60)]) + '\n'
Lib\site-packages\twisted\conch\test\test_helper.py:360: self.term.write(b'\n'.join((s1, s2, s3)) + b'\n')
Lib\site-packages\twisted\internet\test\test_process.py:769: scriptFile.write(os.linesep.join(sourceLines) + os.linesep)
Lib\site-packages\twisted\mail\imap4.py:5713: hdrs = '\r\n'.join(hdrs) + '\r\n'
Lib\site-packages\twisted\mail\imap4.py:5952: base = b'.'.join([(x + 1).__bytes__() for x in self.part]) + b'.' + base
Lib\site-packages\twisted\mail\test\test_pop3.py:312: self.message = b'\n'.join(self.lines) + b'\n'
Lib\site-packages\twisted\mail\test\test_pop3.py:376: output = b'\r\n'.join(client.response) + b'\r\n'
Lib\site-packages\twisted\mail\test\test_smtp.py:100: message = b'\n'.join(self.buffer) + b'\n'
Lib\site-packages\twisted\mail\test\test_smtp.py:344: message = b'\n'.join(self.buffer) + b'\n'
Lib\site-packages\twisted\python\text.py:146: return '\n'.join(lines)+'\n'
Lib\site-packages\twisted\test\test_iutils.py:40: scriptFile.write(os.linesep.join(sourceLines) + os.linesep)
Lib\site-packages\win32comext\adsi\demos\scp.py:350: description = __doc__ + "\ncommands:\n" + "\n".join(arg_descs) + "\n"
Lib\site-packages\wx\py\crust.py:259: self.SetValue('\n'.join(hist) + '\n')
Lib\site-packages\wx\py\introspect.py:342: command = terminator.join(pieces[:-1]) + terminator
Lib\site-packages\zope\interface\document.py:78: return "\n\n".join(r) + "\n\n"
Lib\test\test_nntplib.py:495: lit = "\r\n".join(lit.splitlines()) + "\r\n"
Lib\test\test_univnewlines.py:24: DATA_LF = "\n".join(DATA_TEMPLATE) + "\n"
Lib\test\test_univnewlines.py:25: DATA_CR = "\r".join(DATA_TEMPLATE) + "\r"
Lib\test\test_univnewlines.py:26: DATA_CRLF = "\r\n".join(DATA_TEMPLATE) + "\r\n"
Lib\test\test_tools\test_pindent.py:33: return '\n'.join(line.lstrip() for line in data.splitlines()) + '\n'

SEPARATOR ADDED AT BOTH ENDS:
Lib\pydoc.py:1582: sys.stdout.write('\n' + '\n'.join(lines[r:r+inc]) + '\n')
Lib\site-packages\office365\runtime\odata\odata_batch_request.py:129: buffer = eol + eol.join(lines) + eol
Lib\test\test_generators.py:1424: print("|" + "|".join(squares) + "|")
Lib\test\test_generators.py:1620: print("|" + "|".join(row) + "|")
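For anyone who wants to experiment with the idea, the numeric atEnds scheme from the post can be sketched as an ordinary function. The name join_at_ends and its at_ends parameter are hypothetical, chosen only for illustration; str.join() itself accepts no such argument. The sketch follows the empty-sequence rule suggested in the post (the separator is returned once, or doubled if both ends are requested).

```python
def join_at_ends(sep, items, at_ends=0):
    """Join items with sep, optionally adding sep at the start and/or end.

    Hypothetical helper, not part of Python. at_ends uses the numeric
    scheme from the post: 0 = default, 1 = start, 2 = end, 3 = both.
    It works for str and bytes because it only uses join() and +.
    """
    if at_ends not in (0, 1, 2, 3):
        raise ValueError("at_ends must be 0, 1, 2 or 3")
    result = sep.join(items)
    if at_ends & 1:  # bit 0 set: separator at the start
        result = sep + result
    if at_ends & 2:  # bit 1 set: separator at the end
        result = result + sep
    return result


items = ('Spam', 'Ham', 'Eggs')
print(join_at_ends('|', items))         # Spam|Ham|Eggs
print(join_at_ends('|', items, 1))      # |Spam|Ham|Eggs
print(join_at_ends('|', items, 2))      # Spam|Ham|Eggs|
print(join_at_ends('|', items, 3))      # |Spam|Ham|Eggs|
print(repr(join_at_ends('|', (), 3)))   # '||' (separator doubled for an empty sequence)
```

Because the result for an empty sequence is just the requested separators, the identity sep.join(seq, <start>) == sep + sep.join(seq) from the post holds in this sketch by construction.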
Currently you can write

    term.join(xs) + term

if you want 1, 1, 2, 3, ... terminators when xs has 0, 1, 2, 3, ... elements, and

    term.join([*xs, ''])  # or b''

if you want 0, 1, 2, 3, ... terminators, neither of which is prohibitively annoying.

What I don't like about the status quo is that I'm never sure that the people who wrote "sep.join(xs) + sep" really want the separator when xs is empty. Unless it's obvious from surrounding code that xs can't be empty, I always worry it's a bug.

In your PIL example,

    arr = []
    for elt in (description, cpright):
        if elt:
            arr.append(elt)
    return "\r\n\r\n".join(arr) + "\r\n\r\n"

do they really want four newlines when the description and copyright are both empty? I suspect not but I don't know. There's no clarifying comment.

In email.contentmanager they call splitlines on some text, then rejoin it with '\n'.join(lines) + '\n'. It looks like the input can be empty since there is some special-case code for that. Do they know that they're increasing the number of newlines in that case? There's no clarifying comment.

As for term.join([*xs, '']), while it seems less likely to be a bug, it's not very natural. You aren't adding an extra blank thing at the end, you're just terminating the things you already had.

So I think it would be nice to have a way to say explicitly and concisely what you want to happen with an empty list. I suppose this idea will fail for the usual reason (no good syntax), but here's an attempt:

    term.joinlines(xs)          in place of  term.join([*xs, ''])
    term.joinlines(xs) or term  in place of  term.join(xs) + term

The second one is less concise than before, but it doesn't give me that uneasy feeling.

Adding a separator before the first element doesn't seem important enough to me to justify the complexity of adding an option for it.
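The difference between the two spellings shows up only for an empty list, which a short sketch makes visible. Here joinlines is written as a free function because no such str method exists; its name and signature are assumptions taken from the suggestion above.

```python
def joinlines(term, xs):
    """Hypothetical helper: terminate every item of xs with term.

    Equivalent to term.join([*xs, '']); term[:0] yields '' for str
    and b'' for bytes, so both types are handled.
    """
    return term.join([*xs, term[:0]])


for n in range(4):
    xs = ['x'] * n
    always = '\n'.join(xs) + '\n'        # 1, 1, 2, 3 terminators
    terminated = joinlines('\n', xs)     # 0, 1, 2, 3 terminators
    print(n, always.count('\n'), terminated.count('\n'))

# The "or term" spelling recovers the always-one-terminator behaviour.
assert (joinlines('\n', []) or '\n') == '\n'.join([]) + '\n'
```

The loop prints the two terminator counts side by side for 0 to 3 items; they differ only in the first row.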
On Wed, Mar 8, 2023 at 4:34 PM Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
Compare these two lines:

    '\n'.join(lines) + '\n'
    '\n'.join(lines, atEnds=1)

The first is not only shorter, it's more clear what it's doing. I'd have to look it up every time to remember whether the value atEnds=1 is doing the right thing. It's just IMHO not valuable enough for the cost. Not every potential optimization is worth including in the core language or the stdlib (if this even is an optimization).

--- Bruce
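Whether avoiding the extra concatenation is an optimization at all can be measured with what already exists. The sketch below times the two current spellings with timeit; the test data and the function names are made up for illustration, and the numbers will vary by machine and Python version, so no result is quoted here.

```python
import timeit

# Illustrative data only: 100 short lines and a one-character separator.
lines = [f"line {i}" for i in range(100)]
sep = "\n"


def join_then_add():
    # The common idiom: join, then one extra str.__add__.
    return sep.join(lines) + sep


def join_with_empty_tail():
    # The alternative: build a new list with a trailing empty string.
    return sep.join([*lines, ""])


# Both spellings produce the same text for a non-empty list.
assert join_then_add() == join_with_empty_tail()

for func in (join_then_add, join_with_empty_tail):
    seconds = timeit.timeit(func, number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s for 10,000 calls")
```

A built-in option would presumably behave differently from either spelling, so this only bounds how much there is to gain.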
On 09/03/2023 05:25, Bruce Leban wrote:
I don't think your comparison is fair. If the second one were written

    '\n'.join(lines, 1)

it would be shorter. And if it were spelt

    '\n'.join(lines, 'e')  # s for at Start, e for at End, b for Both

which I now think is preferable, it would still be shorter and you probably wouldn't need to look it up. And when it comes to examples like

    return policy.linesep.join(lines) + policy.linesep
    return policy.linesep.join(lines, 'e')

it would save even more characters and very likely be (more of) an optimisation.

Ben mentions being able to specify if a terminal separator is wanted when the list is empty. That is an idea which I think is worth considering. How about:

    'S' = always add a separator at start
    's' = add a separator at start unless the list is empty
    'E' = always add a separator at end
    'e' = add a separator at end unless the list is empty

with these combinations being allowed: 'se', 'SE', 'Se', 'sE' (the last two would have the same effect, but the difference in emphasis might clarify the author's thought).

If nothing else, this would push authors into thinking about the empty list case, rather than being sloppy about it as Ben suspects in one or two cases.

Best wishes
Rob Cliffe
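The letter scheme above can be tried out as a plain function. This is only a sketch under stated assumptions: join_flags and its ends parameter are hypothetical names, and rejecting any flag string outside the listed combinations is a choice made here, not something the message specifies.

```python
ALLOWED_FLAGS = {'', 's', 'S', 'e', 'E', 'se', 'SE', 'Se', 'sE'}


def join_flags(sep, items, ends=''):
    """Hypothetical helper implementing the 'S'/'s'/'E'/'e' flags.

    Upper case = always add the separator; lower case = add it
    unless the sequence is empty.
    """
    if ends not in ALLOWED_FLAGS:
        raise ValueError(f"unsupported flags: {ends!r}")
    items = list(items)          # materialise so emptiness can be tested
    non_empty = bool(items)
    result = sep.join(items)
    if 'S' in ends or ('s' in ends and non_empty):
        result = sep + result
    if 'E' in ends or ('e' in ends and non_empty):
        result = result + sep
    return result


print(repr(join_flags('\n', ['a', 'b'], 'e')))   # 'a\nb\n'
print(repr(join_flags('\n', [], 'e')))           # ''
print(repr(join_flags('\n', [], 'E')))           # '\n'
print(repr(join_flags('|', [], 'SE')))           # '||'
print(join_flags('|', [], 'Se') == join_flags('|', [], 'sE'))  # True
```

The last line checks the remark that 'Se' and 'sE' have the same effect: for an empty list each yields a single separator, and for a non-empty list each adds one at both ends.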
Then we're back to square one: doing this would be more confusing than just adding the sep to the joined string. In my opinion, this proposal just isn't needed. The equivalent solution is easier to understand and to write, and will lead to way less confusion about what join() actually does.
I did say in my first post on the subject that I couldn't decide on the best API. I'm hoping that someone can devise a better one that might gain support. I also listed a large number of use cases in the stdlib (which I chose as a conveniently available codebase. I would expect there to be other use cases in other code).

Best wishes
Rob Cliffe

On 10/03/2023 06:49, constantin@christoph.net wrote:
I think `'\n'.join(lines, 1)` & `'\n'.join(lines, 'e')` are worse than `'\n'.join(lines, at_end=True)` or others; it's much more complicated to understand what will be produced.

On Fri, Mar 10, 2023 at 00:47, Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
-- Antoine Rozo
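The keyword spelling can be made mandatory with a bare * in the signature, which rules out the opaque positional forms entirely. A minimal sketch, assuming a free function named join_ends (hypothetical, as are the at_start and at_end parameter names on anything other than the spelling quoted above):

```python
def join_ends(sep, items, *, at_start=False, at_end=False):
    """Hypothetical helper: the bare * makes both options keyword-only."""
    result = sep.join(items)
    if at_start:
        result = sep + result
    if at_end:
        result = result + sep
    return result


print(repr(join_ends('\n', ['a', 'b'], at_end=True)))   # 'a\nb\n'

try:
    join_ends('\n', ['a', 'b'], True)   # positional flag is rejected
except TypeError as exc:
    print("TypeError:", exc)
```

With this signature a call site always names the end it affects, at the cost of a few more characters than the 1 or 'e' spellings.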
participants (6)
- Antoine Rozo
- Barry
- Ben Rudiak-Gould
- Bruce Leban
- constantin@christoph.net
- Rob Cliffe