Having had my last proposal shot down in flames, up I bob with another. 😁
It seems to me that it would be useful to be able to make the str.join()
function put separators, not only between the items of its operand, but
also optionally at the beginning or end.
E.g. '|'.join(('Spam', 'Ham', 'Eggs')) returns
'Spam|Ham|Eggs'
but it might be useful to make it return one of
'|Spam|Ham|Eggs'
'Spam|Ham|Eggs|'
'|Spam|Ham|Eggs|'
Again, I suggest that this apply to byte strings as well as strings.
Going through my Python 3.8.3 installation (the stdlib plus site-packages), I have found
24 examples where the separator needs to be added at the beginning
52 where the separator needs to be added at the end
4 where the separator needs to be added at both ends
I list these examples below. Apologies if there are any mistakes.
While guessing is no substitute for measurement, it seems plausible that
using this feature where appropriate would improve runtime performance
by avoiding 1 (or 2) calls of str.__add__.
This is perhaps more relevant when the separator is not a short constant
string, as in this example:
Lib\email\_header_value_parser.py:2854: return policy.linesep.join(lines) + policy.linesep
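One way to measure rather than guess is the timeit module. The sketch below uses made-up data (the names lines and sep are illustrative only) and cannot time the proposed feature itself, since it does not exist; it only shows how the two existing spellings could be compared on a given machine.

```python
import timeit

# Hypothetical data: 100 short lines and a multi-character separator.
lines = ["line %d" % i for i in range(100)]
sep = "\r\n"

def join_then_add():
    # Current idiom: one join plus one str.__add__.
    return sep.join(lines) + sep

def join_with_empty_item():
    # Workaround: build a new list with a trailing empty string, then join once.
    return sep.join(lines + [""])

for func in (join_then_add, join_with_empty_item):
    seconds = timeit.timeit(func, number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s for 10,000 calls")
```

Results will vary with the length of the sequence and of the separator, so no figures are claimed here.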
Note also this example:
Lib\site-packages\setuptools\command\build_ext.py:221: pkg = '.'.join(ext._full_name.split('.')[:-1] + [''])
where the author has used the unintuitive device of appending an empty
string to a list to force join() to add an extra final dot, thereby
avoiding 1 call of str.__add__ at the cost of 1 list concatenation
(list.__add__).
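The equivalence of the two spellings can be checked directly. This is only an illustration: full_name is a made-up value standing in for ext._full_name, not anything setuptools actually sees.

```python
# Hypothetical dotted name standing in for ext._full_name.
full_name = "pkg.sub.module"
parts = full_name.split(".")[:-1]

with_trick = ".".join(parts + [""])  # empty last item forces a trailing dot
with_add = ".".join(parts) + "."     # the more obvious spelling

print(with_trick)  # pkg.sub.
print(with_add)    # pkg.sub.

# The two spellings differ when there are no parts at all:
print(repr(".".join([] + [""])))  # ''
print(repr(".".join([]) + "."))   # '.'
```

So the empty-string device is not a drop-in replacement for adding the separator when the sequence may be empty.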
What I cannot decide is what the best API would be.
str.join() currently takes only 1 parameter, so it would be possible to
add an extra parameter or two.
One scheme would be to have an atEnds parameter which could take values
such as
0 = default behaviour
1 = add sep at start
2 = add sep at end
3 = add sep at both ends
or
's' = add sep at start
'e' = add sep at end
'b' = add sep at both ends
(some) other = default behaviour
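The integer encoding can be read as two bit flags (1 for start, 2 for end, 3 for both). Here is a minimal pure-Python sketch of that reading; join_at_ends, AT_START and AT_END are hypothetical names invented for illustration and are not part of str.

```python
# Hypothetical constants mirroring the 0/1/2/3 encoding above.
AT_START = 1
AT_END = 2

def join_at_ends(sep, items, at_ends=0):
    """Pure-Python stand-in for a hypothetical sep.join(items, atEnds)."""
    result = sep.join(items)
    if at_ends & AT_START:
        result = sep + result
    if at_ends & AT_END:
        result = result + sep
    return result

foods = ("Spam", "Ham", "Eggs")
print(join_at_ends("|", foods))                     # Spam|Ham|Eggs
print(join_at_ends("|", foods, AT_START))           # |Spam|Ham|Eggs
print(join_at_ends("|", foods, AT_END))             # Spam|Ham|Eggs|
print(join_at_ends("|", foods, AT_START | AT_END))  # |Spam|Ham|Eggs|
```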
Another would be to have 2 parameters, atStart and atEnd, which would
both default to False or 0. E.g.
'|'.join(('Spam', 'Ham', 'Eggs'), 1) == '|Spam|Ham|Eggs'
'|'.join(('Spam', 'Ham', 'Eggs'), 0, 1) == 'Spam|Ham|Eggs|'
Neither scheme results in particularly transparent usage, though no worse than
s.splitlines(True) # What on earth is this parameter???
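If the two flags were passed by keyword, call sites might be easier to read. The following is a sketch under that assumption; join_with_ends, at_start and at_end are hypothetical names, and the helper merely emulates the idea in pure Python for both str and bytes.

```python
def join_with_ends(sep, items, at_start=False, at_end=False):
    """Stand-in for a hypothetical sep.join(items, atStart, atEnd).

    Relies only on join() and concatenation, so it accepts str or bytes.
    """
    empty = sep[:0]  # '' for str, b'' for bytes
    prefix = sep if at_start else empty
    suffix = sep if at_end else empty
    return prefix + sep.join(items) + suffix

foods = ("Spam", "Ham", "Eggs")
print(join_with_ends("|", foods, at_start=True))  # |Spam|Ham|Eggs
print(join_with_ends("|", foods, at_end=True))    # Spam|Ham|Eggs|
print(join_with_ends(b"/", [b"usr", b"lib"], at_start=True))  # b'/usr/lib'
```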
Corner case:
What if join() is passed an empty sequence? This is debatable,
but I think it should return the separator if requested to add it
at the beginning or end, and double it up if both are requested.
This would preserve identities such as
sep.join(seq, <PleaseAddSeparatorAtBeginning>) == sep + sep.join(seq)
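The identity above, and its counterpart for the end, can be checked against a pure-Python emulation. The helper emulate_join is hypothetical and simply encodes the empty-sequence behaviour suggested here (add the separator unconditionally).

```python
def emulate_join(sep, seq, at_start=False, at_end=False):
    # Hypothetical emulation: the separator is added even when seq is empty.
    result = sep.join(seq)
    if at_start:
        result = sep + result
    if at_end:
        result = result + sep
    return result

sep = "|"
for seq in ([], ["Spam"], ["Spam", "Ham", "Eggs"]):
    # The identities hold for empty and non-empty sequences alike.
    assert emulate_join(sep, seq, at_start=True) == sep + sep.join(seq)
    assert emulate_join(sep, seq, at_end=True) == sep.join(seq) + sep

print(repr(emulate_join(sep, [], at_start=True)))               # '|'
print(repr(emulate_join(sep, [], at_start=True, at_end=True)))  # '||'
```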
Best wishes
Rob Cliffe
EXAMPLES WHERE SEPARATOR ADDED AT START:
Lib\http\server.py:933: splitpath = ('/' + '/'.join(head_parts),
tail_part)
Lib\site-packages\numpy\ctypeslib.py:333: name += "_"+"_".join(flags)
Lib\site-packages\numpy\testing\_private\utils.py:842: err_msg += '\n' +
'\n'.join(remarks)
Lib\site-packages\pip\_vendor\pyparsing\core.py:2092-2095:
out = [
"\n" + "\n".join(comments) if comments else "",
pyparsing_test.with_line_numbers(t) if
with_line_numbers else t,
]
Lib\site-packages\pip\_vendor\requests\status_codes.py:121-125:
__doc__ = (
__doc__ + "\n" + "\n".join(doc(code) for code in sorted(_codes))
if __doc__ is not None
else None
)
Lib\site-packages\reportlab\lib\utils.py:1093: self._writeln(' '+' '.join(A.__self__))
Lib\site-packages\reportlab\platypus\flowables.py:708: L =
"\n"+"\n".join(L)
Lib\site-packages\twisted\mail\smtp.py:1647: r.append(c + b' ' + b' '.join(v))
Lib\site-packages\twisted\protocols\ftp.py:1203: return
(PWD_REPLY, '/' + '/'.join(self.workingDirectory))
Lib\site-packages\twisted\runner\procmon.py:424-426:
return ('<' + self.__class__.__name__ + ' '
+ ' '.join(l)
+ '>')
Lib\site-packages\twisted\web\rewrite.py:34: request.path =
'/'+'/'.join(request.prepath+request.postpath)
Lib\site-packages\twisted\web\rewrite.py:51: request.path =
'/'+'/'.join(request.prepath+request.postpath)
Lib\site-packages\twisted\web\twcgi.py:78: scriptName = b"/" +
b"/".join(request.prepath)
Lib\site-packages\twisted\web\twcgi.py:95: env["PATH_INFO"] = "/" +
"/".join(pp)
Lib\site-packages\twisted\web\vhost.py:115: request.path = b'/' +
b'/'.join(request.postpath)
Lib\site-packages\twisted\web\wsgi.py:283: scriptName = b'/'
+ b'/'.join(request.prepath)
Lib\site-packages\twisted\web\wsgi.py:288: pathInfo = b'/' +
b'/'.join(request.postpath)
Lib\site-packages\twisted\web\test\test_wsgi.py:272: uri = '/' +
'/'.join([urlquote(seg, safe) for seg in requestSegments])
Lib\site-packages\wx\py\magic.py:55: command =
'sx("'+aliasDict[c[0]]+' '+' '.join(c[1:])+'")'
Lib\site-packages\zope\interface\exceptions.py:257-260:
return '\n ' + '\n '.join(
x._str_details.strip() if isinstance(x, _TargetInvalid)
else str(x)
for x in self.exceptions
)
Lib\smtplib.py:537 and 545: optionlist = ' ' + ' '.join(options)
Lib\unittest\case.py:1094-1096:
diffMsg = '\n' + '\n'.join(
difflib.ndiff(pprint.pformat(seq1).splitlines(),
pprint.pformat(seq2).splitlines()))
Lib\unittest\case.py:1207-1209:
diff = ('\n' + '\n'.join(difflib.ndiff(
pprint.pformat(d1).splitlines(),
pprint.pformat(d2).splitlines())))
SEPARATOR ADDED AT END:
Lib\distutils\command\config.py:303: body = "\n".join(body) + "\n"
Lib\email\contentmanager.py:145: def embedded_body(lines): return
linesep.join(lines) + linesep
Lib\email\contentmanager.py:146: def normal_body(lines): return
b'\n'.join(lines) + b'\n'
Lib\email\policy.py:215: return name + ': ' +
self.linesep.join(lines) + self.linesep
Lib\email\_header_value_parser.py:2854: return
policy.linesep.join(lines) + policy.linesep
Lib\site-packages\numpy\distutils\command\config.py:346: body =
'\n'.join(body) + "\n"
Lib\site-packages\numpy\distutils\command\config.py:407: body =
'\n'.join(body) + "\n"
Lib\site-packages\oauthlib\oauth2\rfc6749\tokens.py:158: base_string =
'\n'.join(base) + '\n'
Lib\site-packages\PIL\ImageCms.py:770: return
"\r\n\r\n".join(arr) + "\r\n\r\n"
Lib\site-packages\pip\_internal\operations\freeze.py:254: return
"\n".join(list(self.comments) + [str(req)]) + "\n"
Lib\site-packages\pip\_internal\operations\install\legacy.py:54:
f.write("\n".join(new_lines) + "\n")
Lib\site-packages\pip\_vendor\pyparsing\testing.py:323-331:
return (
header1
+ header2
+ "\n".join(
"{:{}d}:{}{}".format(i, lineno_width, line, eol_mark)
for i, line in enumerate(s_lines, start=start_line)
)
+ "\n"
)
Lib\site-packages\pycparser\c_generator.py:117: if n.storage: s += ' '.join(n.storage) + ' '
Lib\site-packages\pycparser\c_generator.py:366: if n.funcspec: s = ' '.join(n.funcspec) + ' '
Lib\site-packages\pycparser\c_generator.py:367: if n.storage: s += ' '.join(n.storage) + ' '
Lib\site-packages\pycparser\c_generator.py:382: if n.quals: s += ' '.join(n.quals) + ' '
Lib\site-packages\pycparser\c_generator.py:397: nstr += ' '.join(modifier.dim_quals) + ' '
Lib\site-packages\pycparser\c_generator.py:417: return ' '.join(n.names) + ' '
Lib\site-packages\pythonwin\pywin\framework\scriptutils.py:109: return
".".join(modBits) + "." + fname, newPathReturn
Lib\site-packages\reportlab\pdfbase\pdfdoc.py:1118: code =
'\n'.join(code)+'\n'
Lib\site-packages\reportlab\pdfbase\pdfutils.py:102:
f.write('\r\n'.join(code)+'\r\n')
Lib\site-packages\reportlab\pdfbase\_can_cmap_data.py:54: src =
'\n'.join(buf) + '\n'
Lib\site-packages\reportlab\pdfgen\pdfimages.py:203: content =
'\n'.join(self.imageData[3:-1]) + '\n'
Lib\site-packages\setuptools\command\build_ext.py:221: pkg =
'.'.join(ext._full_name.split('.')[:-1] + [''])
Lib\site-packages\setuptools\command\easy_install.py:1056:
f.write('\n'.join(locals()[name]) + '\n')
Lib\site-packages\setuptools\command\easy_install.py:1606: data =
'\n'.join(lines) + '\n'
Lib\site-packages\setuptools\command\egg_info.py:672:
cmd.write_file("top-level names", filename, '\n'.join(sorted(pkgs)) + '\n')
Lib\site-packages\setuptools\command\egg_info.py:683: value =
'\n'.join(value) + '\n'
Lib\site-packages\setuptools\_distutils\command\config.py:303: body =
"\n".join(body) + "\n"
Lib\site-packages\twisted\conch\manhole.py:360-362:
return (b'\n'.join(self.interpreter.buffer) +
b'\n' +
b''.join(self.lineBuffer))
Lib\site-packages\twisted\conch\client\knownhosts.py:547-549:
hostsFileObj.write(
b"\n".join([entry.toString() for entry in
self._added]) +
b"\n")
Lib\site-packages\twisted\conch\ssh\keys.py:1340: return
b'\n'.join(lines) + b'\n'
Lib\site-packages\twisted\conch\test\test_conch.py:556: expectedResult =
'\n'.join(['line #%02d' % (i,) for i in range(60)]) + '\n'
Lib\site-packages\twisted\conch\test\test_helper.py:360:
self.term.write(b'\n'.join((s1, s2, s3)) + b'\n')
Lib\site-packages\twisted\internet\test\test_process.py:769:
scriptFile.write(os.linesep.join(sourceLines) + os.linesep)
Lib\site-packages\twisted\mail\imap4.py:5713: hdrs =
'\r\n'.join(hdrs) + '\r\n'
Lib\site-packages\twisted\mail\imap4.py:5952: base =
b'.'.join([(x + 1).__bytes__() for x in self.part]) + b'.' + base
Lib\site-packages\twisted\mail\test\test_pop3.py:312: self.message =
b'\n'.join(self.lines) + b'\n'
Lib\site-packages\twisted\mail\test\test_pop3.py:376: output =
b'\r\n'.join(client.response) + b'\r\n'
Lib\site-packages\twisted\mail\test\test_smtp.py:100: message =
b'\n'.join(self.buffer) + b'\n'
Lib\site-packages\twisted\mail\test\test_smtp.py:344: message =
b'\n'.join(self.buffer) + b'\n'
Lib\site-packages\twisted\python\text.py:146: return
'\n'.join(lines)+'\n'
Lib\site-packages\twisted\test\test_iutils.py:40:
scriptFile.write(os.linesep.join(sourceLines) + os.linesep)
Lib\site-packages\win32comext\adsi\demos\scp.py:350: description =
__doc__ + "\ncommands:\n" + "\n".join(arg_descs) + "\n"
Lib\site-packages\wx\py\crust.py:259: self.SetValue('\n'.join(hist) + '\n')
Lib\site-packages\wx\py\introspect.py:342: command =
terminator.join(pieces[:-1]) + terminator
Lib\site-packages\zope\interface\document.py:78: return
"\n\n".join(r) + "\n\n"
Lib\test\test_nntplib.py:495: lit = "\r\n".join(lit.splitlines())
+ "\r\n"
Lib\test\test_univnewlines.py:24:DATA_LF = "\n".join(DATA_TEMPLATE) + "\n"
Lib\test\test_univnewlines.py:25:DATA_CR = "\r".join(DATA_TEMPLATE) + "\r"
Lib\test\test_univnewlines.py:26:DATA_CRLF = "\r\n".join(DATA_TEMPLATE)
+ "\r\n"
Lib\test\test_tools\test_pindent.py:33: return
'\n'.join(line.lstrip() for line in data.splitlines()) + '\n'
SEPARATOR ADDED AT BOTH ENDS:
Lib\pydoc.py:1582: sys.stdout.write('\n' + '\n'.join(lines[r:r+inc]) + '\n')
Lib\site-packages\office365\runtime\odata\odata_batch_request.py:129: buffer = eol + eol.join(lines) + eol
Lib\test\test_generators.py:1424: print("|" + "|".join(squares) + "|")
Lib\test\test_generators.py:1620: print("|" + "|".join(row) + "|")