join() could add separators at start or end - Python-ideas

March 8, 2023

      Having had my last proposal shot down in flames, up I bob with another. 😁
It seems to me that it would be useful to be able to make the str.join() 
method put separators, not only between the items of its operand, but 
also optionally at the beginning or end.
E.g. '|'.join(('Spam', 'Ham', 'Eggs')) returns
     'Spam|Ham|Eggs'
but it might be useful to make it return one of
     '|Spam|Ham|Eggs'
     'Spam|Ham|Eggs|'
     '|Spam|Ham|Eggs|'
Again, I suggest that this apply to byte strings as well as strings.
Going through my Python 3.8.3 installation (stdlib and site-packages) I have found
     24 examples where the separator needs to be added at the beginning
     52 where the separator needs to be added at the end
      4 where the separator needs to be added at both ends
I list these examples below.  Apologies if there are any mistakes.
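
For reference, the three variants above can already be produced with today's join(); the snippet below is only a sketch of the existing idioms that the proposal would replace (the variable names are illustrative).

```python
items = ('Spam', 'Ham', 'Eggs')
sep = '|'

plain = sep.join(items)                  # 'Spam|Ham|Eggs'
at_start = sep + sep.join(items)         # '|Spam|Ham|Eggs'
at_end = sep.join(items) + sep           # 'Spam|Ham|Eggs|'
at_both = sep + sep.join(items) + sep    # '|Spam|Ham|Eggs|'

# The same idioms work unchanged for byte strings.
bitems = (b'Spam', b'Ham', b'Eggs')
b_at_end = b'|'.join(bitems) + b'|'      # b'Spam|Ham|Eggs|'
```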

While guessing is no substitute for measurement, it seems plausible that
using this feature where appropriate would increase runtime performance
by avoiding 1 (or 2) calls of str.__add__.
This is perhaps more relevant when the separator is not a short constant
string, as in this example:
     Lib\email\_header_value_parser.py:2854:    return policy.linesep.join(lines) + policy.linesep
Note also this example:
     Lib\site-packages\setuptools\command\build_ext.py:221: pkg = '.'.join(ext._full_name.split('.')[:-1] + [''])
where the author has used the unintuitive device of appending an empty
string to a list to force join() to add an extra final dot, thereby
avoiding 1 call of str.__add__ at the cost of 1 list concatenation
(list.__add__).
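
Since guessing is no substitute for measurement, here is a rough sketch of how the two existing idioms could be timed with the standard timeit module. The sample data, separator and function names are assumptions chosen for illustration. No timings are quoted because they will vary by machine and Python version.

```python
import timeit

# Illustrative data: 100 short lines and a multi-character separator
lines = [f"line {i}" for i in range(100)]
sep = "\r\n"


def with_add():
    # Idiom 1: join, then one extra str.__add__
    return sep.join(lines) + sep


def with_empty_item():
    # Idiom 2: extend the list with '' so join() emits a trailing separator
    return sep.join(lines + [""])


# Sanity check: for a non-empty list both idioms give the same string
assert with_add() == with_empty_item()

if __name__ == "__main__":
    for fn in (with_add, with_empty_item):
        seconds = timeit.timeit(fn, number=10_000)
        print(f"{fn.__name__}: {seconds:.4f}s for 10,000 calls")
```

One caveat: the two idioms are not equivalent for an empty list, because sep.join(['']) is '' whereas sep.join([]) + sep is sep.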

What I cannot decide is what the best API would be.
str.join() currently takes only 1 parameter, so it would be possible to 
add an extra parameter or two.
One scheme would be to have an atEnds parameter which could take values
such as
     0=default behaviour  1=add sep at start  2=add sep at end  3=add sep at both ends
or
     's'=add sep at start  'e'=add sep at end  'b'=add sep at both ends  (some) other=default behaviour
Another would be to have 2 parameters, atStart and atEnd, which would 
both default to False or 0.  E.g.
     '|'.join(('Spam', 'Ham', 'Eggs'), 1)    == '|Spam|Ham|Eggs'
     '|'.join(('Spam', 'Ham', 'Eggs'), 0, 1) == 'Spam|Ham|Eggs|'
Neither scheme results in particularly transparent usage, though no 
worse than
     s.splitlines(True) # What on earth is this parameter???
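
To make the two-parameter scheme concrete, here is a minimal sketch written as a free function. The name join_ends and the snake_case parameter names at_start and at_end are hypothetical; nothing like this exists in str or bytes today.

```python
def join_ends(sep, items, at_start=False, at_end=False):
    """Hypothetical helper, not a real str method: join items with sep,
    optionally adding sep before the first and/or after the last item."""
    empty = sep[:0]  # '' for str, b'' for bytes, so both types work
    prefix = sep if at_start else empty
    suffix = sep if at_end else empty
    return prefix + sep.join(items) + suffix


items = ('Spam', 'Ham', 'Eggs')

# Positional flags, as in the examples above
print(join_ends('|', items, 1))        # |Spam|Ham|Eggs
print(join_ends('|', items, 0, 1))     # Spam|Ham|Eggs|

# Keyword flags say what they mean at the call site
print(join_ends('|', items, at_start=True, at_end=True))   # |Spam|Ham|Eggs|
print(join_ends(b'|', (b'Spam', b'Ham'), at_end=True))     # b'Spam|Ham|'
```

Passing the flags by keyword might go some way towards the transparency concern, though that is a matter of taste.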

Corner case:
What if join() is passed an empty sequence?  This is debatable,
but I think it should return the separator if requested to add it
at the beginning or end, and double it up if both are requested.
This would preserve identities such as
     sep.join(seq, <PleaseAddSeparatorAtBeginning>) == sep + sep.join(seq)
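
The empty-sequence behaviour suggested above can be checked with a small stand-in for the proposed feature. The helper join_ends is hypothetical and simply built from the identities themselves, so the assertions hold by construction; the point is only to show what the corner case would return.

```python
def join_ends(sep, items, at_start=False, at_end=False):
    # Hypothetical stand-in for the proposed join() behaviour
    empty = sep[:0]
    return (sep if at_start else empty) + sep.join(items) + (sep if at_end else empty)


sep = '|'
for seq in ([], ['Spam'], ['Spam', 'Ham', 'Eggs']):
    assert join_ends(sep, seq, at_start=True) == sep + sep.join(seq)
    assert join_ends(sep, seq, at_end=True) == sep.join(seq) + sep
    assert join_ends(sep, seq, at_start=True, at_end=True) == sep + sep.join(seq) + sep

print(repr(join_ends(sep, [], at_start=True)))               # '|'
print(repr(join_ends(sep, [], at_start=True, at_end=True)))  # '||'
```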

Best wishes
Rob Cliffe

EXAMPLES WHERE SEPARATOR ADDED AT START:

Lib\http\server.py:933:    splitpath = ('/' + '/'.join(head_parts), tail_part)
Lib\site-packages\numpy\ctypeslib.py:333:        name += "_"+"_".join(flags)
Lib\site-packages\numpy\testing\_private\utils.py:842: err_msg += '\n' + '\n'.join(remarks)
Lib\site-packages\pip\_vendor\pyparsing\core.py:2092-2095:
             out = [
                 "\n" + "\n".join(comments) if comments else "",
                 pyparsing_test.with_line_numbers(t) if with_line_numbers else t,
             ]
Lib\site-packages\pip\_vendor\requests\status_codes.py:121-125:
     __doc__ = (
         __doc__ + "\n" + "\n".join(doc(code) for code in sorted(_codes))
         if __doc__ is not None
         else None
     )
Lib\site-packages\reportlab\lib\utils.py:1093: self._writeln(' '+' '.join(A.__self__))
Lib\site-packages\reportlab\platypus\flowables.py:708:        L = "\n"+"\n".join(L)
Lib\site-packages\twisted\mail\smtp.py:1647: r.append(c + b' ' +  b' '.join(v))
Lib\site-packages\twisted\protocols\ftp.py:1203:        return (PWD_REPLY, '/' + '/'.join(self.workingDirectory))
Lib\site-packages\twisted\runner\procmon.py:424-426:
         return ('<' + self.__class__.__name__ + ' '
                 + ' '.join(l)
                 + '>')
Lib\site-packages\twisted\web\rewrite.py:34:        request.path = '/'+'/'.join(request.prepath+request.postpath)
Lib\site-packages\twisted\web\rewrite.py:51:            request.path = '/'+'/'.join(request.prepath+request.postpath)
Lib\site-packages\twisted\web\twcgi.py:78:        scriptName = b"/" + b"/".join(request.prepath)
Lib\site-packages\twisted\web\twcgi.py:95: env["PATH_INFO"] = "/" + "/".join(pp)
Lib\site-packages\twisted\web\vhost.py:115:        request.path = b'/' + b'/'.join(request.postpath)
Lib\site-packages\twisted\web\wsgi.py:283:            scriptName = b'/' + b'/'.join(request.prepath)
Lib\site-packages\twisted\web\wsgi.py:288:            pathInfo = b'/' + b'/'.join(request.postpath)
Lib\site-packages\twisted\web\test\test_wsgi.py:272:        uri = '/' + '/'.join([urlquote(seg, safe) for seg in requestSegments])
Lib\site-packages\wx\py\magic.py:55:            command = 'sx("'+aliasDict[c[0]]+' '+' '.join(c[1:])+'")'
Lib\site-packages\zope\interface\exceptions.py:257-260:
         return '\n    ' + '\n    '.join(
             x._str_details.strip() if isinstance(x, _TargetInvalid) else str(x)
             for x in self.exceptions
         )
Lib\smtplib.py:537 and 545:    optionlist = ' ' + ' '.join(options)
Lib\unittest\case.py:1094-1096:
         diffMsg = '\n' + '\n'.join(
             difflib.ndiff(pprint.pformat(seq1).splitlines(),
                           pprint.pformat(seq2).splitlines()))
Lib\unittest\case.py:1207-1209:
             diff = ('\n' + '\n'.join(difflib.ndiff(
                            pprint.pformat(d1).splitlines(),
                            pprint.pformat(d2).splitlines())))

SEPARATOR ADDED AT END:

Lib\distutils\command\config.py:303:        body = "\n".join(body) + "\n"
Lib\email\contentmanager.py:145:    def embedded_body(lines): return linesep.join(lines) + linesep
Lib\email\contentmanager.py:146:    def normal_body(lines): return b'\n'.join(lines) + b'\n'
Lib\email\policy.py:215:        return name + ': ' + self.linesep.join(lines) + self.linesep
Lib\email\_header_value_parser.py:2854:    return policy.linesep.join(lines) + policy.linesep
Lib\site-packages\numpy\distutils\command\config.py:346:        body = '\n'.join(body) + "\n"
Lib\site-packages\numpy\distutils\command\config.py:407:        body = '\n'.join(body) + "\n"
Lib\site-packages\oauthlib\oauth2\rfc6749\tokens.py:158: base_string = '\n'.join(base) + '\n'
Lib\site-packages\PIL\ImageCms.py:770:        return "\r\n\r\n".join(arr) + "\r\n\r\n"
Lib\site-packages\pip\_internal\operations\freeze.py:254: return "\n".join(list(self.comments) + [str(req)]) + "\n"
Lib\site-packages\pip\_internal\operations\install\legacy.py:54: f.write("\n".join(new_lines) + "\n")
Lib\site-packages\pip\_vendor\pyparsing\testing.py:323-331:
         return (
             header1
             + header2
             + "\n".join(
                 "{:{}d}:{}{}".format(i, lineno_width, line, eol_mark)
                 for i, line in enumerate(s_lines, start=start_line)
             )
             + "\n"
         )
Lib\site-packages\pycparser\c_generator.py:117:        if n.storage: s += ' '.join(n.storage) + ' '
Lib\site-packages\pycparser\c_generator.py:366:        if n.funcspec: s = ' '.join(n.funcspec) + ' '
Lib\site-packages\pycparser\c_generator.py:367:        if n.storage: s += ' '.join(n.storage) + ' '
Lib\site-packages\pycparser\c_generator.py:382:            if n.quals: s += ' '.join(n.quals) + ' '
Lib\site-packages\pycparser\c_generator.py:397: nstr += ' '.join(modifier.dim_quals) + ' '
Lib\site-packages\pycparser\c_generator.py:417:            return ' '.join(n.names) + ' '
Lib\site-packages\pythonwin\pywin\framework\scriptutils.py:109: return ".".join(modBits) + "." + fname, newPathReturn
Lib\site-packages\reportlab\pdfbase\pdfdoc.py:1118:            code = '\n'.join(code)+'\n'
Lib\site-packages\reportlab\pdfbase\pdfutils.py:102: f.write('\r\n'.join(code)+'\r\n')
Lib\site-packages\reportlab\pdfbase\_can_cmap_data.py:54:    src = '\n'.join(buf) + '\n'
Lib\site-packages\reportlab\pdfgen\pdfimages.py:203:        content = '\n'.join(self.imageData[3:-1]) + '\n'
Lib\site-packages\setuptools\command\build_ext.py:221:        pkg = '.'.join(ext._full_name.split('.')[:-1] + [''])
Lib\site-packages\setuptools\command\easy_install.py:1056: f.write('\n'.join(locals()[name]) + '\n')
Lib\site-packages\setuptools\command\easy_install.py:1606: data = '\n'.join(lines) + '\n'
Lib\site-packages\setuptools\command\egg_info.py:672: cmd.write_file("top-level names", filename, '\n'.join(sorted(pkgs)) + '\n')
Lib\site-packages\setuptools\command\egg_info.py:683:        value = '\n'.join(value) + '\n'
Lib\site-packages\setuptools\_distutils\command\config.py:303: body = "\n".join(body) + "\n"
Lib\site-packages\twisted\conch\manhole.py:360-362:
         return (b'\n'.join(self.interpreter.buffer) +
                 b'\n' +
                 b''.join(self.lineBuffer))
Lib\site-packages\twisted\conch\client\knownhosts.py:547-549:
                 hostsFileObj.write(
                     b"\n".join([entry.toString() for entry in self._added]) +
                     b"\n")
Lib\site-packages\twisted\conch\ssh\keys.py:1340:        return b'\n'.join(lines) + b'\n'
Lib\site-packages\twisted\conch\test\test_conch.py:556: expectedResult = '\n'.join(['line #%02d' % (i,) for i in range(60)]) + '\n'
Lib\site-packages\twisted\conch\test\test_helper.py:360: self.term.write(b'\n'.join((s1, s2, s3)) + b'\n')
Lib\site-packages\twisted\internet\test\test_process.py:769: scriptFile.write(os.linesep.join(sourceLines) + os.linesep)
Lib\site-packages\twisted\mail\imap4.py:5713:    hdrs = '\r\n'.join(hdrs) + '\r\n'
Lib\site-packages\twisted\mail\imap4.py:5952:                base = b'.'.join([(x + 1).__bytes__() for x in self.part]) + b'.' + base
Lib\site-packages\twisted\mail\test\test_pop3.py:312: self.message = b'\n'.join(self.lines) + b'\n'
Lib\site-packages\twisted\mail\test\test_pop3.py:376: output = b'\r\n'.join(client.response) + b'\r\n'
Lib\site-packages\twisted\mail\test\test_smtp.py:100:        message = b'\n'.join(self.buffer) + b'\n'
Lib\site-packages\twisted\mail\test\test_smtp.py:344:        message = b'\n'.join(self.buffer) + b'\n'
Lib\site-packages\twisted\python\text.py:146:    return '\n'.join(lines)+'\n'
Lib\site-packages\twisted\test\test_iutils.py:40: scriptFile.write(os.linesep.join(sourceLines) + os.linesep)
Lib\site-packages\win32comext\adsi\demos\scp.py:350:    description = __doc__ + "\ncommands:\n" + "\n".join(arg_descs) + "\n"
Lib\site-packages\wx\py\crust.py:259: self.SetValue('\n'.join(hist) + '\n')
Lib\site-packages\wx\py\introspect.py:342:            command = terminator.join(pieces[:-1]) + terminator
Lib\site-packages\zope\interface\document.py:78:    return "\n\n".join(r) + "\n\n"
Lib\test\test_nntplib.py:495:        lit = "\r\n".join(lit.splitlines()) + "\r\n"
Lib\test\test_univnewlines.py:24:DATA_LF = "\n".join(DATA_TEMPLATE) + "\n"
Lib\test\test_univnewlines.py:25:DATA_CR = "\r".join(DATA_TEMPLATE) + "\r"
Lib\test\test_univnewlines.py:26:DATA_CRLF = "\r\n".join(DATA_TEMPLATE) + "\r\n"
Lib\test\test_tools\test_pindent.py:33:        return '\n'.join(line.lstrip() for line in data.splitlines()) + '\n'

SEPARATOR ADDED AT BOTH ENDS:

Lib\pydoc.py:1582:            sys.stdout.write('\n' + '\n'.join(lines[r:r+inc]) + '\n')
Lib\site-packages\office365\runtime\odata\odata_batch_request.py:129: buffer = eol + eol.join(lines) + eol
Lib\test\test_generators.py:1424:            print("|" + "|".join(squares) + "|")
Lib\test\test_generators.py:1620:            print("|" + "|".join(row) + "|")
