Ampersand operator for strings
data:image/s3,"s3://crabby-images/552f9/552f93297bac074f42414baecc3ef3063050ba29" alt=""
TL;DR: Join strings together with exactly one space between the non-blank text where they join.
I propose a meaning for s1 & s2 where s1 and s2 are strings. Namely, that it should be equivalent to
    s1.rstrip() + (' ' if (s1.strip() and s2.strip()) else '') + s2.lstrip()
Informally, this will join the two strings together with exactly one space between the last non-blank text in s1 and the first non-blank text in s2. Example:
    " bar " & " foo " == " bar foo "
This operator is associative, so there is no ambiguity in expressions such as
    s1 & s2 & s3
There *is* a possible ambiguity in expressions such as
    s1 & s2 + s3
where the relative precedence of `&` and `+` matters when s2 consists solely of whitespace. E.g.
    " A " & " " + "  C"
would evaluate to " A C", not to " A  C", because `+` has a higher precedence than `&`.
Utility: In decades of experience with another language which had such an operator (spelt differently), I have frequently found it useful for constructing human-readable output (e.g. log output, debug/error messages, on-screen labels).
Cognitive burden: This would of course be one more thing to learn. But I suggest that it is fairly intuitive that both
    s1 + s2
    s1 & s2
suggest that two strings are being combined in some way.
Bars to overcome: This change would require no fundamental change to Python; just adding an `__and__` method to the str class.
Backward compatibility: Given that `str1 & str2` currently raises TypeError, this change would be close to 100% backward-compatible.
Alternative meanings: As far as I know, nobody has ever suggested an alternative meaning for `&` on strings.
Bikeshedding:
(1) I don't think it is important for the utility of this change whether `&` strips off all whitespace or just spaces. I think it is better if it strips off all whitespace, so that it can be understood as working similarly to strip().
(2) The definition could be simplified to
    s1.rstrip() + ' ' + s2.lstrip()
(leaving an initial/final space when one string is whitespace only). Again the operator would be associative. Again I don't think this is important.
Rob Cliffe
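Since `str` cannot be given a new `__and__` from Python code, the proposed semantics can only be tried out as a plain function. The sketch below assumes Rob's first definition exactly; the names `amp` and `amp_chain` are hypothetical, and `functools.reduce` stands in for a chain such as `s1 & s2 & s3`. The last two lines reproduce the precedence example by evaluating `+` first, and then `&` first for contrast.

```python
from functools import reduce


def amp(s1: str, s2: str) -> str:
    """Hypothetical helper mirroring the proposed ``s1 & s2`` for str.

    Trailing whitespace of s1 and leading whitespace of s2 are dropped;
    one space is inserted only if both operands contain non-blank text.
    """
    sep = " " if (s1.strip() and s2.strip()) else ""
    return s1.rstrip() + sep + s2.lstrip()


def amp_chain(*parts: str) -> str:
    """Left fold of amp(), standing in for s1 & s2 & s3 & ... (needs >= 1 part)."""
    return reduce(amp, parts)


print(repr(amp(" bar ", " foo ")))        # ' bar foo '
print(repr(amp_chain("a ", "  ", " b")))  # 'a b'

# '+' binds tighter than '&', so  " A " & " " + "  C"  means:
print(repr(amp(" A ", " " + "  C")))      # ' A C'
# whereas forcing '&' to be applied first would give:
print(repr(amp(" A ", " ") + "  C"))      # ' A  C'
```

Because a blank operand contributes nothing and suppresses the separator, grouping does not change the result, which is the associativity Rob claims.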
data:image/s3,"s3://crabby-images/14ef3/14ef3ec4652919acc6d3c6a3c07f490f64f398ea" alt=""
Is it really that much longer to write `f"{s1} {s2}"` when you want that? Maybe a couple characters more total, but once you are in an f-string, you can also do a zillion other things at the same time. On Sun, Mar 5, 2023 at 10:42 PM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote:
-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
data:image/s3,"s3://crabby-images/3d630/3d63057c3cfab4559cab902d5f910729055bb455" alt=""
On Mon, Mar 6, 2023 at 12:51 AM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
Is it really that much longer to write `f"{s1} {s2}"` when you want that?
As for being that much longer: yes, it is. The more important factor, I think, is that the increase in complexity + readability for default strings is worth it in this case. One nice thing to think about might be how to make string subclassing more useful, so that this kind of thing could be done idiomatically by whoever needs it in their own project. The drawback is how cumbersome it is to instantiate a string subclass compared to a string literal. (I just got an idea, but it would be too off-topic here; if I think it is worthwhile, I will post it in a new thread later.)
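One way to try the subclassing route today is sketched below, assuming Rob's semantics; the class name `AmpStr` is hypothetical. It defines both `__and__` and `__rand__` so that one wrapped operand is enough, and it also shows the drawback Joao mentions: two plain literals still raise TypeError, so something always has to be wrapped explicitly.

```python
class AmpStr(str):
    """Hypothetical str subclass where & means 'join with exactly one space'."""

    def __and__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        sep = " " if (self.strip() and other.strip()) else ""
        # str methods return plain str, so wrap the result to keep chains working.
        return AmpStr(self.rstrip() + sep + other.lstrip())

    def __rand__(self, other):
        # Handles  plain_str & AmpStr(...)  by promoting the left operand.
        if not isinstance(other, str):
            return NotImplemented
        return AmpStr(other) & self


label = AmpStr("Reg:") & " AB12 CDE " & "Ford"
print(repr(label))  # 'Reg: AB12 CDE Ford'

x, y = "plain", "literals"
try:
    x & y  # neither operand is an AmpStr, so this still fails
except TypeError as exc:
    print("TypeError:", exc)
```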
data:image/s3,"s3://crabby-images/ab456/ab456d7b185e9d28a958835d5e138015926e5808" alt=""
On 02.03.2023 18:27, Rob Cliffe via Python-ideas wrote:
I don't find these semantics particularly intuitive. Python already has the + operator for concatenating strings, and this doesn't apply any stripping. If your emphasis is on joining words with single-space delimiters, then the usual:
    def join_words(list_of_words):
        return ' '.join([x.strip() for x in list_of_words])
works much better. You can also apply this recursively, if needed, or add support for list_of_phrases (first splitting these into a list_of_words). The advantage of join_words() is that it's easy to understand and applies stripping in a concise way. I use such helpers all the time.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 06 2023)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Mar 06, 2023 at 10:33:26AM +0100, Marc-Andre Lemburg wrote:
def join_words(list_of_words): return ' '.join([x.strip() for x in list_of_words])
That's not Rob's suggestion either. Rob's suggestion is an operator which concats two substrings with exactly one space between them, without stripping leading or trailing whitespace of the result. Examples:
    a = "\nHeading:"
    b = "Result\n\n"
    a & b would give "\nHeading: Result\n\n"
    s = " my hovercraft\n"
    t = " is full of eels\n"
    s & t would give " my hovercraft is full of eels\n"
I find the concept very easy to understand: "concat with exactly one space between the operands". But I must admit I'm struggling to think of cases where I would use it. I like the look of the & operator for concatenation, so I want to like this proposal. But I think I will need to see real-world code to understand when it would be useful.
-- Steve
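Steven's distinction can be checked by running his two examples through a stand-in for the proposed operator and through Marc-Andre's `join_words`. The helper name `amp` is hypothetical (it is not a real `str` method); it assumes Rob's definition. The proposal only touches whitespace at the join, while `join_words` also strips the outer edges.

```python
def amp(s1: str, s2: str) -> str:
    # Hypothetical stand-in for the proposed s1 & s2 (not a real str method).
    sep = " " if (s1.strip() and s2.strip()) else ""
    return s1.rstrip() + sep + s2.lstrip()


def join_words(list_of_words):
    # Marc-Andre's helper: strip every piece, then join with single spaces.
    return " ".join([x.strip() for x in list_of_words])


a = "\nHeading:"
b = "Result\n\n"
print(repr(amp(a, b)))           # '\nHeading: Result\n\n'
print(repr(join_words([a, b])))  # 'Heading: Result'

s = " my hovercraft\n"
t = " is full of eels\n"
print(repr(amp(s, t)))           # ' my hovercraft is full of eels\n'
print(repr(join_words([s, t])))  # 'my hovercraft is full of eels'
```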
data:image/s3,"s3://crabby-images/ab456/ab456d7b185e9d28a958835d5e138015926e5808" alt=""
On 06.03.2023 11:33, Steven D'Aprano wrote:
I know, but as I mentioned, I use the above often, whereas I find Rob's definition not very intuitive or useful.
-- Marc-Andre Lemburg
data:image/s3,"s3://crabby-images/d1d84/d1d8423b45941c63ba15e105c19af0a5e4c41fda" alt=""
Steven D'Aprano writes:
I have to second that motion. Pretty much any time I'm constructing lines containing variable text, each string value arrives stripped, and I far more often want padding of variable-width values rather than space compression. I admit that I use M-SPC (aka just-one-space) a lot in Emacsen, but I can't recall wanting it in a program in any language.
data:image/s3,"s3://crabby-images/d1d84/d1d8423b45941c63ba15e105c19af0a5e4c41fda" alt=""
Rob Cliffe writes:
Perhaps where you're not laying out a table,
I'm an economist, laying out tables is what I do. :-) Getting serious:
str '+' has long been quite rare in my coding. str concatenation is almost never in an inner loop, or slightly more complex formatting is the point. f-strings and .format save you the type conversion to str. So I don't find that occasional saving at all interesting. A vanishingly small number of my str constructions involve only strs with trivial formatting. What interests me about the proposal is the space-collapsing part, which a naive f-string would do incorrectly if, say, s2 == '' or s3 == '\t\t\ttabs to the left of me'. But where does this space-surrounded str data come from?
Steve
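Stephen's point about a naive f-string can be checked directly. In this sketch `s1` is a made-up value, `s2` and `s3` are the ones he mentions, and `amp` is a hypothetical stand-in for the proposed `&`, folded over the three strings with `functools.reduce`.

```python
from functools import reduce


def amp(s1: str, s2: str) -> str:
    # Hypothetical stand-in for the proposed s1 & s2.
    sep = " " if (s1.strip() and s2.strip()) else ""
    return s1.rstrip() + sep + s2.lstrip()


s1 = "left"
s2 = ""
s3 = "\t\t\ttabs to the left of me"

naive = f"{s1} {s2} {s3}"              # empty s2 leaves a double space; tabs survive
collapsed = reduce(amp, [s1, s2, s3])  # exactly one space at each join

print(repr(naive))      # 'left  \t\t\ttabs to the left of me'
print(repr(collapsed))  # 'left tabs to the left of me'
```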
data:image/s3,"s3://crabby-images/552f9/552f93297bac074f42414baecc3ef3063050ba29" alt=""
On 07/03/2023 07:26, Stephen J. Turnbull wrote: purity, or a hideous symmetry breaking?
I didn't highlight it before, so let me take the opportunity now: one of the examples I listed shows another use case: constructing an operating system command and ensuring that the parameters are separated by (at least) one space. Bear with me while I repeat it:
Lib\site-packages\numpy\distutils\system_info.py:2677:
    cmd = config_exe + ' ' + self.append_config_exe + ' ' + option
    cmd = config_exe & self.append_config_exe & option
Best wishes
Rob Cliffe
data:image/s3,"s3://crabby-images/552f9/552f93297bac074f42414baecc3ef3063050ba29" alt=""
Bikeshedding:
1) I forgot to mention whether the '&' operator should apply to byte strings as well as to strings. I propose that it should (some supporting examples below).
2) Joao suggested spelling the operator '|'. But for me, '|' suggests "or" while '&' suggests "and", and is a more intuitive choice for some kind of concatenation.
On 06/03/2023 10:33, Steven D'Aprano wrote:
Quite right, Steven. (By the way, thanks for reading my suggestion carefully, as you obviously did.) Below I list some examples from the 3.8.3 stdlib (and some third-party packages in site-packages) and how (I think) they could be rewritten. (Disclosure: I have selected the examples I found which I think are most convincing. Then again, I may have missed some.) I say "I think" because to be 100% sure (rather than 90%+ sure) in all cases would sometimes need a thorough analysis of the values that parts of an expression could (reasonably) take: specifically, whether some could be an empty string or all whitespace, and what would happen if they were. Often in practice these cases can be disregarded or are unimportant. And of course the code author should know all about these cases. Arguably, good examples are *under-represented* in the stdlib, because '&' will more often be useful in throw-away diagnostic code (which is all about human-readable text) than in production code.
Lib\site-packages\numpy\distutils\system_info.py:2677:
    cmd = config_exe + ' ' + self.append_config_exe + ' ' + option
    cmd = config_exe & self.append_config_exe & option
Lib\site-packages\wx\lib\masked\maskededit.py:3592:
    newstr = value[:self._signpos] + ' ' + value[self._signpos+1:-1] + ' '
    newstr = value[:self._signpos] & value[self._signpos+1:-1] + ' '
Lib\site-packages\wx\lib\masked\maskededit.py:4056:
    text = text[:self._signpos] + ' ' + text[self._signpos+1:self._right_signpos] + ' ' + text[self._right_signpos+1:]
    text = text[:self._signpos] & text[self._signpos+1:self._right_signpos] & text[self._right_signpos+1:]
Lib\distutils\sysconfig.py:212:
    ldshared = ldshared + ' ' + os.environ['LDFLAGS']
    ldshared = ldshared & os.environ['LDFLAGS']
There are 6 more similar examples in the same module. The same 7 are in Lib\site-packages\setuptools\_distutils\sysconfig.py.
Lib\site-packages\numpy\lib\tests\test_io.py:328-330:
    assert_equal(c.readlines(), [asbytes((fmt + ' ' + fmt + '\n') % (1, 2)), asbytes((fmt + ' ' + fmt + '\n') % (3, 4))])
    assert_equal(c.readlines(), [asbytes((fmt & fmt + '\n') % (1, 2)), asbytes((fmt & fmt + '\n') % (3, 4))])
Lib\site-packages\numpy\f2py\crackfortran.py:1068:
    t = typespattern[0].match(m.group('before') + ' ' + name)
    t = typespattern[0].match(m.group('before') & name)
Lib\site-packages\twisted\mail\imap4.py:3606:
    raise IllegalServerResponse('(' + k + ' '+ status[k] + '): ' + str(e))
    raise IllegalServerResponse('(' + k & status[k] + '):' & str(e))
Lib\site-packages\twisted\runner\procmon.py:424-426:
    return ('<' + self.__class__.__name__ + ' ' + ' '.join(l) + '>')
    return ('<' + self.__class__.__name__ & ' '.join(l) + '>')
Lib\site-packages\wx\lib\pydocview.py:3028:
    label = '&' + str(i + 1) + ' ' + frame.GetTitle()
    label = '&' + str(i + 1) & frame.GetTitle()
Lib\test\test_pyexpat.py:90-91:
    self.out.append('Start element: ' + repr(name) + ' ' + sortdict(attrs))
    self.out.append('Start element:' & repr(name) & sortdict(attrs))
Lib\test\test_pyexpat.py:102:
    self.out.append('PI: ' + repr(target) + ' ' + repr(data))
    self.out.append('PI:' & repr(target) & repr(data))
Lib\test\test_pyexpat.py:105:
    self.out.append('NS decl: ' + repr(prefix) + ' ' + repr(uri))
    self.out.append('NS decl:' & repr(prefix) & repr(uri))
In the above 3 examples, it is not necessary to replace the first '+' by '&', but I think it reads better to use the same operator both times. (And of course it saves 1 (space) character. 😁) Similarly in some other examples.
Lib\site-packages\win32\Demos\security\sspi\fetch_url.py:57:
    h.putheader('Authorization', auth_scheme + ' ' + auth)
    h.putheader('Authorization', auth_scheme & auth)
Tools\scripts\which.py:51:
    sts = os.system('ls ' + longlist + ' ' + filename)
    sts = os.system('ls' & longlist & filename)
Less obviously:
Lib\site-packages\pycparser\c_generator.py:406-407:
    nstr = '* %s%s' % (' '.join(modifier.quals), ' ' + nstr if nstr else '')
    nstr = '*' & ' '.join(modifier.quals) & nstr
Examples where (I think) '&' could be applied to byte strings:
Lib\site-packages\twisted\conch\ssh\keys.py:1279:
    return (self.sshType() + b' ' + b64Data + b' ' + comment).strip()
    return (self.sshType() & b64Data & comment).strip()
Lib\site-packages\reportlab\pdfbase\pdfmetrics.py:391:
    text = text + b' ' + bytes(str(self.widths[i]),'utf8')
    text = text & bytes(str(self.widths[i]),'utf8')
Lib\site-packages\twisted\mail\pop3.py:326:
    yield intToBytes(i) + b' ' + intToBytes(size) + b'\r\n'
    yield intToBytes(i) & intToBytes(size) + b'\r\n'
Lib\site-packages\twisted\mail\pop3.py:367:
    yield intToBytes(i + 1) + b' ' + uid + b'\r\n'
    yield intToBytes(i + 1) & uid + b'\r\n'
Lib\site-packages\twisted\mail\pop3client.py:929:
    return self.sendShort(b'APOP', username + b' ' + digest)
    return self.sendShort(b'APOP', username & digest)
Lib\site-packages\twisted\mail\pop3client.py:1193-1194:
    return self._consumeOrAppend(b'TOP', idx + b' ' + intToBytes(lines), consumer, _dotUnquoter)
    return self._consumeOrAppend(b'TOP', idx & intToBytes(lines), consumer, _dotUnquoter)
Lib\site-packages\twisted\mail\smtp.py:1647:
    r.append(c + b' ' + b' '.join(v))
    r.append(c & b' '.join(v))
Lib\site-packages\twisted\mail\_cred.py:33:
    return self.user + b' ' + response.encode('ascii')
    return self.user & response.encode('ascii')
Lib\site-packages\twisted\web\test\test_proxy.py:233:
    lines = [b"HTTP/1.0 " + str(code).encode('ascii') + b' ' + message]
    lines = [b"HTTP/1.0" & str(code).encode('ascii') & message]
There are many examples (too many to list) where '&' could be used but would not add a great deal of value, and its use or non-use would be largely a matter of taste. (It would be nice if there were always "One Obvious Way To Do It", but in the real world different tools sometimes overlap in their areas of application.) Some would doubtless find it:
- Pointless
- Obscure
Of course (like any new feature) it *would* be more obscure, until we got used to it! 😁 Others might welcome (in appropriate use cases):
- The guarantee that the result string contains one and only one space between its textual parts, even when the first/second operand of '&' contains (unexpectedly?) trailing/leading whitespace.
- This guarantee making the code, in a way, *more* explicit.
- An ampersand being more visible than a leading/trailing space inside string quotes (as I am finding out the hard way, proof-reading this e-mail! 🙁).
- (or even) Saving a few characters. (Wash your mouth out with soap, Rob! 😬)
A few reasonably representative examples:
Lib\cgitb.py:106:
    pyver = 'Python ' + sys.version.split()[0] + ': ' + sys.executable
    pyver = 'Python' & sys.version.split()[0] + ':' & sys.executable
Lib\ftplib.py:289:
    cmd = 'PORT ' + ','.join(bytes)
    cmd = 'PORT' & ','.join(bytes)
Lib\site-packages\pythonwin\pywin\tools\browser.py:182:
    return str(self.name) + ' (Instance of class ' + str(self.myobject.__class__.__name__) + ')'
    return str(self.name) & '(Instance of class' & str(self.myobject.__class__.__name__) + ')'
Lib\site-packages\pythonwin\pywin\framework\scriptutils.py:564:
    win32ui.SetStatusText('Failed to ' + what + ' - ' + str(details) )
    win32ui.SetStatusText('Failed to' & what & '-' & str(details) )
Personally, I think that examples such as the last, where multiple components are all joined with '&', are ones that particularly gain from increased clarity and reduced clutter. YMMV.
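Rob also proposes that `&` apply to byte strings. A minimal sketch of what that could look like, assuming the same rule for `bytes` as for `str` and no mixing of the two; the helper name `amp` and the sample values are hypothetical. Note that `bytes.strip()` only removes ASCII whitespace.

```python
from typing import TypeVar

S = TypeVar("S", str, bytes)


def amp(s1: S, s2: S) -> S:
    """Hypothetical helper: the proposed & for two str or two bytes operands.

    bytes.strip() only removes ASCII whitespace; mixing str and bytes
    raises TypeError from the concatenation, as '+' already does.
    """
    space = b" " if isinstance(s1, bytes) else " "
    sep = space if (s1.strip() and s2.strip()) else space[:0]  # "" or b""
    return s1.rstrip() + sep + s2.lstrip()


# Shaped like the pop3.py line: '+' binds first, then '&' (made-up values).
print(amp(b"1", b"120" + b"\r\n"))   # b'1 120\r\n'
print(amp("PORT", "127,0,0,1,4,1"))  # PORT 127,0,0,1,4,1
```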
Finally: how I might rewrite in Python a sample fragment from my own code in a different language:
    VehDesc.SetLabel(Vehicle.Reg_No & Vehicle.Make & Vehicle.Type & Vehicle.Colour)
Best wishes
Rob Cliffe
data:image/s3,"s3://crabby-images/96bd6/96bd64e7a366594c5d26a85666f197e797c6ffaa" alt=""
I'm -1 on this. You can easily make a helper that achieves the desired syntax. Presenting "human readable data" isn't just about collapsing spaces, and having your own helper means that you can adjust the formatting to your specific use case if needed (for example with a different separator).
    from typing import Self
    class StripJoin:
        def __init__(self, value: str = "") -> None:
            self.value = value
        def __and__(self, other: str) -> Self:
            other = other.strip()
            separator = bool(self.value and other) * " "
            return StripJoin(f"{self.value}{separator}{other}")
        def __str__(self) -> str:
            return self.value
    j = StripJoin()
    print(j & " foo " & " bar " & " something ")
    # Output: "foo bar something"
The example above is more efficient than a possible implementation directly on the str builtin, as it doesn't strip the left side over and over. However, it still incurs repeated allocations and encourages a pattern that performs badly in loops. With a lot of input you should probably accumulate the stripped strings in a list and join them all at once. In any case I recommend reaching for a library like Rich (https://github.com/Textualize/rich) if you care about formatting the output of your program nicely.
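Valentin's closing advice (accumulate the stripped pieces and join them once) can be sketched as follows; `strip_join` is a hypothetical name. Like `StripJoin`, it strips both ends of every piece and skips pieces that are empty after stripping, so it differs from Rob's definition at the outer edges of the result.

```python
from collections.abc import Iterable


def strip_join(parts: Iterable[str], sep: str = " ") -> str:
    """Hypothetical helper: strip each part, drop empty results, join once.

    A single str.join call avoids the chain of intermediate strings that
    repeated StripJoin & ... operations create.
    """
    return sep.join(p for p in (s.strip() for s in parts) if p)


print(strip_join([" foo ", " bar ", " something "]))  # foo bar something
print(strip_join(["a", "   ", "b"], sep=" | "))        # a | b
```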
data:image/s3,"s3://crabby-images/21dda/21dda586b6b15305a5f5404123c2ec1fe76ef4a1" alt=""
I agree. This behavior is too specialized to be implemented as an operator. + is IME so infrequently used on Python strings that we should be cautious when proposing new binary string operators. Two strings to concat, maybe; three, possibly; four or more, and I reach for join() or f-strings or anything else. And if we are just joining two or three strings, that is readable right now as str1.rstrip() + ' ' + str2.lstrip(). A solution that is as extensible as join, with no need for operators or subclasses:
    def connect(iterable, /):
        it = iter(iterable)
        first = next(it, '')
        last = next(it, None)
        if last is None:
            return first
        else:
            parts = [first.rstrip()]
            for this in it:
                parts.append(last.strip())
                last = this
            parts.append(last.lstrip())
            return ' '.join(parts)
But I would not advocate for inclusion in the standard lib, as there are too many parameters the user may want to behave differently for them. Keep the trailing spaces? Leading space? Add spaces when connecting an empty string? A string that is only whitespace? Operate only on exactly str, or on the __str__ value of any object?
Regards,
Jeremiah
On Tue, Mar 7, 2023 at 1:57 AM Valentin Berlier <berlier.v@gmail.com> wrote:
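One of Jeremiah's open questions ("Add spaces when connecting an empty string?") shows up as soon as his `connect` is compared with Rob's pairwise rule. The sketch repeats `connect` unchanged and sets it beside a fold of a hypothetical `amp` helper that assumes Rob's definition. The two agree on ordinary input but differ on an empty middle string.

```python
from functools import reduce


def connect(iterable, /):
    # Jeremiah's function, unchanged.
    it = iter(iterable)
    first = next(it, '')
    last = next(it, None)
    if last is None:
        return first
    else:
        parts = [first.rstrip()]
        for this in it:
            parts.append(last.strip())
            last = this
        parts.append(last.lstrip())
        return ' '.join(parts)


def amp(s1: str, s2: str) -> str:
    # Hypothetical stand-in for Rob's pairwise definition of s1 & s2.
    sep = " " if (s1.strip() and s2.strip()) else ""
    return s1.rstrip() + sep + s2.lstrip()


words = ["a", "", "b"]
print(repr(connect(words)))      # 'a  b'  (the empty string still gets a space)
print(repr(reduce(amp, words)))  # 'a b'
```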
data:image/s3,"s3://crabby-images/05fbd/05fbdfa1da293282beb61078913d943dc0e5ca1b" alt=""
It's not that tricky to have the lstrip/rstrip behaviour with a join:
    def space_join(*args):
        first, *middle, last = args
        return ' '.join((first.rstrip(), *(s.strip() for s in middle), last.lstrip()))
What is harder is to be sure that this would be the expected behaviour when using a `&` operator on strings. Why would `' a' & 'b'` produce `'a b'` and `' ' & 'b'` produce `' b'`, for example?
On Sat, 11 Mar 2023 at 10:04, Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
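Antoine's one-liner unpacks `first, *middle, last = args`, so it raises ValueError for fewer than two arguments. The sketch below adds guards for zero and one argument (returning a single argument unchanged is an assumption, not something from the thread) and compares it with a hypothetical `amp` helper following Rob's definition. It shows his corner case concretely: the two approaches disagree on `' ' & 'b'`.

```python
def space_join(*args: str) -> str:
    # Antoine's join, plus guards for 0 or 1 arguments (an assumption:
    # the original unpacks first/last and raises ValueError in those cases).
    if not args:
        return ""
    if len(args) == 1:
        return args[0]
    first, *middle, last = args
    return " ".join((first.rstrip(), *(s.strip() for s in middle), last.lstrip()))


def amp(s1: str, s2: str) -> str:
    # Hypothetical stand-in for Rob's definition of s1 & s2.
    sep = " " if (s1.strip() and s2.strip()) else ""
    return s1.rstrip() + sep + s2.lstrip()


print(repr(space_join(" a ", " b ", " c ")))  # ' a b c '
print(repr(space_join(" ", "b")))             # ' b'
print(repr(amp(" ", "b")))                    # 'b'
```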
-- Antoine Rozo
data:image/s3,"s3://crabby-images/552f9/552f93297bac074f42414baecc3ef3063050ba29" alt=""
Having an operand blank or all whitespace is not a common use case for join, nor would it be for my proposed '&' operator. I wanted to present a clear definition, and however I chose it, those corner cases would be non-obvious. That said, I have now withdrawn this proposal due to objections by Bruce Leban and others. Best wishes Rob Cliffe On 11/03/2023 17:13, Antoine Rozo wrote:
data:image/s3,"s3://crabby-images/4d484/4d484377daa18e9172106d4beee4707c95dab2b3" alt=""
On Sun, Mar 5, 2023 at 7:39 PM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote:
Tl;dr: Join strings together with exactly one space between non-blank
This idea is inappropriate for inclusion in the language. There are too many subtle details in how this should work, as noted by others. To give an example which I hope clearly illustrates that this is not simple, what should the value of 'ｆｕｌｌ' & 'ｗｉｄｔｈ' be?
- 'ｆｕｌｌ ｗｉｄｔｈ'
- 'ｆｕｌｌ　ｗｉｄｔｈ'
In case it's not clear, those are full-width Latin characters, which should be joined by a full-width (ideographic) space U+3000. What should be inserted when the characters are Kanji (which doesn't usually use spaces) or Amharic (which uses '፡' U+1361), or a mix of different character sets?
--- Bruce
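Bruce's full-width example can be reproduced with a hypothetical `amp` helper that follows Rob's definition. Python's `str.strip()` already treats the ideographic space U+3000 as whitespace, but the inserted separator is always an ASCII space, so the result is the first of his two candidates rather than the one he argues is correct.

```python
import unicodedata


def amp(s1: str, s2: str) -> str:
    # Hypothetical stand-in for the proposed s1 & s2: always inserts U+0020.
    sep = " " if (s1.strip() and s2.strip()) else ""
    return s1.rstrip() + sep + s2.lstrip()


full = "\uff46\uff55\uff4c\uff4c"         # full-width "full"
width = "\uff57\uff49\uff44\uff54\uff48"  # full-width "width"

joined = amp(full, width)
print(joined == full + " " + width)           # True: ASCII space inserted
print(joined == full + "\u3000" + width)      # False: not the ideographic space
print("\u3000".isspace())                     # True: strip() removes U+3000
print(amp(full + "\u3000", width) == joined)  # True
print(unicodedata.normalize("NFKC", full))    # full
```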
data:image/s3,"s3://crabby-images/14ef3/14ef3ec4652919acc6d3c6a3c07f490f64f398ea" alt=""
Is it really that much longer to write `f"{s1} {s2}"` when you want that? Maybe a couple characters more total, but once you are in an f-string, you can also do a zillion other things at the same time. On Sun, Mar 5, 2023 at 10:42 PM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote:
-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
data:image/s3,"s3://crabby-images/3d630/3d63057c3cfab4559cab902d5f910729055bb455" alt=""
On Mon, Mar 6, 2023 at 12:51 AM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
Is it really that much longer to write `f"{s1} {s2}"` when you want that?
As for being that much longer: yes it is. The more important factor is, I think, the increase in complexity + readabiity for default strings is worth it in this case. One nice thing to think about might be how to make string subclassing to be more useful - so this kind of thing could be done for whoever needs it in one more idiomatic project. The drawback is how cumbersome it is to instantiate a string subclass compared to a string literal. (I just got an idea, but it would be too offtopic here - if I think it is worth, I will post it in a new thread later)
data:image/s3,"s3://crabby-images/ab456/ab456d7b185e9d28a958835d5e138015926e5808" alt=""
On 02.03.2023 18:27, Rob Cliffe via Python-ideas wrote:
I don't find these semantics particularly intuitive. Python already has the + operator for concatenating strings and this doesn't apply any stripping. If you're emphasizing on joining words with single space delimiters, then the usual: def join_words(list_of_words) return ' '.join([x.strip() for x in list_of_words]) works much better. You can also apply this recursively, if needed, or add support for list_of_phrases (first splitting these into a list_of_words). The advantage of join_words() is that it's easy to understand and applies stripping in a concise way. I use such helpers all the time. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 06 2023)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Mar 06, 2023 at 10:33:26AM +0100, Marc-Andre Lemburg wrote:
def join_words(list_of_words) return ' '.join([x.strip() for x in list_of_words])
That's not Rob's suggestion either. Rob's suggestion is an operator which concats two substrings with exactly one space between them, without stripping leading or trailing whitespace of the result. Examples: a = "\nHeading:" b = "Result\n\n" a & b would give "\nHeading: Result\n\n" s = " my hovercraft\n" t = " is full of eels\n" s & t would give " my hovercraft is full of eels\n" I find the concept is very easy to understand: "concat with exactly one space between the operands". But I must admit I'm struggling to think of cases where I would use it. I like the look of the & operator for concatenation, so I want to like this proposal. But I think I will need to see real world code to understand when it would be useful. -- Steve
data:image/s3,"s3://crabby-images/ab456/ab456d7b185e9d28a958835d5e138015926e5808" alt=""
On 06.03.2023 11:33, Steven D'Aprano wrote:
I know, but as I mentioned, I use the above often, whereas I find Rob's definition not very intuitive or useful.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 06 2023)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
data:image/s3,"s3://crabby-images/d1d84/d1d8423b45941c63ba15e105c19af0a5e4c41fda" alt=""
Steven D'Aprano writes:
I have to second that motion. Pretty much any time I'm constructing lines containing variable text, each string value arrives stripped and I far more often want padding of variable width values rather than space compression. I admit that I use M-SPC (aka just-one-space) lot in Emacsen, but I can't recall wanting it in a program in any language.
data:image/s3,"s3://crabby-images/d1d84/d1d8423b45941c63ba15e105c19af0a5e4c41fda" alt=""
Rob Cliffe writes:
Perhaps where you're not laying out a table,
I'm an economist, laying out tables is what I do. :-) Getting serious:
str '+' has long been quite rare in my coding. str concatenation is almost never in an inner loop, or slighly more complex formatting is the point. f-strings and .format save you the type conversion to str. So I don't find that occasional saving at all interesting. A vanishingly small number of my str constructions involve only strs with trivial formatting. What interests me about the proposal is the space-collapsing part, which a naive f-string would do incorrectly if, say, s2 == '' or s3 == '\t\t\ttabs to the left of me'. But where does this space-surrounded str data come from? Steve
data:image/s3,"s3://crabby-images/552f9/552f93297bac074f42414baecc3ef3063050ba29" alt=""
On 07/03/2023 07:26, Stephen J. Turnbull wrote: purity, or a hideous symmetry breaking? I didn't highlight it before, so let me take the opportunity now: One of the examples I listed shows another use case: constructing an operating system command and ensuring that the parameters are separated by (at least) one space. Bear with me while I repeat it: Lib\site-packages\numpy\distutils\system_info.py:2677: cmd = config_exe + ' ' + self.append_config_exe + ' ' + option cmd = config_exe & self.append_config_exe & option Best wishes Rob Cliffe
Steve
data:image/s3,"s3://crabby-images/552f9/552f93297bac074f42414baecc3ef3063050ba29" alt=""
Bikeshedding: 1) I forgot to mention whether the '&' operator should apply to byte strings as well as to strings. I propose that it should (some supporting examples below). 2) Joao suggested spelling the operator '|'. But for me: '|' suggests "or" while '&' suggests "and" and is a more intuitive choice for some kind of concatenation. On 06/03/2023 10:33, Steven D'Aprano wrote:
Quite right, Steven. (By the way, thanks for reading my suggestion carefully, as you obviously did.) Below I list some examples from the 3.8.3 stdlib and how (I think) they could be rewritten. (Disclosure: I have selected the examples I found which I think are most convincing. Then again, I may have missed some.) I say "I think" because to be 100% sure (rather than 90%+ sure) in all cases would sometimes need a thorough analysis of the values that parts of an expression could (reasonably) take: specifically, whether some could be an empty string or all whitespace and what would happen if they were. Often in practice these cases can be disregarded or are unimportant. And of course the code author should know all about these cases. Arguably, good examples are *under-represented* in the stdlib, because '&' will more often be useful in throw-away diagnostic code (which is all about human-readable text) than in production code. Lib\site-packages\numpy\distutils\system_info.py:2677: cmd = config_exe + ' ' + self.append_config_exe + ' ' + option cmd = config_exe & self.append_config_exe & option Lib\site-packages\wx\lib\masked\maskededit.py:3592: newstr = value[:self._signpos] + ' ' + value[self._signpos+1:-1] + ' ' newstr = value[:self._signpos] & value[self._signpos+1:-1] + ' ' Lib\site-packages\wx\lib\masked\maskededit.py:4056: text = text[:self._signpos] + ' ' + text[self._signpos+1:self._right_signpos] + ' ' + text[self._right_signpos+1:] text = text[:self._signpos] & text[self._signpos+1:self._right_signpos] & text[self._right_signpos+1:] Lib\distutils\sysconfig.py:212: ldshared = ldshared + ' ' + os.environ['LDFLAGS'] ldshared = ldshared & os.environ['LDFLAGS'] There are 6 more similar examples in the same module. The same 7 are in Lib\site-packages\setuptools\_distutils\sysconfig.py. 
Lib\site-packages\numpy\lib\tests\test_io.py:328-330: assert_equal(c.readlines(), [asbytes((fmt + ' ' + fmt + '\n') % (1, 2)), asbytes((fmt + ' ' + fmt + '\n') % (3, 4))]) assert_equal(c.readlines(), [asbytes((fmt & fmt + '\n') % (1, 2)), 2)), asbytes((fmt & fmt + '\n') % (3, 4))]) Lib\site-packages\numpy\f2py\crackfortran.py:1068: t = typespattern[0].match(m.group('before') + ' ' + name) t = typespattern[0].match(m.group('before') & name) Lib\site-packages\twisted\mail\imap4.py:3606: raise IllegalServerResponse('(' + k + ' '+ status[k] + '): ' + str(e)) raise IllegalServerResponse('(' + k & status[k] + '):' & str(e)) Lib\site-packages\twisted\runner\procmon.py:424-426: return ('<' + self.__class__.__name__ + ' ' + ' '.join(l) + '>') return ('<' + self.__class__.__name__ & ' '.join(l) + '>') Lib\site-packages\wx\lib\pydocview.py:3028: label = '&' + str(i + 1) + ' ' + frame.GetTitle() label = '&' + str(i + 1) & frame.GetTitle() Lib\test\test_pyexpat.py:90-91: self.out.append('Start element: ' + repr(name) + ' ' + sortdict(attrs)) self.out.append('Start element:' & repr(name) & sortdict(attrs)) Lib\test\test_pyexpat.py:102: self.out.append('PI: ' + repr(target) + ' ' + repr(data)) self.out.append('PI:' & repr(target) & repr(data)) Lib\test\test_pyexpat.py:105: self.out.append('NS decl: ' + repr(prefix) + ' ' + repr(uri)) self.out.append('NS decl:' & repr(prefix) & repr(uri)) In the above 3 examples, it is not necessary to replace the first '+' by '&', but I think it reads better to use the same operator both times. (And of course it saves 1 (space) character. 😁) Similarly in some other examples. 
Lib\site-packages\win32\Demos\security\sspi\fetch_url.py:57:
    h.putheader('Authorization', auth_scheme + ' ' + auth)
    h.putheader('Authorization', auth_scheme & auth)
Tools\scripts\which.py:51:
    sts = os.system('ls ' + longlist + ' ' + filename)
    sts = os.system('ls' & longlist & filename)

Less obviously:
Lib\site-packages\pycparser\c_generator.py:406-407:
    nstr = '* %s%s' % (' '.join(modifier.quals), ' ' + nstr if nstr else '')
    nstr = '*' & ' '.join(modifier.quals) & nstr

Examples where (I think) '&' could be applied to byte strings:
Lib\site-packages\twisted\conch\ssh\keys.py:1279:
    return (self.sshType() + b' ' + b64Data + b' ' + comment).strip()
    return (self.sshType() & b64Data & comment).strip()
Lib\site-packages\reportlab\pdfbase\pdfmetrics.py:391:
    text = text + b' ' + bytes(str(self.widths[i]),'utf8')
    text = text & bytes(str(self.widths[i]),'utf8')
Lib\site-packages\twisted\mail\pop3.py:326:
    yield intToBytes(i) + b' ' + intToBytes(size) + b'\r\n'
    yield intToBytes(i) & intToBytes(size) + b'\r\n'
Lib\site-packages\twisted\mail\pop3.py:367:
    yield intToBytes(i + 1) + b' ' + uid + b'\r\n'
    yield intToBytes(i + 1) & uid + b'\r\n'
Lib\site-packages\twisted\mail\pop3client.py:929:
    return self.sendShort(b'APOP', username + b' ' + digest)
    return self.sendShort(b'APOP', username & digest)
Lib\site-packages\twisted\mail\pop3client.py:1193-1194:
    return self._consumeOrAppend(b'TOP', idx + b' ' + intToBytes(lines), consumer, _dotUnquoter)
    return self._consumeOrAppend(b'TOP', idx & intToBytes(lines), consumer, _dotUnquoter)
Lib\site-packages\twisted\mail\smtp.py:1647:
    r.append(c + b' ' + b' '.join(v))
    r.append(c & b' '.join(v))
Lib\site-packages\twisted\mail\_cred.py:33:
    return self.user + b' ' + response.encode('ascii')
    return self.user & response.encode('ascii')
Lib\site-packages\twisted\web\test\test_proxy.py:233:
    lines = [b"HTTP/1.0 " + str(code).encode('ascii') + b' ' + message]
    lines = [b"HTTP/1.0" & str(code).encode('ascii') & message]

There are many examples (too many to list) where '&' could be used but would not add a great deal of value, and its use or non-use would be largely a matter of taste. (It would be nice if there were always "One Obvious Way To Do It", but in the real world different tools sometimes overlap in their areas of application.)
Some would doubtless find it:
- Pointless
- Obscure
Of course (like any new feature) it *would* be more obscure - until we got used to it! 😁
Others might welcome (in appropriate use cases):
- The guarantee that the result string contains one and only one space between its textual parts, even when the first/second operand of '&' contains (unexpectedly?) trailing/leading whitespace.
- This guarantee making the code, in a way, *more* explicit.
- An ampersand being more visible than a leading/trailing space inside string quotes (as I am finding out the hard way, proof-reading this e-mail!🙁).
- (or even) Saving a few characters. (Wash your mouth out with soap, Rob! 😬)

A few reasonably representative examples:
Lib\cgitb.py:106:
    pyver = 'Python ' + sys.version.split()[0] + ': ' + sys.executable
    pyver = 'Python' & sys.version.split()[0] + ':' & sys.executable
Lib\ftplib.py:289:
    cmd = 'PORT ' + ','.join(bytes)
    cmd = 'PORT' & ','.join(bytes)
Lib\site-packages\pythonwin\pywin\tools\browser.py:182:
    return str(self.name) + ' (Instance of class ' + str(self.myobject.__class__.__name__) + ')'
    return str(self.name) & '(Instance of class' & str(self.myobject.__class__.__name__) + ')'
Lib\site-packages\pythonwin\pywin\framework\scriptutils.py:564:
    win32ui.SetStatusText('Failed to ' + what + ' - ' + str(details) )
    win32ui.SetStatusText('Failed to' & what & '-' & str(details) )

Personally, I think that examples such as the last, where multiple components are all joined with '&', are ones that particularly gain from increased clarity and reduced clutter. YMMV.
Finally: how I might rewrite in Python a sample fragment from my own code in a different language:
    VehDesc.SetLabel(Vehicle.Reg_No & Vehicle.Make & Vehicle.Type & Vehicle.Colour)

Best wishes
Rob Cliffe
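To make the proposed semantics concrete, here is a minimal sketch of how `&` could be emulated today, assuming the definition from the original proposal: `s1.rstrip() + (' ' if (s1.strip() and s2.strip()) else '') + s2.lstrip()`. `AmpStr` is a hypothetical `str` subclass invented for illustration, not anything in Python. Because plain `str` defines no `__and__`, at least one operand of each `&` must be an `AmpStr`. The usage lines rewrite two of the examples above; the version string and path are made-up sample values. The last one relies on `+` binding more tightly than `&`.

```python
class AmpStr(str):
    """Hypothetical str subclass emulating the proposed `&` (illustration only)."""

    def __and__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        s1, s2 = str(self), str(other)
        # Insert one space only if both operands contain non-blank text.
        sep = " " if (s1.strip() and s2.strip()) else ""
        return AmpStr(s1.rstrip() + sep + s2.lstrip())

    def __rand__(self, other):
        # Reflected form, used for `plain_str & AmpStr(...)`.
        if not isinstance(other, str):
            return NotImplemented
        return AmpStr(other) & self


# Example from the first message: outer whitespace kept, inner collapsed.
print(repr(AmpStr(" bar ") & " foo "))  # ' bar foo '

# scriptutils.py-style rewrite: every component joined with `&`.
what, details = "open", "file not found"
print(AmpStr("Failed to") & what & "-" & details)  # Failed to open - file not found

# cgitb.py-style rewrite: `+` binds tighter, so this parses as
# (AmpStr("Python") & ("3.11.2" + ":")) & "/usr/bin/python3".
print(AmpStr("Python") & "3.11.2" + ":" & "/usr/bin/python3")  # Python 3.11.2: /usr/bin/python3
```

A built-in version would presumably live on `str` (and, per the examples above, `bytes`) directly; the subclass is just a cheap way to experiment with the semantics, including the associativity claimed in the proposal.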
data:image/s3,"s3://crabby-images/96bd6/96bd64e7a366594c5d26a85666f197e797c6ffaa" alt=""
I'm -1 on this. You can easily make a helper that achieves the desired syntax. Presenting "human readable data" isn't just about collapsing spaces, and having your own helper means that you can adjust the formatting to your specific use case if needed (for example with a different separator).

    from typing import Self  # Python 3.11+

    class StripJoin:
        def __init__(self, value: str = "") -> None:
            self.value = value

        def __and__(self, other: str) -> Self:
            other = other.strip()
            # "" when either side is empty, otherwise a single space.
            separator = bool(self.value and other) * " "
            return type(self)(f"{self.value}{separator}{other}")

        def __str__(self) -> str:
            return self.value

    j = StripJoin()
    print(j & " foo " & " bar " & " something ")
    # Output: foo bar something

The example above is more efficient than a possible implementation directly on the str builtin, as it doesn't strip the left side over and over. However, it still incurs repeated allocations and encourages a pattern that performs badly in loops. With a lot of input you should probably accumulate the stripped strings in a list and join them all at once. In any case, I recommend reaching for a library like Rich (https://github.com/Textualize/rich) if you care about formatting the output of your program nicely.
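The list-then-join approach recommended above might look like the following minimal sketch. `strip_join` is a hypothetical helper name, and it assumes the same rules as `StripJoin`: strip every piece, skip pieces that end up blank, and use a configurable separator. In CPython, `str.join` computes the total length up front and builds the result once, so this avoids the chain of intermediate strings that repeated `&` creates.

```python
from collections.abc import Iterable


def strip_join(pieces: Iterable[str], sep: str = " ") -> str:
    """Strip each piece, drop the blank ones, and join the rest with `sep`."""
    stripped = (piece.strip() for piece in pieces)
    return sep.join(piece for piece in stripped if piece)


print(strip_join([" foo ", " bar ", " something "]))  # foo bar something
print(strip_join(["  alpha", "", "beta  ", "   "], sep=", "))  # alpha, beta
```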
data:image/s3,"s3://crabby-images/21dda/21dda586b6b15305a5f5404123c2ec1fe76ef4a1" alt=""
I agree. This behavior is too specialized to be implemented as an operator. + is IME so infrequently used on Python strings that we should take caution when proposing new binary string operators. Two strings to concat, maybe; three, possibly; four or more, and I reach for join() or f-strings or anything else. And if we are just joining two or three strings, that is readable right now as str1.rstrip() + ' ' + str2.lstrip().

A solution that is as extensible as join, with no need for operators or subclasses:

    def connect(iterable, /):
        it = iter(iterable)
        first = next(it, '')
        last = next(it, None)
        if last is None:
            return first
        else:
            parts = [first.rstrip()]
            for this in it:
                parts.append(last.strip())
                last = this
            parts.append(last.lstrip())
            return ' '.join(parts)

But I would not advocate for inclusion in the standard lib, as there are too many parameters the user may want to behave differently for them. Keep the trailing spaces? Leading space? Add spaces when connecting an empty string? A string that is only whitespace? Operate only on exactly str or the __str__ value of any object?

Regards,
Jeremiah

On Tue, Mar 7, 2023 at 1:57 AM Valentin Berlier <berlier.v@gmail.com> wrote:
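To illustrate the open questions listed above, here is a small comparison, assuming Jeremiah's `connect` exactly as posted and a hypothetical `amp` helper implementing the definition from the original proposal, folded left to right with `functools.reduce` to mimic `s1 & s2 & s3`. The two agree on ordinary input but diverge on an empty string or a whitespace-only middle piece, which is exactly the kind of choice a user might want to parameterize.

```python
from functools import reduce


def connect(iterable, /):
    # Jeremiah Paige's helper, as posted.
    it = iter(iterable)
    first = next(it, '')
    last = next(it, None)
    if last is None:
        return first
    else:
        parts = [first.rstrip()]
        for this in it:
            parts.append(last.strip())
            last = this
        parts.append(last.lstrip())
        return ' '.join(parts)


def amp(s1, s2):
    # Hypothetical helper: the proposed `s1 & s2` from the original post.
    sep = ' ' if (s1.strip() and s2.strip()) else ''
    return s1.rstrip() + sep + s2.lstrip()


def amp_chain(strings):
    # Left fold, i.e. s1 & s2 & s3 & ...; an empty input gives ''.
    strings = list(strings)
    return reduce(amp, strings) if strings else ''


for pieces in ([" a ", " b ", " c "], ["a", "  ", "b"], ["a", ""]):
    print(repr(connect(pieces)), repr(amp_chain(pieces)))
# Prints:
# ' a b c ' ' a b c '
# 'a  b' 'a b'
# 'a ' 'a'
```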
data:image/s3,"s3://crabby-images/05fbd/05fbdfa1da293282beb61078913d943dc0e5ca1b" alt=""
It's not that tricky to have the lstrip/rstrip behaviour with a join:

    def space_join(*args):
        # Needs at least two arguments; otherwise the unpacking raises ValueError.
        first, *middle, last = args
        return ' '.join((first.rstrip(), *(s.strip() for s in middle), last.lstrip()))

What is harder is to be sure that this would be the expected behaviour when using a `&` operator on strings. Why would `' a' & 'b'` produce `'a b'` and `' ' & 'b'` produce `' b'`, for example?

On Sat, Mar 11, 2023 at 10:04, Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
-- Antoine Rozo
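The corner case Antoine raises depends on which of the two definitions in the original proposal is chosen. The sketch below, with hypothetical helper names, implements both (the main definition, and the simplified variant from bikeshedding point (2)) next to `space_join` as posted. They agree whenever both operands contain non-blank text and differ when one operand is blank or whitespace-only; `space_join` behaves like the simplified variant.

```python
def amp_proposed(s1, s2):
    # Main definition: add a space only if both sides have non-blank text.
    sep = ' ' if (s1.strip() and s2.strip()) else ''
    return s1.rstrip() + sep + s2.lstrip()


def amp_simplified(s1, s2):
    # Bikeshedding variant (2): always add exactly one space.
    return s1.rstrip() + ' ' + s2.lstrip()


def space_join(*args):
    # Antoine Rozo's helper, as posted (needs at least two arguments).
    first, *middle, last = args
    return ' '.join((first.rstrip(), *(s.strip() for s in middle), last.lstrip()))


for s1, s2 in [(' a ', ' b '), (' ', 'b'), ('a', '')]:
    print(repr(amp_proposed(s1, s2)), repr(amp_simplified(s1, s2)), repr(space_join(s1, s2)))
# Prints:
# ' a b ' ' a b ' ' a b '
# 'b' ' b' ' b'
# 'a' 'a ' 'a '
```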
data:image/s3,"s3://crabby-images/552f9/552f93297bac074f42414baecc3ef3063050ba29" alt=""
Having an operand blank or all whitespace is not a common use case for join, nor would it be for my proposed '&' operator. I wanted to present a clear definition, and however I chose it, those corner cases would be non-obvious. That said, I have now withdrawn this proposal due to objections by Bruce Leban and others.

Best wishes
Rob Cliffe

On 11/03/2023 17:13, Antoine Rozo wrote:
data:image/s3,"s3://crabby-images/4d484/4d484377daa18e9172106d4beee4707c95dab2b3" alt=""
On Sun, Mar 5, 2023 at 7:39 PM Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
Tl;dr: Join strings together with exactly one space between non-blank
This idea is inappropriate for inclusion in the language. There are too many subtle details in how this should work, as noted by others. To give an example which I hope clearly illustrates that this is not simple, what should the value of 'ｆｕｌｌ' & 'ｗｉｄｔｈ' be?
- 'ｆｕｌｌ ｗｉｄｔｈ'
- 'ｆｕｌｌ　ｗｉｄｔｈ'
In case it's not clear, those are full width Latin characters, which should be joined by a full width (ideographic) space U+3000. What should be inserted when the characters are Kanji (which doesn't usually use spaces) or Amharic (which uses '፡' U+1361) or a mix of different character sets?
---
Bruce
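A small check of the first half of Bruce's point, assuming the definition from the original proposal (the helper name `amp` is hypothetical): Python's `str.strip()` does treat the ideographic space U+3000 as whitespace, but the proposed operator always inserts an ASCII space, so even an ideographic space that is already present gets replaced. The sketch does not attempt the Kanji or Amharic cases.

```python
import unicodedata

FULL = "\uff46\uff55\uff4c\uff4c"         # fullwidth "full"
WIDTH = "\uff57\uff49\uff44\uff54\uff48"  # fullwidth "width"
IDEOGRAPHIC_SPACE = "\u3000"


def amp(s1, s2):
    # Proposed `s1 & s2` from the original post.
    sep = " " if (s1.strip() and s2.strip()) else ""
    return s1.rstrip() + sep + s2.lstrip()


print(unicodedata.name(FULL[0]))            # FULLWIDTH LATIN SMALL LETTER F
print(unicodedata.name(IDEOGRAPHIC_SPACE))  # IDEOGRAPHIC SPACE
print(IDEOGRAPHIC_SPACE.isspace())          # True

# Both results are FULL + ASCII space + WIDTH; U+3000 never survives.
print(amp(FULL, WIDTH) == FULL + " " + WIDTH)                      # True
print(amp(FULL + IDEOGRAPHIC_SPACE, WIDTH) == FULL + " " + WIDTH)  # True
```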
participants (10)
- Antoine Rozo
- Bruce Leban
- David Mertz, Ph.D.
- Jeremiah Paige
- Joao S. O. Bueno
- Marc-Andre Lemburg
- Rob Cliffe
- Stephen J. Turnbull
- Steven D'Aprano
- Valentin Berlier