On Tue, Oct 22, 2019 at 08:53:53PM -0400, Todd wrote: [I wrote this]
I would expect %w{ ... } to return a set, not a list:
    %w[ ... ]  # list
    %w{ ... }  # set
    %w( ... )  # tuple
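For comparison, here is a sketch of how the same results are spelled in current Python (the %w forms themselves are hypothetical syntax, not valid today):

```python
# Proposed (hypothetical) forms and their nearest current equivalents:
words_list = 'a b c'.split()          # %w[a b c]  -> ['a', 'b', 'c']
words_set = set('a b c'.split())      # %w{a b c}  -> {'a', 'b', 'c'}
words_tuple = tuple('a b c'.split())  # %w(a b c)  -> ('a', 'b', 'c')
```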
[Todd replied]
This is growing into an entire new group of constructors for a very, very limited number of operations that have been privileged for some reason.
Sure. That's what syntactic sugar is: privileging one particular thing over another. That's why, for example, we privilege the idiom:

    import spam
    eggs = spam.eggs

by giving it special syntax, but not:

    class Spam: ...
    spam = Spam(args)
    del Spam

Some things are privileged. We privilege for-loops as comprehensions, but not while-loops; we privilege getting a bunch of indexes in a sequence as a slice ``sequence[start:end]``, but not getting a bunch of items from a dict. Not everything can be syntactic sugar; but that doesn't mean nothing should be syntactic sugar.
Should %{a=b c=d} create dicts, too? Why not?
Probably not, because we can already say ``dict(spam=x)`` to get the key "spam". That's specifically one of the motivating examples for why dict takes keyword arguments. In the early days of Python, it didn't.
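A quick illustration of that point (plain Python, nothing assumed beyond the builtin dict):

```python
x = 42
# The keyword-argument form already gives us unquoted keys:
d1 = dict(spam=x)     # key is the string "spam", no quoting needed
# The display form needs the quotes:
d2 = {'spam': x}
assert d1 == d2
```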
Why should strings be privileged over, say, numbers?
Because we don't write ints or floats or complex numbers with delimiters. We say 5, not "5".
Why should %w[1 2 3] make ['1', '2', '3'] instead of [1, 2, 3]?
Because the annoyance factor of having to quote each word is far greater than the annoyance factor of having to put commas between values.
And why whitespace instead of a comma?
Because separating words with whitespace is convenient when you have a lot of data. The spacebar, Tab and Enter keys are nice, big targets which are easy to hit; the comma isn't. Splitting on whitespace means that spaces and newlines Just Work:

    data = %w[alpha beta gamma
              ...
              psi chi omega]
    # gives ['alpha', 'beta', 'gamma', ... 'psi', 'chi', 'omega']

whereas splitting on commas alone gives you a nasty surprise:

    data = %w[alpha, beta, gamma,
              ...,
              psi, chi, omega]
    # ['alpha', ' beta', ' gamma', ..., '\n    psi', ' chi', ' omega']

To avoid that, you need to complicate the rule to something like "commas or whitespace", or "commas optionally followed by whitespace", or something even more complicated. The more complicated the rule, the more surprising it will be when you get caught out by some odd corner case of the rule you weren't expecting. Splitting on whitespace is a nice, simple rule that cannot go wrong. Why make it more complicated than it needs to be?
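The difference can be demonstrated with str.split today (a minimal sketch; the %w syntax itself is hypothetical):

```python
# Splitting on whitespace: spaces and newlines Just Work.
clean = """alpha beta gamma
psi chi omega""".split()
# -> ['alpha', 'beta', 'gamma', 'psi', 'chi', 'omega']

# Splitting on commas alone drags the stray whitespace along.
messy = """alpha, beta, gamma,
psi, chi, omega""".split(",")
# -> ['alpha', ' beta', ' gamma', '\npsi', ' chi', ' omega']
```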
We have general ways to handle all of this stuff that doesn't lock us into a single special case.
Who is talking about locking us into a special case? "string literal".split() will still work, so will ["string", "literal"].
and I would describe them as list/set/tuple "word literals". Unlike list etc. displays ``[spam, eggs, cheese]``, these would actually be true literals that can be determined entirely at compile-time.
I don't know enough about the internals to say whether this would be possible or not.
It would be a pretty awful compiler that couldn't take a space-separated sequence of characters and compile them as strings. I'm not wedded to the leading and trailing delimiters %w[ and ] if they turn out to be ambiguous with something else (the % operator?), but I don't think they will be. [...]
Yes, I understand that Python has syntactic sugar. But any new syntactic sugar necessarily has an uphill battle due to people having to learn it, books and classes having to be updated, linters updated, new pep8 guidelines written, etc. We already have a way to split strings. So the question is why we need this in addition to what we already have,
Because it smooths out a minor annoyance and makes for a more pleasant programming experience for the coder, without having to worry (rightly or wrongly) about performance. The status quo is that every time I need to write a list or set of words, I have to stop and think: "Should I quote them all by hand, like a caveman, or get the interpreter to split it? If I get the interpreter to split it, will it hurt performance?" but with this proposed syntax, there is One Obvious Way to write a list of words. I won't have to think about it, or worry that I should be worrying about performance.
especially considering it is so radically different than anything else in Python.
Your idea of "radically different" is a lot less radical than mine. To me, radically different would mean something like HyperTalk syntax:

    put the value of the third line of text into word seven of result

or Reverse Polish Notation syntax. Not adding a prefix to list delimiters. We already have string prefixes, we already have list delimiters; putting the two concepts together is not a huge conceptual leap. Certainly a lot smaller than adding async to the language.
So everyone would have to learn a completely new way of building lists, tuples, and sets that only applies to a particular combination of strings and whitespace.
Yes, everyone would have to learn this new feature. It would take most people approximately five seconds to get the basics and maybe a minute to explore the consequences in full. This isn't a complicated feature: it's a list of whitespace-delimited words. The most complicated design question I can think of is whether we should allow escaping spaces or not:

    names = %w[Aaron Susan Helen Fred Mary\ Beth]
    names = %w[Aaron Susan Helen Fred Mary%x20Beth]

[...]
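For what it's worth, the stdlib already has a precedent for escaping spaces: shlex.split honours backslash escapes in its default POSIX mode (a rough analogue of the first spelling above, not the proposed semantics):

```python
import shlex

# A backslash-escaped space is kept as part of the word:
names = shlex.split(r"Aaron Susan Helen Fred Mary\ Beth")
# -> ['Aaron', 'Susan', 'Helen', 'Fred', 'Mary Beth']
```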
The new syntax should have some real use-cases that current syntax can't solve. I am not seeing that here.
True, this doesn't solve any problems that can't already be solved. It is pure syntactic sugar. It's nice when new syntax lets us do things that we couldn't do before, but that's not a requirement, and Python has lots of syntactic sugar simply because it makes the programming experience nicer:

- decorator syntax versus explicit function calls
- keyword arguments versus positional arguments
- ``from ... import`` versus plain old ``import``
- slicing, which could easily be a method call
- f-strings versus string formatting
- triple-quoted strings, string escapes, and raw strings

I doubt that this proposal will change the language ecosystem as decorators did, but I think it would be a small improvement that removes a minor pain point. [...]
The result is a list, but the input is a string. It is string processing the same way all the string methods are string processing.
All source code is nothing but strings. We don't normally include parsing or lexing source code, or compile-time preprocessing, as "string processing". String processing normally refers to the actions your program takes to process strings, as opposed to compiling your program in the first place. When the compiler parses a list containing string literals into code:

    # input
    data = ['a', 'b', 'c']

    # output
      1           0 LOAD_CONST               0 ('a')
                  3 LOAD_CONST               1 ('b')
                  6 LOAD_CONST               2 ('c')
                  9 BUILD_LIST               3
                 12 STORE_NAME               0 (data)
                 15 LOAD_CONST               3 (None)
                 18 RETURN_VALUE

we don't call that "string processing" (unless you're a compiler writer, I guess), and we shouldn't call %w[a b c] that either. The output should be identical, and we could implement this right now with a source code pre-processor.

-- 
Steven