On Tue, Oct 22, 2019 at 08:53:53PM -0400, Todd wrote: [I wrote this]
I would expect %w{ ... } to return a set, not a list:
    %w[ ... ]  # list
    %w{ ... }  # set
    %w( ... )  # tuple
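For comparison, here is a sketch of how the same results are spelled in current Python (the %w forms themselves are hypothetical syntax, not valid today):

```python
# Proposed (hypothetical) forms and their nearest current equivalents:
words_list = 'a b c'.split()          # %w[a b c]  -> ['a', 'b', 'c']
words_set = set('a b c'.split())      # %w{a b c}  -> {'a', 'b', 'c'}
words_tuple = tuple('a b c'.split())  # %w(a b c)  -> ('a', 'b', 'c')
```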
[Todd replied]
This is growing into an entire new group of constructors for a very, very limited number of operations that have been privileged for some reason.
Sure. That's what syntactic sugar is: privileging one particular thing over another. That's why, for example, we privilege the idiom:

    import spam
    eggs = spam.eggs

by giving it special syntax, but not:

    class Spam: ...
    spam = Spam(args)
    del Spam

Some things are privileged. We privilege for-loops as comprehensions, but not while-loops; we privilege getting a bunch of indexes in a sequence as a slice ``sequence[start:end]``, but not getting a bunch of items from a dict. Not everything can be syntactic sugar; but that doesn't mean nothing should be syntactic sugar.
Should %{a=b c=d} create dicts, too? Why not?
Probably not, because we can already say ``dict(spam=x)`` to get the key "spam". That's specifically one of the motivating examples for why dict takes keyword arguments. In the early days of Python, it didn't.
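A quick illustration of that point (plain Python, nothing assumed beyond the builtin dict):

```python
x = 42
# The keyword-argument form already gives us unquoted keys:
d1 = dict(spam=x)     # key is the string "spam", no quoting needed
# The display form needs the quotes:
d2 = {'spam': x}
assert d1 == d2
```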
Why should strings be privileged over, say, numbers?
Because we don't write ints or floats or complex numbers with delimiters. We say 5, not "5".
Why should %w[1 2 3] make ['1', '2', '3'] instead of [1, 2, 3]?
Because the annoyance factor of having to quote each word is far greater than the annoyance factor of having to put commas between values.
And why whitespace instead of a comma?
Because separating words with whitespace is convenient when you have a lot of data. The spacebar, Tab and Enter keys are nice, big targets which are easy to hit; the comma isn't. Splitting on whitespace means that spaces and newlines Just Work:

    data = %w[alpha beta gamma
              ...
              psi chi omega]
    # gives ['alpha', 'beta', 'gamma', ... 'psi', 'chi', 'omega']

whereas splitting on commas alone gives you a nasty surprise:

    data = %w[alpha, beta, gamma,
              ...,
              psi, chi, omega]
    # ['alpha', ' beta', ' gamma', ..., '\n    psi', ' chi', ' omega']

To avoid that, you need to complicate the rule to something like "commas or whitespace", or "commas optionally followed by whitespace", or something even more complicated. The more complicated the rule, the more surprising it will be when you get caught out by some odd corner case of the rule you weren't expecting. Splitting on whitespace is a nice, simple rule that cannot go wrong. Why make it more complicated than it needs to be?
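The difference can be demonstrated with str.split today (a minimal sketch; the %w syntax itself is hypothetical):

```python
# Splitting on whitespace: spaces and newlines Just Work.
clean = """alpha beta gamma
psi chi omega""".split()
# -> ['alpha', 'beta', 'gamma', 'psi', 'chi', 'omega']

# Splitting on commas alone drags the stray whitespace along.
messy = """alpha, beta, gamma,
psi, chi, omega""".split(",")
# -> ['alpha', ' beta', ' gamma', '\npsi', ' chi', ' omega']
```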
We have general ways to handle all of this stuff that doesn't lock us into a single special case.
Who is talking about locking us into a special case? "string literal".split() will still work, so will ["string", "literal"].
and I would describe them as list/set/tuple "word literals". Unlike list etc. displays ``[spam, eggs, cheese]``, these would actually be true literals that can be determined entirely at compile-time.
I don't know enough about the internals to say whether this would be possible or not.
It would be a pretty awful compiler that couldn't take a space-separated sequence of characters and compile them as strings. I'm not wedded to the leading and trailing delimiters %w[ and ] if they turn out to be ambiguous with something else (the % operator?), but I don't think they will be. [...]
Yes, I understand that Python has syntactic sugar. But any new syntactic sugar necessarily has an uphill battle due to people having to learn it, books and classes having to be updated, linters updated, new pep8 guidelines written, etc. We already have a way to split strings. So the question is why we need this in addition to what we already have,
Because it smooths out a minor annoyance and makes for a more pleasant programming experience for the coder, without having to worry (rightly or wrongly) about performance. The status quo is that every time I need to write a list or set of words, I have to stop and think: "Should I quote them all by hand, like a caveman, or get the interpreter to split it? If I get the interpreter to split it, will it hurt performance?" but with this proposed syntax, there is One Obvious Way to write a list of words. I won't have to think about it, or worry that I should be worrying about performance.
especially considering it is so radically different than anything else in Python.
Your idea of "radically different" is a lot less radical than mine. To me, radically different would mean something like HyperTalk syntax:

    put the value of the third line of text into word seven of result

or Reverse Polish Notation syntax. Not adding a prefix to list delimiters. We already have string prefixes, we already have list delimiters; putting the two concepts together is not a huge conceptual leap. Certainly a lot smaller than adding async to the language.
So everyone would have to learn a completely new way of building lists, tuples, and sets that only applies to a particular combination of strings and whitespace.
Yes, everyone would have to learn this new feature. It would take most people approximately five seconds to get the basics and maybe a minute to explore the consequences in full. This isn't a complicated feature: it's a list of whitespace-delimited words. The most complicated design question I can think of is whether we should allow escaping spaces or not:

    names = %w[Aaron Susan Helen Fred Mary\ Beth]
    names = %w[Aaron Susan Helen Fred Mary%x20Beth]

[...]
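For what it's worth, the stdlib already has a precedent for escaping spaces: shlex.split honours backslash escapes in its default POSIX mode (a rough analogue of the first spelling above, not the proposed semantics):

```python
import shlex

# A backslash-escaped space is kept as part of the word:
names = shlex.split(r"Aaron Susan Helen Fred Mary\ Beth")
# -> ['Aaron', 'Susan', 'Helen', 'Fred', 'Mary Beth']
```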
The new syntax should have some real use-cases that current syntax can't solve. I am not seeing that here.
True, this doesn't solve any problems that can't already be solved. It is pure syntactic sugar. It's nice when new syntax lets us do things that we couldn't do before, but that's not a requirement, and Python has lots of syntactic sugar simply because it makes the programming experience nicer:

- decorator syntax versus explicit function calls
- keyword arguments versus positional arguments
- ``from ... import`` versus plain old ``import``
- slicing, which could easily be a method call
- f-strings versus string formatting
- triple-quoted strings, string escapes, and raw strings

I doubt that this proposal will change the language ecosystem as decorators did, but I think it would be a small improvement that removes a minor pain point. [...]
The result is a list, but the input is a string. It is string processing the same way all the string methods are string processing.
All source code is nothing but strings. We don't normally include parsing or lexing source code, or compile-time preprocessing, as "string processing". String processing normally refers to the actions your program takes to process strings, as opposed to compiling your program in the first place. When the compiler parses a list containing string literals into code:

    # input
    data = ['a', 'b', 'c']

    # output
      1           0 LOAD_CONST               0 ('a')
                  3 LOAD_CONST               1 ('b')
                  6 LOAD_CONST               2 ('c')
                  9 BUILD_LIST               3
                 12 STORE_NAME               0 (data)
                 15 LOAD_CONST               3 (None)
                 18 RETURN_VALUE

we don't call that "string processing" (unless you're a compiler writer, I guess), and we shouldn't call %w[a b c] that either. The output should be identical, and we could implement this right now with a source code pre-processor.

-- 
Steven