[Python-ideas] Proposal: Tuple of str with w'list of words'

Wed Dec 7 01:34:06 EST 2016

On Tue, Dec 6, 2016, at 19:51, Stephen J. Turnbull wrote:
> Random832 writes:
> 
>  > Is there any particular objection to allowing the backslash-space escape
>  > (and for escapes that mean whitespace characters, such as \t, \x20, to
>  > not split, if you meant to imply that they do)? That would provide the
>  > extra push to this being beneficial over split().
> 
> You're suggesting that (1) most escapes would be processed after
> splitting while (2) backslash-space (what about backslash-tab?) would
> be treated as an escape during splitting?

I don't understand what "after splitting" you're referring to.
It would be a single pass through the characters of the token, with
space alone meaning "eat all whitespace, next string" and space in
backslash state meaning "next character of current string is space",
just as "t" alone means "next character of current string is letter t"
and t in backslash state means "next character of current string is
tab".
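For illustration, here is a minimal Python sketch of that single pass, written as a hypothetical helper split_w_literal that takes the body of a w'...' literal as an ordinary string. The set of escapes it understands (backslash-space, \t, \n, \\, both quote characters and \xHH) is an assumption made for the example, not part of any agreed proposal; anything else is rejected.

```python
from __future__ import annotations

# Hypothetical escape table for this sketch: backslash-space, \t, \n, \\ and
# both quote characters. \xHH is handled separately below.
_SIMPLE_ESCAPES = {" ": " ", "t": "\t", "n": "\n", "\\": "\\", "'": "'", '"': '"'}
_HEX_DIGITS = set("0123456789abcdefABCDEF")


def split_w_literal(body: str) -> tuple[str, ...]:
    """Split the body of a hypothetical w'...' literal in a single pass.

    Unescaped whitespace ends the current word; any escaped character,
    including an escaped space or \\x20, is added to the current word.
    """
    words: list[str] = []
    current: list[str] = []
    in_word = False  # True once the current word has at least one character
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # Backslash state: the next character(s) are always word content.
            nxt = body[i + 1] if i + 1 < len(body) else ""
            if nxt in _SIMPLE_ESCAPES:
                current.append(_SIMPLE_ESCAPES[nxt])
                i += 2
            elif nxt == "x":
                digits = body[i + 2:i + 4]
                if len(digits) != 2 or not set(digits) <= _HEX_DIGITS:
                    raise ValueError(f"bad \\x escape at index {i}")
                current.append(chr(int(digits, 16)))
                i += 4
            else:
                raise ValueError(f"bad escape at index {i}")
            in_word = True
        elif ch.isspace():
            # Unescaped whitespace is syntax: it closes the current word, if any.
            if in_word:
                words.append("".join(current))
                current.clear()
                in_word = False
            i += 1
        else:
            current.append(ch)
            in_word = True
            i += 1
    if in_word:
        words.append("".join(current))
    return tuple(words)


print(split_w_literal(r"foo bar\ baz\tqux a\x20b"))
# ('foo', 'bar baz\tqux', 'a b')
```

Because escapes are resolved in the same pass that looks for separators, "\ " and "\x20" both end up inside a word instead of splitting it.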

I mean, even the idea that there would be a separate "splitting step" at
all makes no sense to me; it implies building an "un-split string" as
if the w weren't present, processing escapes as part of that, and then
parsing the resulting string in a second pass, which is something we
don't do for r"..." and *shouldn't* do for f"...".
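By contrast, a two-pass reading (decode escapes first, as if the w weren't there, then split) loses the distinction between escaped and unescaped whitespace. The helper name two_pass_split is made up for this sketch, and it assumes the body contains no double quote characters.

```python
import ast


def two_pass_split(body: str) -> list:
    """Hypothetical two-pass reading of a w'...' body (assumes no double quotes)."""
    # Pass 1: process escapes as if this were an ordinary "..." literal.
    decoded = ast.literal_eval('"' + body + '"')
    # Pass 2: split the decoded text on whitespace.
    return decoded.split()


print(two_pass_split(r"a\x20b c"))  # ['a', 'b', 'c'] rather than ['a b', 'c']
```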

If you insist on consistency, backslash-space can mean space
*everywhere* [once we've gotten through the deprecation cycle of
backslash-unknown inserting a literal backslash], just like "\'" works
fine despite double quotes not requiring it. As for backslash-tab, we
already have \t. Maybe you'd like \s better for space.
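The current behaviour the deprecation remark refers to can be checked with the standard library. This assumes CPython 3.6 or later, where an unrecognised escape such as backslash-space still compiles and keeps the backslash, but triggers a DeprecationWarning (a SyntaxWarning from 3.12 on); a future release may turn it into an error.

```python
import ast
import warnings

# "\ " is not a recognised escape today: the backslash is kept literally,
# and the compiler warns about it (silenced here for the demonstration).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    backslash_space = ast.literal_eval(r'"\ "')

print(repr(backslash_space), len(backslash_space))  # '\\ ' 2

# Escaping a quote that doesn't need escaping is already harmless:
print("\'" == "'")  # True
```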

>  > I also have an alternate idea: sl{word1 word2 'string 3' "string 4"}
> 
> word1 and word2 are what perl would term "barewords"?  Ie treated as
> strings?

The name "sl" was meant to evoke shlex (the syntax itself was also
inspired by Perl's qw{...}, though Perl doesn't provide any way of
escaping whitespace). And I also meant this as a launching-off point for
a general suggestion of word{ ... } as a readable syntax that doesn't
collide with any currently valid constructs, for new kinds of literals
(e.g. frozenset{a, b, c} and so on).

So the result would be, more or less, the sequence that
shlex.split('''word1 word2 'string 3' "string 4"''') gives.
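For reference, this is what the standard library's shlex.split returns for that example in its default POSIX mode; the tuple() call only shows the value an sl{...} literal would presumably evaluate to. Unlike Perl's qw, shlex also honours backslash-space outside quotes.

```python
import shlex

# POSIX mode: unquoted whitespace separates words, quotes group words,
# and a backslash escapes the next character.
words = shlex.split('''word1 word2 'string 3' "string 4"''')
print(words)         # ['word1', 'word2', 'string 3', 'string 4']
print(tuple(words))  # ('word1', 'word2', 'string 3', 'string 4')

print(shlex.split(r"string\ 5 word6"))  # ['string 5', 'word6']
```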

> -1 to w"", -1 to inconsistent interpretation of escapes, and -1 to a
> completely new syntax.
> 
> " ", "\x20", "\u0020", and "\U00000020" currently are different
> representations of the same string, so it would be confusing if the
> same notations meant different things in this context.

"'" and "\x27" (etc.) are representations of the same string, but
'...\x27 doesn't act as an end quote. Unescaped whitespace within a w""
literal would be *syntax*, not *content*. (Whereas in a regular literal
the backslash is syntax, in an r'...' literal it's content.)
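A few assertions illustrating the syntax/content distinction in today's Python: equivalent spellings of a space, an escaped apostrophe that does not close the literal, and a raw literal in which the backslash is content.

```python
# Four spellings of the same one-character string.
assert " " == "\x20" == "\u0020" == "\U00000020"

# "\x27" is the apostrophe, but as an escape it is content and does not
# terminate a '...' literal.
s = 'it\x27s'
assert s == "it's" and len(s) == 4

# In a raw literal the backslash is content rather than escape syntax.
assert r'\n' == '\\n' and len(r'\n') == 2
```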

>  Another syntax
> plus overloading standard string notation with yet another semantics
> (strings, rawstrings) doesn't seem like a win to me.
> 
> As I accept the usual Pythonic aversion to mere abbreviations, I don't
> see any benefit to these notations, except for the case where a list
> just won't do, so you can avoid a call to tuple.  We already have
> three good ways to do this:
> 
>     wordlist = ["word1", "word2", "string 3", "string 4"]
>     wordlist = "word1,word2,string 3,string 4".split(",")
>     wordlist = open(word_per_line_file).read().splitlines()
> 
> and for maximum Unicode-conforming generality with compact notation:
> 
>     wordlist = "word1\uFFFFword2\uFFFFstring 3\uFFFFstring 4".split("\uFFFF")
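A quick sanity check of the quoted alternatives, with io.StringIO standing in for the hypothetical word_per_line_file. Two details show up: \U takes eight hex digits, so the short spelling of U+FFFF is \uFFFF, and readlines() keeps each trailing newline while read().splitlines() does not.

```python
import io

expected = ["word1", "word2", "string 3", "string 4"]

# Comma-separated string, split at runtime.
assert "word1,word2,string 3,string 4".split(",") == expected

# U+FFFF as the separator (\uFFFF, since \U needs eight hex digits).
assert "word1\uFFFFword2\uFFFFstring 3\uFFFFstring 4".split("\uFFFF") == expected

# One word per line: readlines() keeps newlines, splitlines() drops them.
fake_file = io.StringIO("word1\nword2\nstring 3\nstring 4\n")
assert fake_file.readlines() == ["word1\n", "word2\n", "string 3\n", "string 4\n"]
fake_file.seek(0)
assert fake_file.read().splitlines() == expected
```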

You and I have very different definitions of the word "compact". In
fact, this is *so obviously* non-compact that I find it hard to believe
that you're being serious, but I don't think the joke's very funny if
it's intended as one.

> More seriously, in most use cases there will be ASCII control
> characters that you could use, which most editors can enter (though
> they might be visually unattractive in many editors, e.g., \x0C).
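The control-character variant looks like this. The choice of \x1f (the ASCII "unit separator") is an assumption for the example; the quoted message mentions \x0C.

```python
# \x1f is the ASCII "unit separator" control character, which ordinary
# words are unlikely to contain.
UNIT_SEP = "\x1f"
wordlist = "word1\x1fword2\x1fstring 3\x1fstring 4".split(UNIT_SEP)
print(wordlist)  # ['word1', 'word2', 'string 3', 'string 4']
```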

The point of using space is readability. (The point of returning a tuple
is to avoid the disadvantage that the list returned by split must be
built at runtime and can't be loaded as a constant, or perhaps turned
into a frozenset constant by the optimizer in cases like "if x in w'foo
bar baz':".)
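The constant-loading point can be observed by compiling each spelling and inspecting the stored constants. This relies on CPython implementation details (the exact bytecode varies between versions), so treat it as a sketch rather than a guarantee: the tuple and set literals end up as a tuple and a frozenset in co_consts, while the split() version stores only the unsplit string and looks up split at runtime.

```python
# Compile three spellings of the same membership test (CPython-specific).
tuple_code = compile("x in ('foo', 'bar', 'baz')", "<tuple>", "eval")
set_code = compile("x in {'foo', 'bar', 'baz'}", "<set>", "eval")
split_code = compile("x in 'foo bar baz'.split()", "<split>", "eval")

print(tuple_code.co_consts)  # contains the tuple ('foo', 'bar', 'baz')
print(set_code.co_consts)    # contains a frozenset of the three words
# Only the unsplit string is a constant; split() runs on every evaluation.
print(split_code.co_consts, split_code.co_names)

# All three evaluate the same way.
print(eval(tuple_code, {"x": "bar"}), eval(split_code, {"x": "bar"}))
```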