
On Tue, Dec 6, 2016, at 19:51, Stephen J. Turnbull wrote:
Random832 writes:
Is there any particular objection to allowing the backslash-space escape (and for escapes that mean whitespace characters, such as \t, \x20, to not split, if you meant to imply that they do)? That would provide the extra push to this being beneficial over split().
You're suggesting that (1) most escapes would be processed after splitting while (2) backslash-space (what about backslash-tab?) would be treated as an escape during splitting?
I don't understand what this "after splitting" you're talking about is. It would be a single pass through the characters of the token, with space alone meaning "eat all whitespace, next string" and space in backslash state meaning "next character of current string is space", just as "t" alone means "next character of current string is letter t" and t in backslash state means "next character of current string is tab".
I mean, even the idea that there would be a separate "splitting step" at all makes no sense to me. It implies building an "un-split string" as if the w weren't present, processing escapes as part of that, and then parsing the resulting string in a second pass, which is something we don't do for r"..." and *shouldn't* do for f"...".
If you insist on consistency, backslash-space can mean space *everywhere* [once we've gotten through the deprecation cycle of backslash-unknown inserting a literal backslash], just like "\'" works fine despite double quotes not requiring it. As for backslash-tab, we already have \t. Maybe you'd like \s better for space.
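To make the single-pass idea concrete, here is a rough sketch of the state machine I have in mind. The function name split_words and the small escape table are made up for illustration; a real w"..." literal would be handled by the tokenizer and would support the full set of escapes (\x20, \u0020 and so on), which this sketch leaves out.

```python
def split_words(body):
    """Single-pass split of the body of a hypothetical w"..." literal."""
    # Illustrative subset of escapes only; not the full Python escape table.
    simple = {"t": "\t", "n": "\n", " ": " ", "\\": "\\"}
    words = []
    current = []
    in_word = False    # True once the current word has received a character
    backslash = False  # True right after an unescaped backslash
    for ch in body:
        if backslash:
            # Backslash state: the result is content, never a separator.
            # Unknown escapes keep the backslash, as ordinary literals do today.
            current.append(simple.get(ch, "\\" + ch))
            in_word = True
            backslash = False
        elif ch == "\\":
            backslash = True
        elif ch.isspace():
            # Bare whitespace is syntax: end the current word, eat the run.
            if in_word:
                words.append("".join(current))
                current = []
                in_word = False
        else:
            current.append(ch)
            in_word = True
    if backslash:
        raise ValueError("trailing backslash")
    if in_word:
        words.append("".join(current))
    return tuple(words)


print(split_words(r"foo\ bar  baz\tqux"))  # ('foo bar', 'baz\tqux')
```

There is no point at which an "un-split string" exists: escapes and separators are decided character by character in the same loop.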
I also have an alternate idea: sl{word1 word2 'string 3' "string 4"}
word1 and word2 are what perl would term "barewords"? I.e., treated as strings?
The name "sl" was meant to evoke shlex (the syntax itself was also inspired by perl's qw{...}, though perl doesn't provide any way of escaping whitespace). And I also meant this as a launching-off point for a general suggestion of word{ ... } as a readable syntax that doesn't collide with any currently valid constructs, for new kinds of literals (e.g. frozenset{a, b, c} and so on). So the result would be, more or less, the sequence that shlex.split('''word1 word2 'string 3' "string 4"''') gives.
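For reference, this is what the existing shlex call returns today. The proposed sl{...} syntax itself is hypothetical; only the shlex.split behaviour shown here is real.

```python
import shlex

# Bare words and quoted strings are both turned into plain str items.
words = shlex.split('''word1 word2 'string 3' "string 4"''')
print(words)  # ['word1', 'word2', 'string 3', 'string 4']
```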
-1 to w"", -1 to inconsistent interpretation of escapes, and -1 to a completely new syntax.
" ", "\x20", "\u0020", and "\U00000020" currently are different representations of the same string, so it would be confusing if the same notations meant different things in this context.
"'" and "\x27" (etc) are representations of the same string, but '...\x27 doesn't act as an end quote. Unescaped whitespace within a w"" literal would be *syntax*, not *content*. (Whereas in a regular literal backslash is syntax, but in an r'...' literal it's content.)
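A small illustration of the syntax-versus-content distinction using literals that exist today (the variable names are just for the example):

```python
# An escaped apostrophe is content: it does not terminate the literal.
escaped = 'it\x27s'
assert escaped == "it's"

# In a regular literal the backslash is syntax; in a raw literal it is content.
regular = '\n'
raw = r'\n'
assert len(regular) == 1
assert len(raw) == 2
```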
Another syntax plus overloading standard string notation with yet another semantics (strings, rawstrings) doesn't seem like a win to me.
As I accept the usual Pythonic aversion to mere abbreviations, I don't see any benefit to these notations, except for the case where a list just won't do, so you can avoid a call to tuple. We already have three good ways to do this:
wordlist = ["word1", "word2", "string 3", "string 4"]
wordlist = "word1,word2,string 3,string 4".split(",")
wordlist = open(word_per_line_file).readlines()
and for maximum Unicode-conforming generality with compact notation:
wordlist = "word1\uFFFFword2\uFFFFstring 3\uFFFFstring 4".split("\uFFFF")
You and I have very different definitions of the word "compact". In fact, this is *so obviously* non-compact that I find it hard to believe that you're being serious, but I don't think the joke's very funny if it's intended as one.
More seriously, in most use cases there will be ASCII control characters that you could use, which most editors can enter (though they might be visually unattractive in many editors, e.g., \x0C).
The point of using space is readability. (The point of returning a tuple is to avoid the disadvantage that the list returned by split must be built at runtime and can't be loaded as a constant, or perhaps turned into a frozenset constant by the optimizer in cases like "if x in w'foo bar baz':".)
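The constant-versus-runtime difference can be seen by inspecting compiled code objects. This is a sketch that relies on CPython's constant folding, which is an implementation detail; the helper name has_tuple_constant is made up for the example, and an ordinary tuple display stands in for the hypothetical w'...' literal.

```python
def has_tuple_constant(source):
    """Return True if compiling `source` stores a tuple in co_consts."""
    code = compile(source, "<example>", "exec")
    return any(isinstance(const, tuple) for const in code.co_consts)


# A tuple of string constants can be stored once in the code object.
literal_src = 't = ("foo", "bar", "baz")'
# split() is an ordinary method call, so its list is built on every execution.
split_src = 't = "foo bar baz".split()'

print(has_tuple_constant(literal_src))
print(has_tuple_constant(split_src))
```

On the CPython versions I am aware of, the first source yields a tuple constant and the second does not, which is the asymmetry a tuple-returning literal would remove.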