[Python-ideas] New explicit methods to trim strings
Eric V. Smith
eric at trueblade.com
Tue Apr 2 19:46:27 EDT 2019
On 4/2/2019 2:02 PM, Rhodri James wrote:
> On 02/04/2019 18:55, Stephen J. Turnbull wrote:
> >> = Me
>> > 3. My most common use case (not very common at that) is for stripping
>> > annoying prompts off text-based APIs. I'm happy using
>> > .startswith() and string slicing for that, though your point about
>> > the repeated use of the string to be stripped off (or worse,
>> > hard-coding its length) is well made.
>>
>> I don't understand this use case, specifically the opposition to
>> hard-coding the length. Although hard-coding the length wouldn't
>> occur to me in many cases, since I'd use
>>
>> # remove my bash prompt
>> prompt_re = re.compile(r'^[^\u0000-\u001f\u007f]+ \d\d:\d\d\$ ')
>> lines = [prompt_re.sub('', line) for line in lines]
>
> For me it's more often like
>
>     input = get_line_from_UART()
>     if input.startswith("INFO>"):
>         input = input[5:]
>     do_something_useful(input)
>
> which is error-prone when you cut and paste for a different prompt
> elsewhere and forget to change the slice to match.
I originally saw this, and I thought "Yeah, me, too!". But then I
realized I rarely want to do this. I almost always want to know whether
the string began with the prefix. I'd normally use something like this:
--------------------------
for line in ["INFO>rest-of-line",
             "not-INFO>more-text",
             "text",
             "INFO>",
             ""]:
    start, sep, rest = line.partition("INFO>")
    if not start and sep:
        print(f"control line {rest!r}")
    else:
        print(f"data line {line!r}")
output:
control line 'rest-of-line'
data line 'not-INFO>more-text'
data line 'text'
control line ''
data line ''
--------------------------
Breaking it out as a function shows how I'd need to call it, if we
made it a function (or a method on str):
--------------------------
def str_has_prefix(s, prefix):
    '''returns (True, rest-of-string) or (False, s)'''
    start, sep, rest = s.partition(prefix)
    if not start and sep:
        return True, rest
    else:
        return False, s

for line in ["INFO>rest-of-line",
             "not-INFO>more-text",
             "text",
             "INFO>",
             ""]:
    has_prefix, line = str_has_prefix(line, "INFO>")
    if has_prefix:
        print(f"control line {line!r}")
    else:
        print(f"data line {line!r}")
--------------------------
Now I'll admit it's not super-efficient to create the start, sep, and
rest sub-strings all the time, and maybe the test "not start and sep"
isn't so obvious at first glance, but for my work this is good enough.

It's not super-important how the function (or method) is implemented;
I'm more concerned about the interface. If it were done in C, it
obviously wouldn't call .partition().
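
As a rough sketch of what a cheaper pure-Python version might look like
(an illustration only, not a proposed implementation), the same
interface can be built on .startswith() and a single slice. Using
len(prefix) also avoids the hard-coded slice length discussed above.
One difference to be aware of: with an empty prefix this version
returns (True, s), whereas .partition("") raises ValueError.

```python
def str_has_prefix(s, prefix):
    '''returns (True, rest-of-string) or (False, s)'''
    # Same interface as the partition()-based version, but only one
    # new string is created, and only when the prefix matches.
    if s.startswith(prefix):
        # len(prefix) keeps the slice in sync with the prefix text.
        return True, s[len(prefix):]
    return False, s


for line in ["INFO>rest-of-line", "not-INFO>more-text", "INFO>", ""]:
    print(str_has_prefix(line, "INFO>"))
```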
So while I was originally +1 on this proposal, now I'm not so sure,
given how I normally need to check if the string starts with a prefix
and get the rest of the string if it does start with the prefix.
On the other hand, just this weekend I was helping (again) with someone
who misunderstood str.strip() on the bug tracker:
https://bugs.python.org/issue36480, so I know .strip() and friends
confuse people. But I don't think we can use that fact to say that we
need .lcut()/.rcut().
It's just that, as it's being proposed here, I think lcut/rcut (or
whatever names) just don't have a useful interface, for me. I don't
think I've ever wanted to remove a prefix/suffix if it existed, else use
the whole string, and not know which case occurred.
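
A minimal sketch of that objection, assuming the proposed behaviour is
"strip the prefix if present, otherwise return the string unchanged".
Here lcut() is a hypothetical free function standing in for the
proposed method; the name and exact semantics are assumptions taken
from this thread, not an existing str API.

```python
def lcut(s, prefix):
    # Hypothetical stand-in for the proposed str method (assumed
    # semantics): drop the prefix if present, else return s unchanged.
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


# A control line and a data line produce the same result, so the
# caller cannot tell afterwards whether the prefix was there.
for line in ["INFO>rest-of-line", "rest-of-line"]:
    print(repr(lcut(line, "INFO>")))
```

Both iterations print the same value, so a caller that needs to treat
the two cases differently still has to call .startswith() separately.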
Eric
PS: I really tried to find a way to use := in this example so I could
put the assignment inside the 'if' statement, but as I think Tim Peters
pointed out, without C's comma operator, you can't.