[Python-ideas] New explicit methods to trim strings

Tue Apr 2 19:46:27 EDT 2019

On 4/2/2019 2:02 PM, Rhodri James wrote:
> On 02/04/2019 18:55, Stephen J. Turnbull wrote:
>  >> = Me
>>   > 3. My most common use case (not very common at that) is for stripping
>>   > annoying prompts off text-based APIs.  I'm happy using
>>   > .startswith() and string slicing for that, though your point about
>>   > the repeated use of the string to be stripped off (or worse,
>>   > hard-coding its length) is well made.
>>
>> I don't understand this use case, specifically the opposition to
>> hard-coding the length.  Although hard-coding the length wouldn't
>> occur to me in many cases, since I'd use
>>
>>      # remove my bash prompt
>>      prompt_re = re.compile(r'^[^\u0000-\u001f\u007f]+ \d\d:\d\d\$ ')
>>      lines = [prompt_re.sub('', line) for line in lines]
> 
> For me it's more often like
> 
>      input = get_line_from_UART()
>      if input.startswith("INFO>"):
>          input = input[5:]
>      do_something_useful(input)
> 
> which is error-prone when you cut and paste for a different prompt 
> elsewhere and forget to change the slice to match.

When I originally saw this, I thought "Yeah, me, too!". But then I
realized I rarely want to do just this. I almost always also want to
know whether the string began with the prefix. I'd normally use
something like this:

--------------------------
for line in ["INFO>rest-of-line",
              "not-INFO>more-text",
              "text",
              "INFO>",
              ""]:
     start, sep, rest = line.partition("INFO>")
     if not start and sep:
         print(f"control line {rest!r}")
     else:
         print(f"data line {line!r}")

output:
control line 'rest-of-line'
data line 'not-INFO>more-text'
data line 'text'
control line ''
data line ''
--------------------------

Breaking it out as a function shows how I'd need to call this if we
made it a function (or a method on str):

--------------------------
def str_has_prefix(s, prefix):
     '''returns (True, rest-of-string) or (False, s)'''
     start, sep, rest = s.partition(prefix)
     if not start and sep:
         return True, rest
     else:
         return False, s

for line in ["INFO>rest-of-line",
              "not-INFO>more-text",
              "text",
              "INFO>",
              ""]:
     has_prefix, line = str_has_prefix(line, "INFO>")
     if has_prefix:
         print(f"control line {line!r}")
     else:
         print(f"data line {line!r}")
--------------------------

Now I'll admit it's not super-efficient to create the start, sep, and 
rest sub-strings all the time, and maybe the test "not start and sep" 
isn't so obvious at first glance, but for my work this is good enough.
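
For comparison, here is a minimal sketch of the same interface built on
.startswith() and slicing, which avoids creating the three sub-strings.
The name str_has_prefix_slicing is only illustrative. One difference to
be aware of: .partition("") raises ValueError, while this version treats
an empty prefix as always present.

```python
def str_has_prefix_slicing(s, prefix):
    """Return (True, rest-of-string) or (False, s)."""
    # startswith() only looks at the beginning of s, so nothing past
    # the prefix is scanned and no (start, sep, rest) tuple is built.
    if s.startswith(prefix):
        # len(prefix) keeps the slice in sync with the prefix text,
        # so there is no hard-coded length to forget to update.
        return True, s[len(prefix):]
    return False, s


for line in ["INFO>rest-of-line",
             "not-INFO>more-text",
             "text",
             "INFO>",
             ""]:
    has_prefix, rest = str_has_prefix_slicing(line, "INFO>")
    if has_prefix:
        print(f"control line {rest!r}")
    else:
        print(f"data line {rest!r}")
```

This should print the same five lines as the .partition() version above.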

It's not super-important how the function (or method) is implemented;
I'm more concerned about the interface. If it were done in C, it
obviously wouldn't call .partition().

So while I was originally +1 on this proposal, now I'm not so sure, 
given how I normally need to check if the string starts with a prefix 
and get the rest of the string if it does start with the prefix.

On the other hand, just this weekend I was helping (again) with someone 
who misunderstood str.strip() on the bug tracker: 
https://bugs.python.org/issue36480, so I know .strip() and friends 
confuse people. But I don't think we can use that fact to say that we 
need .lcut()/.rcut().
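
As a small illustration of the kind of surprise .strip() and friends can
cause (the strings here are made up for the example, not taken from the
bug report): the argument is treated as a set of characters to remove,
not as a literal prefix or suffix.

```python
# lstrip()/rstrip() remove any leading/trailing characters that appear
# in the argument, in any order, until a character outside the set.
prompt_stripped = "INFO>OFFLINE".lstrip("INFO>")
suffix_stripped = "happy.py".rstrip(".py")

print(repr(prompt_stripped))  # 'LINE'  (the O, F, F of OFFLINE go too)
print(repr(suffix_stripped))  # 'ha'    (the trailing p, p, y go too)
```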

It's just that as it's being proposed here, I think lcut/rcut (or
whatever the names end up being) just don't have a useful interface,
for me. I don't
think I've ever wanted to remove a prefix/suffix if it existed, else use 
the whole string, and not know which case occurred.
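
To make that concrete, here is a rough sketch of the semantics as I
understand them from this thread. The function names follow the
proposal, but the signatures and the handling of an empty prefix or
suffix are my assumptions, not anything that has been specified.

```python
def lcut(s, prefix):
    """Hypothetical: return s without prefix if present, else s."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def rcut(s, suffix):
    """Hypothetical: return s without suffix if present, else s."""
    # The "suffix and" guard matters: s[:-0] would be the empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


# Both inputs give '', so the caller can't tell a bare control line
# from an empty data line without testing .startswith() separately.
for line in ["INFO>", ""]:
    print(repr(lcut(line, "INFO>")))
```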

Eric

PS: I really tried to find a way to use := in this example so I could 
put the assignment inside the 'if' statement, but as I think Tim Peters 
pointed out, without C's comma operator, you can't.