I was actually thinking about this before the recent "string comprehension" thread. I wasn't really going to post the idea, but it's similar enough that I am nudged to. Moreover, since PEP 616 added str.removeprefix() and str.removesuffix(), this feels like a natural extension of that.

I find myself very often wanting to remove several substrings from similar lines to get at "the good bits" for my purpose. Log files are a good example of this, but it arises in lots of other contexts I encounter. Let's take a not-absurd hypothetical:

    GET [http://example.com/picture] 200 image/jpeg
    POST [http://nowhere.org/data] 200 application/json
    PUT [https://example.org/page] 200 text/html

For each of these lines, I'd like to see the URL and the MIME type only. The new str.removeprefix() helps some, but not as much as I would like, since the "remove a tuple of prefixes" idea was rejected for PEP 616. But even past that, very often much of what I want to remove is in the middle, not at the start or the end.

I know I can use regular expressions here. However, they are definitely a higher cognitive burden, and especially so for those who haven't taught them and written about them a lot, as I have. Even for me, I'd rather not think about regexen if I don't *have to*. So probably I'll do something like this:

    for line in lines:
        for noise in ('GET', 'POST', 'PUT', '200', '[', ']'):
            line = line.replace(noise, '')
        process_line(line)

That's not horrible, but it would be nicer to write:

    for line in lines:
        process_line(line.remove(('GET', 'POST', 'PUT', '200', '[', ']')))

Of course, if I really needed this as much as I seem to be suggesting, I know how to write a function `remove_strings()`... and I confess I have not done that. Or at least I haven't done it in some standard "my_utils" module I always import. Nonetheless, a string method would feel even more natural than a function taking the string as an argument.

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
On 01May2021 05:30, David Mertz <mertz@gnosis.cx> wrote:
I was actually thinking about this before the recent "string comprehension" thread. I wasn't really going to post the idea, but it's similar enough that I am nudged to. Moreover, since PEP 616 added str.removeprefix() and str.removesuffix(), this feels like a natural extension of that.
I find myself very often wanting to remove several substrings of similar lines to get at "the good bits" for my purpose. Log files are a good example of this, but it arises in lots of other contexts I encounter. Let's take a not-absurd hypothetical:
    GET [http://example.com/picture] 200 image/jpeg
    POST [http://nowhere.org/data] 200 application/json
    PUT [https://example.org/page] 200 text/html
For each of these lines, I'd like to see the URL and the MIME type only. The new str.removeprefix() helps some, but not as much as I would like since the "remove a tuple of prefixes" idea was rejected for PEP 616. But even past that, very often much of what I want to remove is in the middle, not at the start or the end.
This is not a good way to tidy up log lines. Try parsing it into fields:

    PUT http://example.com/picture 200 image/jpeg

and then look only at the fields you care about.
I know I can use regular expressions here. However, they are definitely a higher cognitive burden, and especially so for those who haven't taught them and written about them a lot, as I have. Even for me, I'd rather not think about regexen if I don't *have to*.
Though for this, they are ok. Or even just:

    method, url, code, mimetype = line.split(None, 3)

There shouldn't be any whitespace in a log line URL - it should be percent encoded.
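For concreteness, a minimal sketch of that field-splitting approach applied to one of the example lines (the surrounding names and the print are illustrative, not part of the suggestion):

    line = "GET [http://example.com/picture] 200 image/jpeg"
    method, url, code, mimetype = line.split(None, 3)
    print(url, mimetype)
    # -> [http://example.com/picture] image/jpeg
    # note: the square brackets from the original line are still attached to url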
So probably I'll do something like this:
    for line in lines:
        for noise in ('GET', 'POST', 'PUT', '200', '[', ']'):
            line = line.replace(noise, '')
This is a very bad way to do this. What about the URL "http://example.com/foo/PUT/bah"? Badness ensues. It's worse than using a well-written regexp.
        process_line(line)
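To make the badness concrete, a small sketch using that hypothetical URL (the log line itself is invented):

    line = "GET [http://example.com/foo/PUT/bah] 200 text/html"
    for noise in ('GET', 'POST', 'PUT', '200', '[', ']'):
        line = line.replace(noise, '')
    print(line)
    # -> ' http://example.com/foo//bah  text/html'
    # the PUT inside the URL path has been removed along with the noise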
That's not horrible, but it would be nicer to write:
    for line in lines:
        process_line(line.remove(('GET', 'POST', 'PUT', '200', '[', ']')))
I'm -1 on this idea. As you note, str.replace already exists and does what your line.remove does, just on a single substring basis. It's a trivial exercise to write an mreplace(s,substrs) function. Just do it and put it in your personal kit, and import it.
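For illustration, one plausible shape of such a helper (the name mreplace and the empty-string default are assumptions, not an existing API):

    def mreplace(s, substrs, replacement=''):
        # Replace (by default, remove) each of several substrings in turn.
        for sub in substrs:
            s = s.replace(sub, replacement)
        return s

    # e.g.
    # mreplace("GET [http://example.com/picture] 200 image/jpeg",
    #          ('GET', 'POST', 'PUT', '200', '[', ']'))
    # -> ' http://example.com/picture  image/jpeg'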
Of course, if I really needed this as much as I seem to be suggesting, I know how to write a function `remove_strings()`... and I confess I have not done that. Or at least I haven't done it in some standard "my_utils" module I always import. Nonetheless, a string method would feel even more natural than a function taking the string as an argument.
A method is almost always "easier/natural", but how many do we really want? If you really want this, write a StrMixin with a bunch of nice methods, subclass str, and promote your lines to your new subclass. Methods managed!

Cheers,
Cameron Simpson <cs@cskk.id.au>
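A rough sketch of that subclass approach, with invented class and method names:

    class StrMixin:
        # Illustrative method; mirrors the str.remove proposed above.
        def remove(self, substrs):
            s = str(self)
            for sub in substrs:
                s = s.replace(sub, '')
            return type(self)(s)

    class MyStr(StrMixin, str):
        pass

    # promote a plain line to the subclass, then use the method:
    # MyStr("GET [http://example.com/picture] 200 image/jpeg").remove(('GET', '[', ']'))
    # -> ' http://example.com/picture 200 image/jpeg'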
On Sat, May 1, 2021, 3:17 AM Cameron Simpson <cs@cskk.id.au> wrote:
Let's take a not-absurd hypothetical:
    GET [http://example.com/picture] 200 image/jpeg
    POST [http://nowhere.org/data] 200 application/json
    PUT [https://example.org/page] 200 text/html
Though for this, they are ok. Or even just:
    method, url, code, mimetype = line.split(None, 3)
Notice that in my example, I have extra square brackets that I want to get rid of as well, which your line doesn't do. Of course an extra line or two could. But often enough, I want to remove certain fixed substrings in lines that don't have a uniform delimiter like a space.

There shouldn't be any whitespace in a log line URL - it should be percent encoded.
Lots of things "should be" :-). Sadly, I deal with "actually existing data."
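For what it's worth, a minimal sketch of the "extra line or two", still assuming whitespace-delimited fields:

    line = "POST [http://nowhere.org/data] 200 application/json"
    method, url, code, mimetype = line.split(None, 3)
    url = url.strip('[]')        # the extra line: drop the surrounding brackets
    print(url, mimetype)         # -> http://nowhere.org/data application/json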