strip() using strings instead of chars

Christoph Zwerschke cito at online.de
Fri Jul 11 07:55:05 EDT 2008


In Python programs, you will quite frequently find code like the
following for removing a certain prefix from a string:

if url.startswith('http://'):
     url = url[7:]

Similarly for stripping suffixes:

if filename.endswith('.html'):
     filename = filename[:-5]

My problem with this is that it's cumbersome and error-prone to count
the number of chars of the prefix or suffix. If you want to change it
from 'http://' to 'https://', you must not forget to change the 7 to 8.
If you write len('http://') instead of the 7, the prefix appears twice,
and you see this is actually a DRY problem.
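
To make this concrete, here is a small sketch of the usual workaround:
name the prefix or suffix once and derive the slice from len(). The
variable names and sample values are only illustrative.

```python
prefix = 'http://'
suffix = '.html'

url = 'http://www.python.org/index.html'
if url.startswith(prefix):
    # The slice length follows the prefix, so changing the prefix
    # to 'https://' needs no second edit.
    url = url[len(prefix):]

filename = 'index.html'
if filename.endswith(suffix):
    # Writing filename[:-len(suffix)] would return '' for an empty
    # suffix, because [:-0] is the same as [:0]. Computing the end
    # index explicitly avoids that corner case.
    filename = filename[:len(filename) - len(suffix)]
```

This removes the magic numbers, but at the price of an extra name and
three lines for what is conceptually a single operation.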

Things get even worse if you have several prefixes to consider:

if url.startswith('http://'):
     url = url[7:]
elif url.startswith('https://'):
     url = url[8:]

You can't make use of url.startswith(('http://', 'https://')) here,
because the call does not tell you which prefix matched and therefore
how many chars to cut off.
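
Without a dedicated method, the closest thing to a general idiom is
probably a loop over the candidate prefixes. The following is a minimal
sketch; remove_first_prefix is a hypothetical helper name, not something
from the standard library, and it removes at most one prefix.

```python
def remove_first_prefix(text, prefixes):
    """Return text without the first prefix in prefixes that matches.

    Hypothetical helper for illustration. Prefixes are tried in the
    given order, so if one candidate is a prefix of another (such as
    'http' and 'http://'), the one listed first wins.
    """
    for prefix in prefixes:
        if text.startswith(prefix):
            return text[len(prefix):]
    # No candidate matched: return the string unchanged.
    return text


url = remove_first_prefix('https://www.python.org/', ('http://', 'https://'))
```

It works, but every project ends up writing its own variant of it.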

Here is another concrete example taken from the standard lib:

     if chars.startswith(BOM_UTF8):
         chars = chars[3:].decode("utf-8")

This avoids hardcoding the BOM_UTF8, but its length is still hardcoded,
and the programmer had to know it or look it up when writing this line.

So my suggestion is to add another string method, say "stripstr" that
behaves like "strip", but instead of stripping *characters* strips
*strings* (similarly for lstrip and rstrip). Then in the case above,
you could simply write url = url.lstripstr('http://') or
url = url.lstripstr(('http://', 'https://')).

The new function would actually subsume the old strip function: you
would have s.strip('aeiou') == s.stripstr(set('aeiou')).
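
To illustrate the proposal and the equivalence with strip, here is a
rough sketch written as plain functions instead of string methods. It
rests on my reading that, like strip, the operation is applied
repeatedly until none of the given strings matches any more. The
function names follow the proposal; the helper _as_tuple and the rule
that the first matching string wins are assumptions of this sketch.

```python
def _as_tuple(strings):
    # Accept a single string or any iterable of strings. Empty strings
    # are dropped, since they would match forever.
    if isinstance(strings, str):
        strings = (strings,)
    return tuple(s for s in strings if s)


def lstripstr(text, strings):
    """Repeatedly remove any of the given strings from the start."""
    strings = _as_tuple(strings)
    stripped = True
    while stripped:
        stripped = False
        for s in strings:
            if text.startswith(s):
                text = text[len(s):]
                stripped = True
                break
    return text


def rstripstr(text, strings):
    """Repeatedly remove any of the given strings from the end."""
    strings = _as_tuple(strings)
    stripped = True
    while stripped:
        stripped = False
        for s in strings:
            if text.endswith(s):
                text = text[:len(text) - len(s)]
                stripped = True
                break
    return text


def stripstr(text, strings):
    """Apply lstripstr and rstripstr with the same strings."""
    return rstripstr(lstripstr(text, strings), strings)


# With single-character strings this matches the built-in strip.
same_as_strip = stripstr('audio', set('aeiou')) == 'audio'.strip('aeiou')
```

One open question the sketch glosses over: when the given strings
overlap (say 'a' and 'ab'), the result can depend on the order in which
they are tried, so a real implementation would have to define that.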

Instead of a new function, we could also add another parameter to strip
(lstrip, rstrip) for passing strings or changing the behavior, or we
could create functions with the signature of startswith and endswith
which instead of only checking whether the string starts or ends with
the substring, remove the substring (startswith and endswith have
additional "start" and "end" index parameters that may be useful).

Or did I overlook anything and there is already a good idiom for this?

Btw, in most other languages, "strip" is called "trim" and behaves
like Python's strip, i.e. considers the parameter as a set of chars.
There is one notable exception: In MySQL, trim behaves like the stripstr
proposed above (unlike in SQLite, PostgreSQL and Oracle).

-- Christoph


