
Stephen J. Turnbull wrote:
I don't understand this point of view at all. True, regexps are a complex subject, with an unfortunately large number of dialects. Is it the confusion of dialects problem, or do you really never use regexps in any language?
I have half-heartedly tried to learn regexps before, but always given up after reading about the basics. Obviously, this would be shameless behavior for a professional programmer, but I'm just a dilettante, and the famed saying of Jamie Zawinski ("Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems.") is not highly motivating. :-D
Anyway, for this purpose you only have to learn one idiom, that
longstring.splitonchars (["x", "y", "z"])
is spelled
    import re
    re.split ("[xyz]", longstring)
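To make the idiom concrete, here is a small runnable sketch. The helper name split_on_chars is hypothetical (it only mirrors the imaginary splitonchars above), and the use of re.escape is an added precaution so that characters such as "]" or "-" are treated literally inside the character class.

```python
import re

# The idiom exactly as written above: split on any one of x, y or z.
longstring = "axbyczd"
parts = re.split("[xyz]", longstring)
print(parts)  # ['a', 'b', 'c', 'd']

# Adjacent delimiters produce empty strings, as with str.split(sep).
adjacent = re.split("[xyz]", "axyb")
print(adjacent)  # ['a', '', 'b']


def split_on_chars(text, chars):
    """Hypothetical helper: split text on any single character in chars.

    Each character is passed through re.escape so that regexp
    metacharacters cannot change the meaning of the character class.
    """
    if not chars:
        raise ValueError("need at least one delimiter character")
    pattern = "[" + "".join(re.escape(c) for c in chars) + "]"
    return re.split(pattern, text)


tricky = split_on_chars("a-b]c^d", "-]^")
print(tricky)  # ['a', 'b', 'c', 'd']
```

Building the pattern from escaped characters keeps the one-idiom simplicity while sidestepping most of the "which characters are magic" question.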
In fact, I personally would like to deprecate the with-argument implementation of string.split(), and have
    def split (self, delimiter = None):
        if delimiter is None:
            return self.usual_magic_splitting ()
        else:
            import re
            return re.split (delimiter, self)
(of course, that's because that's precisely the way split-string works in Emacs).
Then the idiom would be
longstring.split ("[xyz]")
Would that work for you?
Wouldn't that subtly break the code of everyone who has written something like this?

    lines = bigtext.splitlines()
    delimiter = lines[0]
    del lines[0]
    splitlines = [line.split(delimiter) for line in lines]

If the delimiter happens to contain one of the reserved regexp characters, such as brackets or parentheses, the code would suddenly stop working. (That's one of the things I dislike about regexps -- too many magical characters.)

Here's a backward compatible idea instead:

    def split (self, delimiter = None):
        if delimiter is None:
            return self.usual_magic_splitting ()
        elif isinstance(delimiter, str):
            return self.usual_delimiter_based_splitting(delimiter)
        elif isinstance(delimiter, Sequence):
            return self.treat_delimiters_given_by_sequence_as_interchangable(delimiter)
        else:
            raise TypeError("coercing to Unicode: need string or buffer or Sequence, "
                            + repr(type(delimiter)) + " found")

Since passing a list or tuple currently raises a TypeError, this would be backwards compatible. The idiom for doing re.split-like things would then be bigtext.split(list(" ;.,-!?")).

It might even be a good idea to add a keyword (only?) argument called "dropempty" to recreate the magical behavior of passing None as the delimiter, where empty strings are dropped. That would also solve Skip's original problem: just call text.split(None, dropempty=False).

-- Carl
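
For anyone who wants to experiment with the idea, here is a rough stand-alone sketch of it as a plain function rather than a str method. Everything about it is an assumption layered on the description above: the function name split, the default for dropempty (drop empties only when the delimiter is None, as today), the use of re.split with re.escape for the sequence case, and the approximation of whitespace splitting with \s+.

```python
import re
from collections.abc import Sequence


def split(text, delimiter=None, dropempty=None):
    """Hypothetical stand-alone version of the proposed str.split."""
    if dropempty is None:
        # Assumed default: mimic current behavior, where only the
        # no-delimiter form drops empty strings.
        dropempty = delimiter is None

    if delimiter is None:
        # \s+ only approximates str.split()'s idea of whitespace.
        parts = re.split(r"\s+", text)
    elif isinstance(delimiter, str):
        # Plain string delimiter: no regexp magic at all.
        parts = text.split(delimiter)
    elif isinstance(delimiter, Sequence):
        if not delimiter or not all(isinstance(d, str) and d for d in delimiter):
            raise ValueError("need a non-empty sequence of non-empty strings")
        # Longest first so that multi-character delimiters win over
        # their own prefixes; re.escape keeps every delimiter literal.
        ordered = sorted(delimiter, key=len, reverse=True)
        pattern = "|".join(re.escape(d) for d in ordered)
        parts = re.split(pattern, text)
    else:
        raise TypeError("need str, Sequence or None, "
                        + repr(type(delimiter)) + " found")

    if dropempty:
        parts = [p for p in parts if p]
    return parts


print(split("one;two,three", list(";,")))      # ['one', 'two', 'three']
print(split("f(x)", ["(", ")"]))               # ['f', 'x', '']
print(split(" a  b ", None, dropempty=False))  # ['', 'a', 'b', '']
print(split("a,,b", ",", dropempty=True))      # ['a', 'b']
```

Because the string case never touches the re module, a delimiter such as "(" or "." read from a file keeps working exactly as it does now.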