Re: [Python-ideas] This seems like a wart to me...

Dec. 12, 2008

      Stephen J. Turnbull wrote:
...
I don't understand this point of view at all.  True, regexps are a
complex subject, with an unfortunately large number of dialects.  Is
it the confusion of dialects problem, or do you really never use
regexps in any language?
I have half-heartedly tried to learn regexps before, but always given  
up after reading about the basics. Obviously, this would be shameless  
behavior for a professional programmer, but I'm just a dilettante, and  
the famed saying of Jamie Zawinski ("Some people, when confronted with  
a problem, think 'I know, I'll use regular expressions.'  Now they  
have two problems.") is not highly motivating. :-D
...
Anyway, for this purpose you only have to learn one idiom, that
    longstring.splitonchars (["x", "y", "z"])
is spelled
    import re
    re.split ("[xyz]", longstring)
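
For readers who have not used it, here is a minimal sketch of that one idiom. The sample string and the character class `[,; ]` are made up for illustration; `re.split` itself is the standard-library call named above.

```python
import re

# Illustrative input: fields separated by a comma, a semicolon, or a space.
longstring = "one,two;three four"

# A character class matches any single one of the listed characters,
# so each of them acts as an interchangeable delimiter.
parts = re.split("[,; ]", longstring)
print(parts)
```

Each character inside the square brackets is one alternative delimiter, which is all that needs to be learned for this purpose.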
In fact, I personally would like to deprecate the with-argument
implementation of string.split(), and have
    def split (self, delimiter = None):
        if delimiter is None:
            return self.usual_magic_splitting ()
        else:
            import re
            return re.split (delimiter, self)
(of course, that's because that's precisely the way split-string works
in Emacs).
Then the idiom would be
    longstring.split ("[xyz]")
Would that work for you?
Wouldn't that subtly break the code of everyone who has written  
something like:

lines = bigtext.splitlines()
delimiter = lines[0]
del lines[0]
splitlines = [line.split(delimiter) for line in lines]

? Since suddenly if your delimiter uses one of the reserved regexp  
characters, such as brackets and parentheses, the code would stop  
working. (That's one of the things I dislike about regexps -- too many  
magical characters.)
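
A small sketch of the breakage described above, assuming the delimiter read from the data happens to be "." (the sample line is invented). It compares today's literal `str.split` with what a regexp-based split would do, and shows `re.escape` as one way to get the literal behavior back.

```python
import re

line = "10.20.30"
delimiter = "."  # a perfectly ordinary literal delimiter

# Current behavior: the delimiter is taken literally.
literal_fields = line.split(delimiter)   # ['10', '20', '30']

# Regexp behavior: "." matches any character, so every character
# is treated as a delimiter and only empty strings remain.
regex_fields = re.split(delimiter, line)

# Escaping the delimiter restores the literal meaning.
escaped_fields = re.split(re.escape(delimiter), line)

# An unbalanced parenthesis is not even a valid pattern.
try:
    re.split("(", "f(x)")
    paren_ok = True
except re.error:
    paren_ok = False

print(literal_fields, regex_fields, escaped_fields, paren_ok)
```

Under the regexp-based proposal, existing callers would have to wrap their delimiter in `re.escape` to keep the results they get now.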

Here's a backward compatible idea instead:

    def split (self, delimiter = None):
        if delimiter is None:
            return self.usual_magic_splitting ()
        elif isinstance(delimiter, str):
            return self.usual_delimiter_based_splitting()
        elif isinstance(delimiter, Sequence):
            return self.treat_delimiters_given_by_sequence_as_interchangable()
        else:
            raise TypeError("coercing to Unicode: need string or buffer or Sequence, "
                            + repr(type(delimiter)) + " found")

Since right now passing a list or tuple raises a TypeError, this would  
be backwards compatible. The idiom for doing re.split-like things  
would then be bigtext.split(list(" ;.,-!?")). It might even be a good
idea to add a keyword (only?) argument called "dropempty" to recreate the
magical behavior of passing None as the delimiter, where empty strings
are dropped. That would also solve skip's original problem: just call
text.split(None, dropempty=False).
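
To make the idea concrete, here is a rough sketch written as a free function rather than a real `str` method. The name `proposed_split` is hypothetical, and several details are assumptions not settled above: that `dropempty` defaults to True only when the delimiter is None, that None means "any single whitespace character", and that longer delimiters in a sequence are tried first.

```python
import re
from collections.abc import Sequence


def proposed_split(text, delimiter=None, *, dropempty=None):
    """Hypothetical stand-in for the proposed str.split behavior."""
    if delimiter is None:
        # Assumption: None means any single whitespace character,
        # with empty strings dropped unless the caller says otherwise.
        pattern = r"\s"
        if dropempty is None:
            dropempty = True
    elif isinstance(delimiter, str):
        # Check str first, since a str is itself a Sequence.
        delimiters = [delimiter]
        pattern = None
    elif isinstance(delimiter, Sequence):
        delimiters = list(delimiter)
        pattern = None
    else:
        raise TypeError("need string or Sequence, "
                        + repr(type(delimiter)) + " found")

    if pattern is None:
        if not delimiters or any(d == "" for d in delimiters):
            raise ValueError("empty separator")
        # Escape each delimiter so it is matched literally; try the
        # longest first so "ab" wins over "a" (an assumption).
        ordered = sorted(delimiters, key=len, reverse=True)
        pattern = "|".join(re.escape(d) for d in ordered)

    parts = re.split(pattern, text)
    if dropempty:
        parts = [p for p in parts if p]
    return parts


words = proposed_split("Hi, there! How-are you?", list(" ;.,-!?"),
                       dropempty=True)
print(words)
```

Because every delimiter goes through `re.escape`, callers never see regexp syntax, and a plain string delimiter keeps its current literal meaning.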

-- Carl

Carl Johnson