[Python-ideas] This seems like a wart to me...

Stephen J. Turnbull stephen at xemacs.org
Sat Dec 13 16:48:00 CET 2008


Mike Meyer writes:

 > Except that if you're doing anything *interesting* - like splitting on
 > punctuation, which is far more common than splitting on alphanumerics
 > - then it's not nearly that simple.  The equivalent of
 > .splitset("^()-[]") is *much* more complicated than just "[^()-[]]"

Yeah, it's "[][)(^-]".  So much for complexity of the needed regexp
(see below for difficulty of composition).
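
For readers who want to check the two spellings, here is a minimal
sketch using Python's re module.  The sample string and the variable
names are made up for illustration; only the two patterns come from
the discussion above.

```python
import re

# Character class matching any one of: ] [ ) ( ^ -
# "]" comes first, "-" comes last, and "^" is kept away from the start.
PUNCT = r"[][)(^-]"

sample = "a(b)c[d]e^f-g"  # hypothetical input, not from the thread

parts = re.split(PUNCT, sample)
print(parts)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# The naive spelling parses differently: a negated class holding "("
# and the range ")".."[", followed by a literal "]".
NAIVE = r"[^()-[]]"

naive_parts = re.split(NAIVE, sample)
print(naive_parts)  # ['a(b)c[', 'e^f-g']
```

The second result shows why the naive form is wrong rather than merely
ugly: it compiles without complaint but only matches "d]" in the
sample.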

 > > So I don't think that lack of diagnostics explains widespread reluctance
 > > to even substitute ".*" for "*", but instead propose something as ugly
 > > as .split(list("abc")).
 > 
 > It isn't the lack of diagnostics, it's the write-once nature of re's.

They're hardly write-once in this context.  The above regexp is hard
to write, agreed, because you have to remember to move the close
bracket to the start, the hyphen to the end, and the caret away from
the start.  (Note that there is never a need to put the close bracket
and hyphen in other positions, so this is not a particularly hard rule
to remember, IMO; YMMV.)  However, precisely because of the oddity of
the positions of the close bracket and hyphen, it's easy enough to read
once you've learned to write it.  As far as I can see, that is the
hardest regexp that most people will ever want to write for
re.split().
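
If remembering the positions is still a burden, one possible way
around it is to let re.escape() build the class.  The sketch below
assumes a helper named splitset, borrowed from the method name
proposed earlier in the thread; it is hypothetical and not an existing
str method or part of the re module.

```python
import re

def splitset(text, chars):
    """Hypothetical helper: split text on any single character in chars."""
    if not chars:
        # An empty class "[]" is not a valid pattern, so return the input whole.
        return [text]
    # re.escape() backslash-escapes "]", "^" and "-", so their order
    # inside the brackets no longer matters.
    pattern = "[" + re.escape(chars) + "]"
    return re.split(pattern, text)

result = splitset("a(b)c[d]e^f-g", "^()-[]")
print(result)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```

This trades the hand-written "[][)(^-]" for a longer but mechanically
generated pattern; the splitting itself is still done by re.split().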

Again, I just don't see that (limited) use of regular expressions
makes programs harder to read or write than proliferating special case
functions that provide nowhere near the power of a single regular
expression-based function.



