how to avoid leading white spaces
Steven D'Aprano
steve+comp.lang.python at
Fri Jun 3 10:25:53 EDT 2011
On Fri, 03 Jun 2011 05:51:18 -0700, rurpy at wrote:
> On 06/02/2011 07:21 AM, Neil Cerutti wrote:
>> > Python's str methods, when they're sufficient, are usually more
>> > efficient.
> Unfortunately, except for the very simplest cases, they are often not
> sufficient.
Maybe so, but the very simplest cases occur very frequently.
> I often find myself changing, for example, a startswith() to
> a RE when I realize that the input can contain mixed case
Why wouldn't you just normalise the case?
Particularly if the two strings are short, this is likely to be much
faster than a regex.
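
As a minimal sketch of what normalising the case could look like with plain str methods (the helper name starts_with_ignoring_case and the sample strings are made up for illustration, not taken from the original thread):

```python
def starts_with_ignoring_case(text, prefix):
    """Case-insensitive prefix test using only str methods (hypothetical helper)."""
    # Lower-case both sides so 'Hello', 'HELLO' and 'hello' all compare equal.
    # This is only approximately right outside ASCII text.
    return text.lower().startswith(prefix.lower())


# Sample inputs are invented for the example.
print(starts_with_ignoring_case("Content-Type: text/plain", "content-type"))
print(starts_with_ignoring_case("Subject: hello", "content-type"))
```

Lower-casing both the input and the prefix keeps the call site as simple as the original startswith() test.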
Admittedly, normalising the case in this fashion is not strictly correct.
It works well enough for ASCII text, and probably Latin-1, but for
general Unicode, not so much. But neither will a regex solution. If you
need to support true case normalisation for arbitrary character sets,
Python isn't going to be much help for you. But for the rest of us, a
simple str.lower() or str.upper() might be technically broken but it will
do the job.
> or that I have
> to treat commas as well as spaces as delimiters.
source.replace(",", " ").split(" ")
[steve@sylar ~]$ python -m timeit -s "source = 'a b c,d,e,f,g h i j k'" "source.replace(',', ' ').split(' ')"
100000 loops, best of 3: 2.69 usec per loop
[steve@sylar ~]$ python -m timeit -s "source = 'a b c,d,e,f,g h i j k'" -s "import re" "re.split(',| ', source)"
100000 loops, best of 3: 11.8 usec per loop
re.split is about four times slower than the simple solution.
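
The same comparison can be run from inside Python with the timeit module. This is only a sketch: the function names split_with_str and split_with_re are invented here, and the absolute timings will differ between machines and Python versions, so no expected numbers are shown.

```python
import re
import timeit

source = 'a b c,d,e,f,g h i j k'


def split_with_str(s):
    # Turn commas into spaces, then split on single spaces.
    return s.replace(',', ' ').split(' ')


def split_with_re(s):
    # Split on either a comma or a space.
    return re.split(',| ', s)


# Check that both approaches agree on this input before timing them.
assert split_with_str(source) == split_with_re(source)

# Time each approach over the same number of calls.
str_time = timeit.timeit(lambda: split_with_str(source), number=10000)
re_time = timeit.timeit(lambda: split_with_re(source), number=10000)

print("str methods: %.4f s" % str_time)
print("re.split:    %.4f s" % re_time)
```

Checking that the two results are equal first makes sure the timing compares like with like.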
> After doing this a
> number of times, one starts to use an RE right from the get go unless
> one is VERY sure that there will be no requirements creep.
There's no need to use a regex just because you think that you *might*,
someday, possibly need a regex. That's just silly. If and when
requirements change, then use a regex. Until then, write the simplest
code that will solve the problem you have to solve now, not the problem
you think you might have to solve later.
> And to regurgitate the mantra frequently used to defend Python when it
> is criticized for being slow, the real question should be, are REs fast
> enough? The answer almost always is yes.
Well, perhaps so.
> In short, although your observations are true to some extent, they
> are not sufficient to justify the anti-RE attitude often seen here.
I don't think that there's really an *anti* RE attitude here. It's more a
skeptical, cautious attitude to them, as a reaction to the Perl "when all
you have is a hammer, everything looks like a nail" love affair with regexes.
There are a few problems with regexes:
- they are another language to learn, a very cryptic and terse language;
- hence code using many regexes tends to be obfuscated and brittle;
- they're over-kill for many simple tasks;
- and underpowered for complex jobs, and even some simple ones;
- debugging regexes is a nightmare;
- they're relatively slow;
- and thanks in part to Perl's over-reliance on them, there's a tendency
among many coders (especially those coming from Perl) to abuse and/or
misuse regexes; people react to that misuse by treating any use of
regexes with suspicion.
But they have their role to play as a tool in the programmer's toolbox.
Regarding their syntax, I'd like to point out that even Larry Wall is
dissatisfied with regex culture in the Perl community: