how to avoid leading white spaces

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Jun 3 10:25:53 EDT 2011


On Fri, 03 Jun 2011 05:51:18 -0700, rurpy at yahoo.com wrote:

> On 06/02/2011 07:21 AM, Neil Cerutti wrote:

>> > Python's str methods, when they're sufficient, are usually more
>> > efficient.
> 
> Unfortunately, except for the very simplest cases, they are often not
> sufficient.

Maybe so, but the very simplest cases occur very frequently.


> I often find myself changing, for example, a startswith() to
> a RE when I realize that the input can contain mixed case 

Why wouldn't you just normalise the case?

source.lower().startswith(prefix.lower())

Particularly if the two strings are short, this is likely to be much 
faster than a regex.

Admittedly, normalising the case in this fashion is not strictly correct. 
It works well enough for ASCII text, and probably Latin-1, but not so well 
for general Unicode. A regex solution won't do any better, though. If you 
need to support true case normalisation for arbitrary character sets, 
Python isn't going to be much help to you. But for the rest of us, a 
simple str.lower() or str.upper() might be technically broken but it will 
do the job.
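
As a rough sketch of the two approaches side by side (the helper names 
startswith_ci and startswith_re are made up for illustration, they are 
not from this thread or from the standard library):

```python
import re

def startswith_ci(source, prefix):
    """Case-insensitive prefix test using only str methods."""
    return source.lower().startswith(prefix.lower())

def startswith_re(source, prefix):
    """The same test with a regex; re.escape keeps the prefix literal."""
    return re.match(re.escape(prefix), source, re.IGNORECASE) is not None

# Both calls agree on simple ASCII input.
print(startswith_ci("Hello World", "hELLo"))
print(startswith_re("Hello World", "hELLo"))
```

Both versions share the same limitation for general Unicode. On Python 
3.3 and later (which postdates this message), str.casefold() is intended 
for caseless matching and could probably stand in for lower() here.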


> or that I have
> to treat commas as well as spaces as delimiters.

source.replace(",", " ").split(" ")
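
A small sketch of what that one-liner does, assuming the same sample 
string as in the timings. The "messy" input is my own addition, to show 
one caveat: split(" ") yields empty strings where delimiters are 
adjacent, while split() with no argument collapses runs of whitespace.

```python
import re

source = 'a b c,d,e,f,g h i j k'

# Replace commas with spaces, then split on single spaces.
by_str = source.replace(',', ' ').split(' ')
# The regex equivalent: split on either a comma or a space.
by_re = re.split(',| ', source)
print(by_str == by_re)

# Hypothetical input with a comma followed by a space.
messy = 'a, b'
print(messy.replace(',', ' ').split(' '))  # ['a', '', 'b']
print(messy.replace(',', ' ').split())     # ['a', 'b']
```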

[steve@sylar ~]$ python -m timeit -s "source = 'a b c,d,e,f,g h i j k'" \
    "source.replace(',', ' ').split(' ')"
100000 loops, best of 3: 2.69 usec per loop

[steve@sylar ~]$ python -m timeit -s "source = 'a b c,d,e,f,g h i j k'" \
    -s "import re" "re.split(',| ', source)"
100000 loops, best of 3: 11.8 usec per loop

re.split is about four times slower than the simple solution (11.8 usec 
versus 2.69 usec per loop).
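
The same comparison can be run from inside Python with the timeit 
module. This is only a sketch: the variable names are mine, the loop 
count of 100000 simply mirrors the command-line runs, and the actual 
numbers will differ by machine and Python version.

```python
import timeit

# Shared setup: the sample string, plus the re import for the second test.
setup = "import re; source = 'a b c,d,e,f,g h i j k'"

str_time = timeit.timeit("source.replace(',', ' ').split(' ')",
                         setup=setup, number=100000)
re_time = timeit.timeit("re.split(',| ', source)",
                        setup=setup, number=100000)

# Total seconds for 100000 loops of each statement.
print("str methods: %.3f s   re.split: %.3f s" % (str_time, re_time))
```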


> After doing this a
> number of times, one starts to use an RE right from the get go unless
> one is VERY sure that there will be no requirements creep.

YAGNI.

There's no need to use a regex just because you think that you *might*, 
someday, possibly need a regex. That's just silly. If and when 
requirements change, then use a regex. Until then, write the simplest 
code that will solve the problem you have to solve now, not the problem 
you think you might have to solve later.


> And to regurgitate the mantra frequently used to defend Python when it
> is criticized for being slow, the real question should be, are REs fast
> enough?  The answer almost always is yes.

Well, perhaps so.



[...]
> In short, although your observations are true to some extent, they
> are not sufficient to justify the anti-RE attitude often seen here.

I don't think that there's really an *anti* RE attitude here. It's more a 
skeptical, cautious attitude to them, as a reaction to the Perl "when all 
you have is a hammer, everything looks like a nail" love affair with 
regexes.

There are a few problems with regexes:

- they are another language to learn, a very cryptic and terse language;
- hence code using many regexes tends to be obfuscated and brittle;
- they're over-kill for many simple tasks;
- and underpowered for complex jobs, and even some simple ones;
- debugging regexes is a nightmare;
- they're relatively slow;
- and thanks in part to Perl's over-reliance on them, there's a tendency 
among many coders (especially those coming from Perl) to abuse and/or 
misuse regexes; people react to that misuse by treating any use of 
regexes with suspicion.

But they have their role to play as a tool in the programmer's toolbox.

Regarding their syntax, I'd like to point out that even Larry Wall is 
dissatisfied with regex culture in the Perl community:

http://www.perl.com/pub/2002/06/04/apo5.html



-- 
Steven


