Fastest way to calculate leading whitespace
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sat May 8 14:46:26 EDT 2010
On Sat, 08 May 2010 10:19:16 -0700, dasacc22 wrote:
> Hi
>
> This is a simple question. I'm looking for the fastest way to calculate
> the leading whitespace (as a string, ie ' ').
Is calculating the amount of leading whitespace really the bottleneck in
your application? If not, then trying to shave off microseconds from
something which is a trivial part of your app is almost certainly a waste
of your time.
[...]
> a = ' some content\n'
> b = a.strip()
> c = ' '*(len(a)-len(b))
I take it that you haven't actually tested this code for correctness,
because it's buggy. Let's test it:
>>> leading_whitespace = " "*2 + "\t"*2
>>> a = leading_whitespace + "some non-whitespace text\n"
>>> b = a.strip()
>>> c = " "*(len(a)-len(b))
>>> assert c == leading_whitespace
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
Not only doesn't it get the whitespace right, but it doesn't even get the
*amount* of whitespace right:
>>> assert len(c) == len(leading_whitespace)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
It doesn't even work correctly if you limit "whitespace" to mean spaces
and nothing else! It's simply wrong in every possible way.
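To make the off-by-one concrete, here is a minimal sketch reusing the names from the session above. The names both_ends and left_only are introduced only for this illustration. strip() trims both ends of the string, so the trailing newline gets counted as if it were leading whitespace:

```python
# strip() trims both ends, so the trailing "\n" inflates the count.
leading_whitespace = " " * 2 + "\t" * 2
a = leading_whitespace + "some non-whitespace text\n"

both_ends = len(a) - len(a.strip())    # also counts the trailing "\n"
left_only = len(a) - len(a.lstrip())   # counts leading whitespace only

print(both_ends, left_only, len(leading_whitespace))
```

Only the lstrip() difference matches the real amount of leading whitespace, and even then multiplying " " by that count would turn the tabs into spaces.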
This is why people say that premature optimization is the root of all
(programming) evil. Instead of wasting time and energy trying to optimise
code, you should make it correct first.
Your solutions 2 and 3 are also buggy. And solution 3 can easily be
rewritten to be more straightforward. Instead of the complicated:
> def get_leading_whitespace(s):
>     def _get():
>         for x in s:
>             if x != ' ':
>                 break
>             yield x
>     return ''.join(_get())
try this version:
def get_leading_whitespace(s):
    accumulator = []
    for c in s:
        if c in ' \t\v\f\r\n':
            accumulator.append(c)
        else:
            break
    return ''.join(accumulator)
Once you're sure this is correct, then you can optimise it:
def get_leading_whitespace(s):
    t = s.lstrip()
    return s[:len(s)-len(t)]
>>> c = get_leading_whitespace(a)
>>> assert c == leading_whitespace
>>>
Unless your strings are very large, this is likely to be faster than any
other pure-Python solution you can come up with.
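
If you want to check that claim on your own machine, the standard-library
timeit module is one way to do it. What follows is only a rough sketch, not
a benchmark from this post: the names leading_ws_loop and leading_ws_lstrip
and the sample string are made up for the comparison, and the numbers will
vary with the Python version and the input.

```python
import timeit

def leading_ws_loop(s):
    # Character-by-character version: collect until the first
    # non-whitespace character.
    accumulator = []
    for c in s:
        if c in ' \t\v\f\r\n':
            accumulator.append(c)
        else:
            break
    return ''.join(accumulator)

def leading_ws_lstrip(s):
    # Let the built-in lstrip() do the scan, then slice off the prefix.
    t = s.lstrip()
    return s[:len(s) - len(t)]

# Illustrative input only; substitute lines from your real data.
sample = " " * 8 + "\t" * 2 + "some non-whitespace text\n"

# Make sure both versions agree before timing anything.
assert leading_ws_loop(sample) == leading_ws_lstrip(sample)

for func in (leading_ws_loop, leading_ws_lstrip):
    seconds = timeit.timeit(lambda: func(sample), number=20000)
    print(func.__name__, seconds)
```

One caveat if you run this on Python 3: lstrip() with no argument also
strips Unicode whitespace such as U+00A0, so the two functions can disagree
on non-ASCII input. Passing the explicit character set to lstrip() keeps
them in step.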
--
Steven