Fastest way to calculate leading whitespace
dasacc22
dasacc22 at gmail.com
Sat May 8 17:27:32 EDT 2010
You presume entirely too much. I have a preprocessor that normalizes
documents while performing other, more complex operations. There's
nothing buggy about what I'm doing.
On May 8, 1:46 pm, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> On Sat, 08 May 2010 10:19:16 -0700, dasacc22 wrote:
> > Hi
>
> > This is a simple question. I'm looking for the fastest way to calculate
> > the leading whitespace (as a string, ie ' ').
>
> Is calculating the amount of leading whitespace really the bottleneck in
> your application? If not, then trying to shave off microseconds from
> something which is a trivial part of your app is almost certainly a waste
> of your time.
>
> [...]
>
> > a = ' some content\n'
> > b = a.strip()
> > c = ' '*(len(a)-len(b))
>
> I take it that you haven't actually tested this code for correctness,
> because it's buggy. Let's test it:
>
> >>> leading_whitespace = " "*2 + "\t"*2
> >>> a = leading_whitespace + "some non-whitespace text\n"
> >>> b = a.strip()
> >>> c = " "*(len(a)-len(b))
> >>> assert c == leading_whitespace
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AssertionError
>
> Not only doesn't it get the whitespace right, but it doesn't even get the
> *amount* of whitespace right:
>
> >>> assert len(c) == len(leading_whitespace)
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AssertionError
>
> It doesn't even work correctly if you limit "whitespace" to mean spaces
> and nothing else! It's simply wrong in every possible way.
>
> This is why people say that premature optimization is the root of all
> (programming) evil. Instead of wasting time and energy trying to optimise
> code, you should make it correct first.
>
> Your solutions 2 and 3 are also buggy. And solution 3 can be easily
> rewritten to be more straightforward. Instead of the complicated:
>
> > def get_leading_whitespace(s):
> >     def _get():
> >         for x in s:
> >             if x != ' ':
> >                 break
> >             yield x
> >     return ''.join(_get())
>
> try this version:
>
> def get_leading_whitespace(s):
>     accumulator = []
>     for c in s:
>         if c in ' \t\v\f\r\n':
>             accumulator.append(c)
>         else:
>             break
>     return ''.join(accumulator)
>
> Once you're sure this is correct, then you can optimise it:
>
> def get_leading_whitespace(s):
>     t = s.lstrip()
>     return s[:len(s)-len(t)]
>
> >>> c = get_leading_whitespace(a)
> >>> assert c == leading_whitespace
>
> Unless your strings are very large, this is likely to be faster than any
> other pure-Python solution you can come up with.
>
> --
> Steven
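
For anyone who wants to check the quoted claims, here is a rough sketch
that puts the loop version and the lstrip() slice side by side. The names
leading_ws_loop, leading_ws_slice and time_both are made up for this
illustration and appear nowhere in the thread. It assumes Python 3 and
plain str input. One caveat: lstrip() with no argument also removes
non-ASCII whitespace, so the two functions can disagree on such input.

```python
import timeit

def leading_ws_loop(s):
    """Scan character by character, collecting ASCII whitespace only."""
    acc = []
    for ch in s:
        if ch in ' \t\v\f\r\n':
            acc.append(ch)
        else:
            break
    return ''.join(acc)

def leading_ws_slice(s):
    """Return the prefix that lstrip() would remove."""
    return s[:len(s) - len(s.lstrip())]

sample = "  \t\tsome non-whitespace text\n"

# strip() also drops the trailing newline, so the length difference
# counts one character more than the four leading ones.
overcount = len(sample) - len(sample.strip())

def time_both(number=100_000):
    """Time both functions on the sample; returns seconds per function name."""
    return {
        fn.__name__: timeit.timeit(lambda: fn(sample), number=number)
        for fn in (leading_ws_loop, leading_ws_slice)
    }

if __name__ == "__main__":
    for name, seconds in time_both().items():
        print(f"{name}: {seconds:.3f}s")
```

Timings will vary with the interpreter and the input, so treat any
numbers this prints as indicative only.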