Fastest way to calculate leading whitespace

dasacc22 dasacc22 at gmail.com
Sat May 8 14:16:49 EDT 2010


On May 8, 12:59 pm, Patrick Maupin <pmau... at gmail.com> wrote:
> On May 8, 12:19 pm, dasacc22 <dasac... at gmail.com> wrote:
>
> > Hi
>
> > This is a simple question. I'm looking for the fastest way to
> > calculate the leading whitespace (as a string, ie '    ').
>
> > Here are some different methods I have tried so far
> > --- solution 1
>
> > a = '    some content\n'
> > b = a.strip()
> > c = ' '*(len(a)-len(b))
>
> > --- solution 2
>
> > a = '    some content\n'
> > b = a.strip()
> > c = a.partition(b[0])[0]
>
> > --- solution 3
>
> > def get_leading_whitespace(s):
> >     def _get():
> >         for x in s:
> >             if x != ' ':
> >                 break
> >             yield x
> >     return ''.join(_get())
>
> > ---
>
> > Solution 1 seems to be about as fast as solution 2 except in certain
> > circumstances where the value of b has already been determined for
> > other purposes. Solution 3 is slower due to the function overhead.
>
> > Curious to see what other types of solutions people might have.
>
> > Thanks,
> > Daniel
>
> Well, you could try a solution using re, but that's probably only
> likely to be faster if you can use it on multiple concatenated lines.
> I usually use something like your solution #1.  One thing to be aware
> of, though, is that strip() with no parameters will strip *any*
> whitespace, not just spaces, so the implicit assumption in your code
> that what you have stripped is spaces may not be justified (depending
> on the source data).  OTOH, depending on how you use that whitespace
> information, it may not really matter.  But if it does matter, you can
> use strip(' ')
>
> If speed is really an issue for you, you could also investigate
> mxtexttools, but, like re, it might perform better if the source
> consists of several batched lines.
>
> Regards,
> Pat

Hi,

Thanks for the info. Using .strip() to remove all whitespace in
solution 1 is a must. If you only stripped ' ' spaces, then line
endings would get counted in the len() call and, when multiplied
against ' ', would produce an inaccurate result. Regex is
significantly slower for my purposes, but I've never heard of
mxtexttools. Even if it proves slow, it has spurred my curiosity as to
what functionality it provides (on an unrelated note).
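
One caveat on solution 1 as posted: because .strip() also removes the
trailing '\n', len(a) - len(b) comes out one larger than the number of
leading spaces for the sample line. A minimal sketch of a variant that
strips only the left side is below. The function names are made up for
illustration and are not from the thread, and nothing here has been
timed against the three solutions above.

```python
def leading_ws_lstrip(line):
    # lstrip() with no argument removes any leading whitespace
    # (spaces, tabs, ...), so the slice returns exactly what was removed.
    return line[:len(line) - len(line.lstrip())]


def leading_spaces_only(line):
    # lstrip(' ') removes only leading ' ' characters; a leading tab
    # stops the strip, so tab-indented lines yield ''.
    return ' ' * (len(line) - len(line.lstrip(' ')))


a = '    some content\n'

# Solution 1 as posted: the stripped newline is counted as well.
count_with_strip = len(a) - len(a.strip())

# Left-only variants: the trailing newline stays in both lengths
# and cancels out.
ws_any = leading_ws_lstrip(a)
ws_spaces = leading_spaces_only(a)
```

For the sample line, count_with_strip is 5 while both left-only
variants return the 4 leading spaces. Note that on a line containing
only whitespace, lstrip() with no argument consumes the line ending
too, so such lines may need separate handling.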



More information about the Python-list mailing list