how to avoid leading white spaces

rurpy at yahoo.com rurpy at yahoo.com
Mon Jun 6 01:47:13 EDT 2011


On 06/03/2011 03:45 PM, Chris Torek wrote:
>>On 2011-06-03, rurpy at yahoo.com <rurpy at yahoo.com> wrote:
> [prefers]
>>>     re.split ('[ ,]', source)
>
> This is probably not what you want in dealing with
> human-created text:
>
>     >>> re.split('[ ,]', 'foo  bar, spam,maps')
>     ['foo', '', 'bar', '', 'spam', 'maps']
>
> Instead, you probably want "a comma followed by zero or
> more spaces; or, one or more spaces":
>
>     >>> re.split(r',\s*|\s+', 'foo bar, spam,maps')
>     ['foo', 'bar', 'spam', 'maps']
>
> or perhaps (depending on how you want to treat multiple
> adjacent commas) even this:
>
>     >>> re.split(r',+\s*|\s+', 'foo bar, spam,maps,, eggs')
>     ['foo', 'bar', 'spam', 'maps', 'eggs']

Which, to me, nicely illustrates the power of a regex to concisely
localize the specification of an input format and to adapt easily
to changes in that specification.
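
To make that concrete, here is a minimal sketch of what I mean by
"localize". The names FIELD_SEP and split_fields are made up for
illustration; the pattern is the last one quoted above. If the format
changes, only the one pattern has to change.

```python
import re

# Hypothetical name: the whole input format lives in this one pattern.
# It matches a run of commas plus any following whitespace, or a run
# of whitespace on its own.
FIELD_SEP = re.compile(r',+\s*|\s+')

def split_fields(source):
    """Split source into fields, ignoring leading/trailing whitespace."""
    # strip() first, so leading white space does not yield an empty field.
    return FIELD_SEP.split(source.strip())

print(split_fields('  foo  bar, spam,maps,, eggs'))
```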

> although eventually you might want to just give in and use the
> csv module. :-)  (Especially if you want to be able to quote
> commas, for instance.)

Which internally uses regexes, at least in csv.Sniffer.
(The main parser is in C, presumably for speed, this being a
library module and all.)
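
For the quoted-comma case, something like the following is probably
the shape of it. This is only a sketch: parse_line is a hypothetical
helper, and I am assuming the input is one line using double quotes.

```python
import csv
import io

def parse_line(line):
    """Parse one comma-separated line; quoted fields may contain commas."""
    # skipinitialspace=True ignores whitespace right after each delimiter,
    # which also lets a quote that follows ", " open a quoted field.
    reader = csv.reader(io.StringIO(line), skipinitialspace=True)
    return next(reader)

print(parse_line('foo, "bar, baz", spam'))
```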

>>> ...  With regexes the code is likely to be less brittle than a
>>> dozen or more lines of mixed string functions, indexes, and
>>> conditionals.
>
> In article <94svm4Fe7eU1 at mid.individual.net>
> Neil Cerutti  <neilc at norwich.edu> wrote:
> [lots of snippage]
>>That is the opposite of my experience, but YMMV.
>
> I suspect it depends on how familiar the user is with regular
> expressions, their abilities, and their limitations.

I suspect so too, at least in part.

> People relatively new to REs always seem to want to use them
> to count (to balance parentheses, for instance).  People who
> have gone through the compiler course know better. :-)

But one thing that I think sometimes gets forgotten is that if the
maximum nesting depth is bounded, parens can be balanced with a
regex.  This is nice for the particularly common case of a
nesting depth of 1 (balanced but non-nested parens).
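
A rough sketch of the idea, with hypothetical helper names
(balanced_pattern, is_balanced): build the pattern by wrapping it
around itself once per allowed nesting level. It assumes "(" and ")"
are the only bracket characters that matter and that the whole string
must balance.

```python
import re

def balanced_pattern(max_depth):
    """Return a regex source matching text whose parens balance and
    nest no deeper than max_depth."""
    # Depth 0: no parens at all.
    pattern = r'[^()]*'
    for _ in range(max_depth):
        # One more level: plain text, then any number of groups of
        # "(" + previous-level text + ")" each followed by plain text.
        pattern = r'[^()]*(?:\(' + pattern + r'\)[^()]*)*'
    return pattern

def is_balanced(text, max_depth=1):
    """True if text is balanced with nesting depth <= max_depth."""
    return re.fullmatch(balanced_pattern(max_depth), text) is not None

print(is_balanced('f(a) + g(b)'))            # non-nested: depth 1 suffices
print(is_balanced('f(g(a))'))                # too deep for depth 1
print(is_balanced('f(g(a))', max_depth=2))   # fits within depth 2
```

The pattern grows with each extra level, so this is only practical
for small bounds; for unbounded nesting a regex of this kind still
will not do.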


