how to avoid leading white spaces
rurpy at yahoo.com
Mon Jun 6 01:47:13 EDT 2011
On 06/03/2011 03:45 PM, Chris Torek wrote:
>>On 2011-06-03, rurpy at yahoo.com <rurpy at yahoo.com> wrote:
> [prefers]
>>> re.split ('[ ,]', source)
>
> This is probably not what you want in dealing with
> human-created text:
>
> >>> re.split('[ ,]', 'foo  bar, spam,maps')
> ['foo', '', 'bar', '', 'spam', 'maps']
>
> Instead, you probably want "a comma followed by zero or
> more spaces; or, one or more spaces":
>
> >>> re.split(r',\s*|\s+', 'foo bar, spam,maps')
> ['foo', 'bar', 'spam', 'maps']
>
> or perhaps (depending on how you want to treat multiple
> adjacent commas) even this:
>
> >>> re.split(r',+\s*|\s+', 'foo bar, spam,maps,, eggs')
> ['foo', 'bar', 'spam', 'maps', 'eggs']
Which, to me, nicely illustrates the power of a regex: it states
the input format concisely in one place, and it adapts easily
when that format changes.
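
A minimal sketch of that point, assuming (purely for illustration) that the format later grows to allow semicolons as separators too. The name `split_fields` and the semicolon rule are made up here, not something from the thread. Only the one pattern has to change:

```python
import re

# One or more commas/semicolons followed by optional whitespace,
# or a run of whitespace on its own.
SEPARATOR = re.compile(r'[,;]+\s*|\s+')

def split_fields(source):
    # Strip first so leading/trailing whitespace does not produce
    # empty fields at either end of the result.
    return SEPARATOR.split(source.strip())

print(split_fields('  foo  bar, spam;maps,, eggs '))
# ['foo', 'bar', 'spam', 'maps', 'eggs']
```

The `strip()` call is also what deals with leading whitespace, since a split on its own would return an empty first element for input that starts with spaces.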
> although eventually you might want to just give in and use the
> csv module. :-) (Especially if you want to be able to quote
> commas, for instance.)
Which itself uses regexes internally, at least for the sniffer
function. (The main parser is in C, presumably for speed, this
being a library module and all.)
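
For the quoted-comma case, a small sketch of what the csv module gives you. This assumes Python 3 and a made-up input line; `skipinitialspace` is a standard csv formatting parameter:

```python
import csv
import io

text = 'foo, "spam, eggs", bar\n'

# skipinitialspace=True ignores whitespace right after a delimiter,
# so the quote that follows ", " is still treated as opening a
# quoted field and the comma inside it is kept as data.
rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
print(rows)
# [['foo', 'spam, eggs', 'bar']]
```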
>>> ... With regexes the code is likely to be less brittle than a
>>> dozen or more lines of mixed string functions, indexes, and
>>> conditionals.
>
> In article <94svm4Fe7eU1 at mid.individual.net>
> Neil Cerutti <neilc at norwich.edu> wrote:
> [lots of snippage]
>>That is the opposite of my experience, but YMMV.
>
> I suspect it depends on how familiar the user is with regular
> expressions, their abilities, and their limitations.
I suspect so too, at least in part.
> People relatively new to REs always seem to want to use them
> to count (to balance parentheses, for instance). People who
> have gone through the compiler course know better. :-)
But also, something I think sometimes gets forgotten: if the
maximum nesting depth is bounded in advance, parens can be
balanced with a regex. This is nice for the particularly common
case of a nesting depth of 1 (balanced but non-nested parens).
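
A rough sketch of what I mean, in Python 3. The helper names `balanced_depth1` and `balanced_pattern` are made up for illustration, and this only checks parens, ignoring things like quoted strings:

```python
import re

# Depth 1: any mix of non-paren characters and "(...)" groups whose
# contents contain no parens themselves.
DEPTH1 = re.compile(r'(?:[^()]|\([^()]*\))*')

def balanced_depth1(s):
    return DEPTH1.fullmatch(s) is not None

def balanced_pattern(max_depth):
    # Build the pattern from the inside out. Depth 0 allows no parens;
    # each pass wraps the previous pattern in one more paren level.
    inner = r'[^()]*'
    for _ in range(max_depth):
        inner = r'(?:[^()]|\(' + inner + r'\))*'
    return re.compile(inner)

print(balanced_depth1('f(a) + g(b, c)'))                        # True
print(balanced_depth1('f(g(a))'))                               # False: nested
print(balanced_pattern(2).fullmatch('f(g(a))') is not None)     # True
```

The pattern grows with the depth you allow, so this is only practical for small bounds, but for depth 1 it is short enough to read at a glance.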