string.split

Tim Peters tim.one at home.com
Wed Sep 5 04:07:26 EDT 2001


[Tom Harris]
> The useful split() function in the string module by default splits on
> whitespace, which is any combination of spaces, tabs, newlines,
> and possibly other stuff. So I see the following behaviour:
>
> >>> import string
> >>> s = 'asa   \n  bb cc\n\tdd'
> >>> string.split(s)
> ['asa', 'bb', 'cc', 'dd']
>
> However, the optional second argument to string.split() can only
> be used to set a literal string that is the separator. How do I split
> on any or all occurrences of (for example) whitespace and a comma,
> without using regexes?

Sorry, you don't -- or you do it by hand.
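
For the specific case asked about (whitespace plus a comma), one by-hand
approach is to turn every extra separator into a space and then let the
no-argument split do the rest.  A minimal sketch, assuming string methods
are available (Python 2.0 or later); the name split_on_whitespace_and and
its "extra" parameter are made up for illustration:

```python
def split_on_whitespace_and(s, extra=","):
    """Split s on runs of whitespace and/or any character in `extra`.

    Hypothetical helper, not part of the string module.
    """
    # Rewrite each extra separator as a space, so that the default
    # whitespace split treats it like any other whitespace character.
    for ch in extra:
        s = s.replace(ch, " ")
    return s.split()


parts = split_on_whitespace_and("asa,  \n bb,cc\n\t,dd")
print(parts)  # ['asa', 'bb', 'cc', 'dd']
```

This only works because whitespace is already one of the separators you
want; it is no help if, say, commas and semicolons should split but spaces
should be kept.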

> I mean string.split() must be able to do it anyway to achieve the
> default behaviour.

If you look at stropmodule.c, you'll discover that the split function
special-cases the snot out of the "no argument" case, in that case calling
an entirely separate split_whitespace function that hardcodes the logic
needed to split on runs of whatever C's isspace() macro considers to be
whitespace.  So there's nothing general about it.
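
That kind of scan is easy enough to write by hand with the "is this a
separator?" test passed in.  A rough Python sketch of such a loop, not a
transcription of the C code; split_runs and is_sep are hypothetical names,
and str.isspace is only an approximation of what C's isspace() accepts:

```python
def split_runs(s, is_sep=str.isspace):
    """Split s on runs of characters for which is_sep(ch) is true."""
    words = []
    i, n = 0, len(s)
    while i < n:
        # Skip over a run of separator characters.
        while i < n and is_sep(s[i]):
            i += 1
        # Collect the run of non-separator characters that follows, if any.
        j = i
        while i < n and not is_sep(s[i]):
            i += 1
        if j < i:
            words.append(s[j:i])
    return words


print(split_runs('asa   \n  bb cc\n\tdd'))
# ['asa', 'bb', 'cc', 'dd']
print(split_runs('asa, bb,cc ,\tdd', lambda c: c.isspace() or c == ','))
# ['asa', 'bb', 'cc', 'dd']
```

With the default predicate this gives the same list as the no-argument
split in the example above; passing a different predicate is what makes it
general.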

which-isn't-to-say-nothing-generalizable-ly y'rs  - tim




