On Sat, Nov 12, 2016 at 12:06 PM Steven D'Aprano <steve@pearwood.info> wrote:
I consider the need for that to indicate a possibly poor design of
pandas. Unless there is a good reason not to, I believe that any
function that requires a list of strings should also accept a single
space-delimited string instead. Especially if the strings are intended
as names or labels. So that:

func(['fe', 'fi', 'fo', 'fum'])

and

func('fe fi fo fum')

should be treated the same way.

They aren't, because df['Column Name'] is a valid way to get a single column's worth of data when the column name contains spaces (not encouraged, but valid).
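
To make the ambiguity concrete, here is a minimal sketch in plain Python rather than pandas itself. The Frame class and its data are hypothetical stand-ins; the only thing borrowed from pandas is the convention that a string key selects one column and a list of keys selects several.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame: maps column names to lists."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __getitem__(self, key):
        # A single string names exactly one column, spaces and all.
        if isinstance(key, str):
            return self.columns[key]
        # Any other iterable is treated as a collection of column names.
        return Frame({name: self.columns[name] for name in key})


df = Frame({"Column Name": [1, 2], "Column": [3, 4], "Name": [5, 6]})

single = df["Column Name"]            # the one column whose name has a space
several = df["Column Name".split()]   # two different columns: "Column" and "Name"
```

If the string were split automatically, the first lookup could no longer be expressed, and in this example it would silently return different data.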
 
> mydf = df[ ('field1', 'field2', 'field3') ]

Are your field names usually constants known when you write the script?

Yes.  All the time.  When I'm on the side of creating APIs for data analysts to use, I think of the columns abstractly.  When they're writing scripts to analyze data, it's all very explicit and in the domain of the data. Things like:

df [df.age > 10]
adf = df.pivot_table( ['runid','block'] )

are common, and are the "right" way to do things in the problem domain.
 
So not only do we have to learn yet another special kind of string:

- unicode strings
- byte strings
- raw strings (either unicode or bytes)
- f-strings
- and now w-strings

Very valid point.  I also considered (and rejected) a 'wb' prefix for a tuple of bytes.


I would prefer a simple, straight-forward rule: it unconditionally
splits on whitespace. If you need to include non-splitting spaces, use a
proper non-breaking space \u00A0, or split the words into a tuple by
hand, like you're doing now. I don't think it is worth complicating the
feature to support non-splitting spaces.

You're right there.  If there are spaces in the column names, make it explicit and don't use w''.  I withdraw the <backspace><space> "feature".  And I think you're right that all the existing escape rules should work the same way they do for regular unicode strings (i.e. don't go the raw-string route).  Basically, w'foo bar' == tuple('foo bar'.split())
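
A rough sketch of that equivalence in today's Python, for illustration only. The helper names w and w_keep_nbsp are made up here, since no w'' literal exists. One detail worth noting: in Python 3, str.split() with no arguments treats U+00A0 as whitespace, so keeping a non-breaking space inside a word would need a narrower split, such as the ASCII-only one assumed in the second helper.

```python
import re


def w(text):
    """Hypothetical helper emulating the proposed w'...' literal."""
    # str.split() with no arguments splits on runs of any whitespace
    # and discards leading and trailing whitespace.
    return tuple(text.split())


def w_keep_nbsp(text):
    """Variant that splits on ASCII whitespace only, so U+00A0 survives."""
    return tuple(part for part in re.split(r"[ \t\n\r\f\v]+", text) if part)


fields = w("field1 field2\tfield3")          # ('field1', 'field2', 'field3')
plain = w("first\u00a0name age")             # NBSP is split on: three items
kept = w_keep_nbsp("first\u00a0name age")    # NBSP is kept: two items
```

Normal escapes such as \t and \u00a0 are processed before either helper sees the string, which matches the "normal escapes work" rule.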
 
The fact that other languages do something like this is a (weak) point
in its favour. But I see that there are a few questions on Stackoverflow asking what %w
means, how it is different from %W, etc. For example:

http://stackoverflow.com/questions/1274675/what-does-warray-mean

http://stackoverflow.com/questions/690794/ruby-arrays-w-vs-w


Well, I'd lean towards not having a W'fields' that does something funky :-).   But your point is well taken.
 
...
I'm rather luke-warm on this proposal, although I might be convinced to
support it if:

- w'...' unconditionally split on any whitespace (possibly
  excluding NBSP);

- and normal escapes worked.

Even then I'm not really convinced this needs to be a language feature.



I'm realizing that much of the reason I see this so often is that it seems to be an issue particular to using Python for data science.  In some ways, data scientists are pushing the language a bit beyond what it was designed to do (the df[ (df.age > 10) & (df.gender=="F")] idiom is amazing and troubling).  Since I'm doing a lot of this, these little language issues loom a bit larger than they would with "normal" programming.
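
For anyone who hasn't looked at how that idiom works, here is a small pure-Python sketch of the mechanism. The Column class is a hypothetical stand-in, not the pandas implementation: comparisons are overloaded to return element-wise masks, and & stands in for "and" because "and" itself cannot be overloaded.

```python
class Column:
    """Hypothetical stand-in for a pandas column (Series)."""

    def __init__(self, values):
        self.values = list(values)

    def __gt__(self, other):
        # Comparison returns an element-wise mask, not a single bool.
        return Column(v > other for v in self.values)

    def __eq__(self, other):
        return Column(v == other for v in self.values)

    def __and__(self, other):
        # `&` is overloaded because `and` cannot be overloaded.
        return Column(a and b for a, b in zip(self.values, other.values))


age = Column([8, 12, 30])
gender = Column(["F", "F", "M"])

# The parentheses are required: `&` binds more tightly than `>` and `==`.
mask = (age > 10) & (gender == "F")
```

The "troubling" part shows up in that last line: operators that normally yield a single bool are repurposed to build masks, and the parentheses become mandatory.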

Thanks for responding.