On Sat, Nov 12, 2016 at 12:06 PM Steven D'Aprano wrote:
I consider the need for that to indicate a possibly poor design of pandas. Unless there is a good reason not to, I believe that any function that requires a list of strings should also accept a single space-delimited string instead. Especially if the strings are intended as names or labels. So that:
func(['fe', 'fi', 'fo', 'fum'])
and
func('fe fi fo fum')
should be treated the same way.
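A minimal sketch of the convention suggested above, assuming a hypothetical helper `as_names` (illustrative only, not part of pandas or any other library) that normalizes either call style:

```python
def as_names(names):
    """Return a tuple of names.

    Accepts either a sequence of strings or a single whitespace-delimited
    string. Hypothetical helper, for illustration only.
    """
    if isinstance(names, str):
        # A single string is split on runs of whitespace.
        return tuple(names.split())
    return tuple(names)


def func(names):
    # Both call styles end up as the same tuple of labels.
    return as_names(names)


print(func(['fe', 'fi', 'fo', 'fum']))
print(func('fe fi fo fum'))
```

Both calls print the same tuple. The cost of this design is that a single label containing a space can no longer be passed as a bare string.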
They aren't treated the same way because df['Column Name'] is a valid way to get a single column's worth of data when the column name contains spaces (not encouraged, but it is valid).
mydf = df[ ('field1', 'field2', 'field3') ]
Are your field names usually constants known when you write the script?
Yes. All the time. When I'm on the side of creating APIs for data analysts to use, I think of the columns abstractly. When they're writing scripts to analyze data, it's all very explicit and in the domain of the data. Things like:
df[df.age > 10]
adf = df.pivot_table(['runid', 'block'])
are common and the "right" way to do things in the problem domain.
So not only do we have to learn yet another special kind of string:
- unicode strings
- byte strings
- raw strings (either unicode or bytes)
- f-strings
- and now w-strings
Very valid point. I also was considering (and rejected) a 'wb' for tuple of bytes.
I would prefer a simple, straightforward rule: it unconditionally splits on whitespace. If you need to include non-splitting spaces, use a proper non-breaking space \u00A0, or split the words into a tuple by hand, like you're doing now. I don't think it is worth complicating the feature to support non-splitting spaces.
You're right there. If there are spaces in the columns, make it explicit and don't use the w''. I withdraw the <backspace><space> "feature". And I think you're right that all the existing escape rules should work in the same way they do for regular unicode strings (don't go the raw strings route). Basically, w'foo bar' == tuple('foo bar'.split())
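As a rough sketch of those semantics in today's Python, assuming a hypothetical function `w()` standing in for the proposed w'...' literal (no such literal or function exists in the language):

```python
def w(s):
    """Hypothetical stand-in for the proposed w'...' literal."""
    return tuple(s.split())


fields = w('field1 field2 field3')
print(fields)  # ('field1', 'field2', 'field3')

# Escapes are processed by the ordinary string literal before the split,
# so an escaped tab or newline separates words like any other whitespace.
print(w('runid\tblock\nage'))  # ('runid', 'block', 'age')

# Caveat: str.split() with no arguments also treats U+00A0 (NBSP) as
# whitespace, so a non-breaking space does not keep words together here.
print(w('Column\u00a0Name other'))  # ('Column', 'Name', 'other')
```

The last call shows that the plain `tuple(s.split())` definition and the "use a non-breaking space to avoid splitting" suggestion do not agree; a literal following that suggestion would need a different splitting rule.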
The fact that other languages do something like this is a (weak) point in its favour. But I see that there are a few questions on Stack Overflow asking what %w means, how it is different from %W, etc. For example:
http://stackoverflow.com/questions/1274675/what-does-warray-mean
http://stackoverflow.com/questions/690794/ruby-arrays-w-vs-w
Well, I'd lean towards not having a W'fields' that does something funky :-). But your point is well taken.
... I'm rather lukewarm on this proposal, although I might be convinced to support it if:
- w'...' unconditionally split on any whitespace (possibly excluding NBSP);
- and normal escapes worked.
Even then I'm not really convinced this needs to be a language feature.
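For the "possibly excluding NBSP" variant, note that Python's `str.split()` with no arguments treats U+00A0 as whitespace, so that variant needs its own splitting rule. One possible sketch, assuming a hypothetical function `w_keep_nbsp()` (an illustration, not a proposed implementation):

```python
import re

# A "word" is a run of characters that are either non-whitespace or U+00A0.
# In a str pattern, \S excludes NBSP, so it is added back explicitly.
_WORD = re.compile(r'(?:\S|\u00a0)+')


def w_keep_nbsp(s):
    """Split on any whitespace except the non-breaking space U+00A0.

    Hypothetical helper, for illustration only.
    """
    return tuple(_WORD.findall(s))


print(w_keep_nbsp('fe fi\tfo\nfum'))
print(w_keep_nbsp('Column\u00a0Name other'))
```

The second call yields two items, with the first still containing the non-breaking space. Other Unicode no-break characters such as U+202F are still treated as separators in this sketch.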
I'm realizing that much of the reason I keep running into this is that it seems to be an issue particular to using Python for data science. In some ways, they're pushing the language a bit beyond what it's designed to do (the df[(df.age > 10) & (df.gender == "F")] idiom is amazing and troubling). Since I'm doing a lot of this, these little language issues loom a bit larger than they would with "normal" programming. Thanks for responding.