[Python-ideas] Proposal: Tuple of str with w'list of words'

Sat Nov 12 14:00:12 EST 2016

On Sat, Nov 12, 2016 at 12:06 PM Steven D'Aprano <steve at pearwood.info>
wrote:

> I consider the need for that to indicate a possibly poor design of
> pandas. Unless there is a good reason not to, I believe that any
> function that requires a list of strings should also accept a single
> space-delimited string instead. Especially if the strings are intended
> as names or labels. So that:
>
> func(['fe', 'fi', 'fo', 'fum'])
>
> and
>
> func('fe fi fo fum')
>
> should be treated the same way.
>

They aren't, because df['Column Name'] is a valid way to get a single
column's worth of data when the column name contains spaces (not encouraged,
but it is valid).
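
To make the conflict concrete, here is a minimal sketch in plain Python (not
pandas itself). The helper as_names is hypothetical; it only illustrates the
"also accept a space-delimited string" convention from the quoted suggestion
and how a single label containing a space would be misread under it.

```python
def as_names(names):
    """Hypothetical helper: accept a list of labels or one space-delimited string."""
    if isinstance(names, str):
        return names.split()
    return list(names)

# Both spellings from the quoted suggestion give the same result.
assert as_names('fe fi fo fum') == as_names(['fe', 'fi', 'fo', 'fum'])

# A single label that contains a space is silently split into two labels.
print(as_names('Column Name'))  # ['Column', 'Name']
```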

> > mydf = df[ ('field1', 'field2', 'field3') ]
>
> Are your field names usually constants known when you write the script?
>

Yes.  All the time.  When I'm on the side of creating APIs for data
analysts to use, I think of the columns abstractly.  When they're writing
scripts to analyze data, it's all very explicit and in the domain of the
data. Things like:

df [df.age > 10]
adf = df.pivot_table( ['runid','block'] )

are common and are the "right" way to do things in the problem domain.

> So not only do we have to learn yet another special kind of string:
>
> - unicode strings
> - byte strings
> - raw strings (either unicode or bytes)
> - f-strings
> - and now w-strings
>

Very valid point.  I also considered (and rejected) a 'wb' for a tuple
of bytes.

> I would prefer a simple, straight-forward rule: it unconditionally
> splits on whitespace. If you need to include non-splitting spaces, use a
> proper non-breaking space \u00A0, or split the words into a tuple by
> hand, like you're doing now. I don't think it is worth complicating the
> feature to support non-splitting spaces.
>

You're right there.  If there are spaces in the columns, make it explicit
and don't use the w''.  I withdraw the <backspace><space> "feature".  And I
think you're right that all the existing escape rules should work in the
same way they do for regular unicode strings (don't go the raw strings
route).  Basically, w'foo bar' == tuple('foo bar'.split())
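
As a rough sketch of that equivalence, the proposed literal can be emulated
today with an ordinary function. The name w below is only a stand-in for the
hypothetical w'...' syntax, which does not exist in Python.

```python
def w(text):
    """Hypothetical stand-in for the proposed w'...' literal."""
    return tuple(text.split())

fields = w('field1 field2 field3')
print(fields)  # ('field1', 'field2', 'field3')

# Escapes are processed by the ordinary string literal before splitting,
# so a tab or newline escape acts as a separator like any other whitespace.
print(w('runid\tblock\nage'))  # ('runid', 'block', 'age')
```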

> The fact that other languages do something like this is a (weak) point
> in its favour. But I see that there are a few questions on Stackoverflow
> asking what %w means, how it is different from %W, etc. For example:
>
> http://stackoverflow.com/questions/1274675/what-does-warray-mean
>
> http://stackoverflow.com/questions/690794/ruby-arrays-w-vs-w
>
>
Well, I'd lean towards not having a W'fields' that does something funky
:-).   But your point is well taken.

> ...
> I'm rather luke-warm on this proposal, although I might be convinced to
> support it if:
>
> - w'...' unconditionally split on any whitespace (possibly
>   excluding NBSP);
>
> - and normal escapes worked.
>
> Even then I'm not really convinced this needs to be a language feature.
>
>
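
One detail on the NBSP condition: in Python 3, str.split() with no arguments
already treats U+00A0 as whitespace, so the plain tuple(s.split()) reading
would split on it. A sketch of the "excluding NBSP" variant follows; the
helper name split_keep_nbsp is hypothetical and this is only one possible
way to write it.

```python
import re

# str.split() with no arguments splits on NBSP as well in Python 3.
print('New\u00a0York Boston'.split())  # ['New', 'York', 'Boston']

def split_keep_nbsp(text):
    """Hypothetical helper: split on whitespace, but never on NBSP (U+00A0)."""
    # Match runs of characters that are either non-whitespace or NBSP.
    return tuple(re.findall(r'[\S\u00a0]+', text))

# The NBSP stays inside the first word; the ordinary space still separates.
assert split_keep_nbsp('New\u00a0York Boston') == ('New\u00a0York', 'Boston')
```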

I'm realizing that much of the reason I'm seeing this so often is that
it seems to be an issue particular to using Python for data science.  In some
ways, they're pushing the language a bit beyond what it's designed to do
(the df[ (df.age > 10) & (df.gender=="F")] idiom is amazing and
troubling).  Since I'm doing a lot of this, these little language issues
loom a bit larger than they would with "normal" programming.

Thanks for responding.