On Sun, Sep 8, 2013 at 1:48 PM, Oscar Benjamin email@example.com wrote:
On 8 September 2013 18:32, Guido van Rossum firstname.lastname@example.org wrote:
Going over the open issues:
- Parallel arrays or arrays of tuples? I think the API should require an array of tuples. It is trivial to zip up parallel arrays to the required format, while if you have an array of tuples, extracting the parallel arrays is slightly more cumbersome. Also, for manipulating the raw data, an array of tuples makes it easier to do insertions or removals without worrying about losing the correspondence between the arrays.
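To make the conversion cost in both directions concrete, here is a minimal sketch. The variable names and sample values are made up for illustration and do not come from the PEP or the proposed module.

```python
# Hypothetical sample data; names and values are illustrative only.
heights = [1.60, 1.75, 1.82]
weights = [55.0, 70.5, 80.2]

# Parallel arrays -> list of tuples: a single zip() call.
pairs = list(zip(heights, weights))

# List of tuples -> parallel arrays: zip(*pairs) transposes the data,
# but it yields tuples, so each column is converted back to a list.
xs, ys = (list(column) for column in zip(*pairs))
```

Note that the second direction also fails on empty input: with `pairs = []` there is nothing to unpack into `xs, ys`, so the caller needs a special case. That is one way in which extracting the parallel arrays is the more cumbersome direction.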
For something like this, where there are multiple obvious formats for the input data, I think it's reasonable to just request whatever is convenient for the implementation.
Not really. The implementation may change, or its needs may not be obvious to the caller. I would say the right thing to do is request something easy to remember, which often means consistent. In general, Python APIs definitely skew towards lists of tuples rather than parallel arrays, and for good reasons -- that way you benefit most from built-in operations like slices and insert/append.
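A small sketch of that point about built-in operations, using made-up data: with a list of tuples each edit is one operation on one object, while parallel lists need every edit repeated in step.

```python
# Illustrative (x, y) observations; the values are made up.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# With a list of tuples, each edit touches one object, so the x and y
# values of an observation cannot drift apart.
data.append((4.0, 8.1))
data.insert(1, (1.5, 3.0))
del data[0]
first_two = data[:2]

# With parallel lists, every edit must be repeated on both lists.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
xs.insert(1, 1.5)
ys.insert(1, 3.0)  # omitting this line would silently misalign the data
```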
Otherwise you're asking at least some of your users to convert data from one format to another just so that you can convert it back again. In any real problem you'll likely have more than two variables, so you'll be writing some code to prepare the data for the function anyway.
Yeah, so you might as well prepare it in the form that the API expects.
The most obvious alternative that isn't explicitly mentioned in the PEP is to accept either:
    def correlation(x, y=None):
        if y is None:
            xs = []
            ys = []
            for x, y in x:
                xs.append(x)
                ys.append(y)
        else:
            xs = list(x)
            ys = list(y)
            assert len(xs) == len(ys)
        # In reality a helper function does the above.
        # Now compute stuff
This avoids any unnecessary conversions and is as convenient as possible for all users at the expense of having a slightly more complicated API.
I don't think this is really more convenient -- it is more to learn, and can cause surprises (e.g. when a user is only familiar with one format and then sees an example using the other format, they may be unable to understand the example).
The one argument I *haven't* heard yet which *might* sway me would be something along the lines of "every other statistics package that users might be familiar with does it this way" or "all the statistics textbooks do it this way". (Because, frankly, when it comes to statistics I'm a rank amateur and I really want Steven's new module to educate me as much as help me compute specific statistical functions.)