[Numpy-discussion] Missing Values Discussion

Bruce Southey bsouthey at gmail.com
Mon Jul 11 10:16:06 EDT 2011


On 07/11/2011 08:08 AM, Matthew Brett wrote:
> Hi,
>
> On Mon, Jul 11, 2011 at 3:52 AM, Bruce Southey<bsouthey at gmail.com>  wrote:
>> On Fri, Jul 8, 2011 at 4:35 PM, Matthew Brett<matthew.brett at gmail.com>  wrote:
>>> Hi,
>>>
>>> On Fri, Jul 8, 2011 at 8:34 PM, Bruce Southey<bsouthey at gmail.com>  wrote:
>>>> On Fri, Jul 8, 2011 at 12:55 PM, Matthew Brett<matthew.brett at gmail.com>  wrote:
>>>>> Hi,
>>>>>
>>>>> On Fri, Jul 8, 2011 at 6:38 PM, Bruce Southey<bsouthey at gmail.com>  wrote:
>>>>>> On 07/08/2011 08:58 AM, Matthew Brett wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Just checking - but is this:
>>>>>>>
>>>>>>> On Fri, Jul 8, 2011 at 2:22 PM, Bruce Southey<bsouthey at gmail.com>    wrote:
>>>>>>> ...
>>>>>>>> The one thing that we do need now is the code that implements the small
>>>>>>>> set of core ideas (array creation and simple numerical operations).
>>>>>>>> Hopefully that will provide a better grasp of the concepts and the
>>>>>>>> performance differences to determine the acceptability of the approach(es).
>>>>>>> in reference to this:
>>>>>>>
>>>>>>>> On 07/08/2011 07:15 AM, Matthew Brett wrote:
>>>>>>> ...
>>>>>>>>> Can I ask - what do you recommend that we do now, for the discussion?
>>>>>>>>> Should we be quiet and wait until there is code to test, or, as
>>>>>>>>> Nathaniel has tried to do, work at reaching some compromise that makes
>>>>>>>>> sense to some or all parties?
>>>>>>> ?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Matthew
>>>>>> Simply, I think the time for discussion has passed and it is now time to
>>>>>> see the 'cards'. I do not know enough (or anything) about the
>>>>>> implementation, so I need code to know the actual 'cost' of Mark's idea
>>>>>> in real situations.
>>>>> Yes, I thought that was what you were saying.
>>>>>
>>>>> I disagree and think that discussion of the type that Nathaniel has
>>>>> started is a useful way to think more clearly and specifically about
>>>>> the API and what can be agreed.
>>>>>
>>>>> Otherwise we will come to the same impasse when Mark's code arrives.
>>>>> If that happens, we'll either lose the code because the merge is
>>>>> refused, or be forced into something that may not be the best way
>>>>> forward.
>>>>>
>>>>> Best,
>>>>>
>>>>> Matthew
>>>>
>>>> Unfortunately, we need code from either side, as an API etc. is not
>>>> sufficient to judge anything.
>>> If I understand correctly, we are not going to get code from either
>>> side, we are only going to get code from one side.
>> That would be very unfortunate indeed.
>>
>>> I cannot now see how the code will inform the discussion about the
>>> API, unless it turns out that the proposed API cannot be implemented.
>>>   The substantial points are not about memory use or performance, but
>>> about how the API should work.  If you can see some way that the code
>>> will inform the discussion, please say, I would honestly be grateful.
>> APIs are not my area or even a concern.  I am an end user, so the code
>> has to work correctly with acceptable performance and memory usage. To
>> that end I have to know if doing a+b is faster and uses less memory than
>> first creating new arrays c and d without missing values and then doing
>> c+d. My limited understanding of the masked approach is that the
>> former should be faster than the latter, with some acceptable
>> increase in memory usage. With the miniNEP approach, I do not see that
>> there will be benefits, because the function will have to find the
>> missing values and handle them appropriately, which may be a 'killer'
>> for integer arrays.
>>
>>>> But I do not think we will be forced
>>>> into anything: in the extreme situation you can keep old versions, or
>>>> fork the code in the really extreme case.
>>> That would be a terrible waste, and potentially damaging to the
>>> community, so of course we want to do all we can to avoid those
>>> outcomes.
>>>
>>> Best,
>>>
>>> Matthew
>> So I have to support anybody who wants to try a new change, especially
>> one that would remove my 'bane' by having functions automatically
>> handle masked arrays.
> This is a very important statement, and it is right at the heart of
> the problem that I have been trying to raise.
>
> Here, what you are saying is "I want functions to handle masked arrays"
> and so "I support a change to handle masked arrays".
>
> However, you are replying to another discussion, which is "What is the
> right API to handle masked arrays in relationship to missing values?".
> Specifically, you are saying you think discussion should stop on that
> until the masking implementation is done.
>
> My point is this:
>
> 1) We must make sure that we discuss the substance of the actual point.
> 2) In order to do this, we must be very careful to separate the actual
> point from
> A) Desire for our own favorite use-case
> B) General expressions of personal solidarity.
>
> If we don't then what we will see is considerable confusion in the
> discussion, and the destructive formation of cliques.
>
> We're scientists - and so we know better than most about the
> importance of keeping the ideas separate from the people making them.
> If we want to have clear ideas that will help numpy last as a tool, we
> need to preserve the quality of our discussion.
>
> Best,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

Just to correct you: my position is 'show me the code', not that I
support any particular idea. As you can probably tell, I do have a hard
time understanding how either approach will actually work. By having basic
code that implements very basic functionality, I, and probably others,
will better appreciate what people are referring to and what the cost is
in performance and usage.
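
To make the a+b versus c+d comparison from earlier in the thread concrete, here is a minimal sketch of the semantics only. It uses the existing numpy.ma module as a stand-in for the masked approach (it is not Mark's proposed implementation). For a bit-pattern style of missing value it uses a hypothetical integer sentinel, NA_INT, and a hypothetical helper, add_with_sentinel; both are assumptions for illustration, not the miniNEP's actual design. No timings or memory figures are claimed.

```python
import numpy as np
import numpy.ma as ma

# --- Mask approach (numpy.ma as a stand-in): a separate boolean mask, True = missing.
a = ma.array([1, 2, 3, 4], mask=[False, True, False, False])
b = ma.array([10, 20, 30, 40], mask=[False, False, True, False])

# a + b directly: the result's mask is the union of the two input masks.
masked_sum = a + b

# Alternative: first build c and d holding only positions valid in both, then c + d.
both_valid = ~(ma.getmaskarray(a) | ma.getmaskarray(b))
c = a.data[both_valid]
d = b.data[both_valid]
plain_sum = c + d

# --- Sentinel ("bit pattern") approach for integers.
# HYPOTHETICAL: reserve the most negative int64 to mean "missing".
NA_INT = np.iinfo(np.int64).min

def add_with_sentinel(x, y):
    """Hypothetical helper: add two int64 arrays, propagating NA_INT."""
    # Every operation has to scan the data for the sentinel value.
    missing = (x == NA_INT) | (y == NA_INT)
    out = x + y
    out[missing] = NA_INT
    return out

x = np.array([1, NA_INT, 3, 4], dtype=np.int64)
y = np.array([10, 20, NA_INT, 40], dtype=np.int64)
sentinel_sum = add_with_sentinel(x, y)

print(masked_sum)
print(plain_sum)
print(sentinel_sum)
```

In the masked version the mask travels alongside the data, so a + b combines masks rather than inspecting values; in the sentinel version every function must compare against NA_INT, and that value can no longer be used as an ordinary integer, which is the concern for integer arrays raised above.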

Bruce
