Mailman 3 February 2014 - Python-ideas

Re: [Python-ideas] statistics module in Python3.4
by Steven D'Aprano Feb. 1, 2014

On Thu, Jan 30, 2014 at 11:03:38AM -0800, Larry Hastings wrote:
> On Mon, Jan 27, 2014 at 9:41 AM, Wolfgang
> <wolfgang.maier(a)biologie.uni-freiburg.de
> <mailto:wolfgang.maier@biologie.uni-freiburg.de>> wrote:
> >I think a much cleaner (and probably faster) implementation would be
> >to gather first all the types in the input sequence, then decide what
> >to return in an input order independent way.
>
> I'm willing to consider this a "bug fix".…

Re: [Python-ideas] statistics module in Python3.4
by Wolfgang Maier Feb. 1, 2014

Oscar Benjamin <oscar.j.benjamin@...> writes:

Hi Oscar,
and thanks for this very detailed post.

>
> You're making this sound a lot more complicated than it is. The
> problem is simple: Decimal doesn't integrate with the numeric tower.
> This is explicit in the PEP that brought in the numeric tower:
> http://www.python.org/dev/peps/pep-3141/#the-decimal-type
>

You're perfectly right about this as far as built-in number types and the standard library types Fraction and Decimal are concerned.

> That being said I think that guaranteeing an error is
> better than the current order-dependent behaviour (and agree that that
> should be considered a bug).
>

For custom types, the type returned by _sum can also be order-dependent due to this part in _coerce_types:

    def _coerce_types(T1, T2):
        [..]
        if issubclass(T2, float): return T2
        if issubclass(T1, float): return T1
        # Subclasses of the same base class give priority to the second.
        if T1.__base__ is T2.__base__:
            return T2

I chose the more drastic example with Fraction and Decimal for my initial post because there the difference is between a result and an error, but the above may illustrate better why I said that the returned type of _sum is hard to predict.

> If there is to be a more drastic rearrangement of the _sum function
> then it should actually be to solve the problem that the current
> implementation of mean, variance etc. uses Fractions for all the heavy
> lifting but then rounds in the wrong place (when returning from
> _sum()) rather than in the mean, variance function itself.
>

This is an excellent remark and I agree absolutely with your point here. It's one of the aspects of the statistics module that I pondered over for weeks.
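[Illustration] The order dependence of the `_coerce_types` fragment quoted earlier in this message can be reproduced with a small sketch. The function `coerce_pair` and the four classes below are hypothetical names invented for this illustration; the sketch covers only the three rules that appear in the quoted fragment and does not reproduce the part elided as `[..]`.

```python
from functools import reduce


def coerce_pair(T1, T2):
    """Hypothetical stand-in covering only the rules quoted in the message.

    The part elided as "[..]" in the original is not reproduced here.
    """
    if issubclass(T2, float):
        return T2
    if issubclass(T1, float):
        return T1
    # Subclasses of the same base class give priority to the second.
    if T1.__base__ is T2.__base__:
        return T2
    raise TypeError("case not covered by the quoted fragment")


class MyFloatA(float):
    pass


class MyFloatB(float):
    pass


class Base:
    pass


class X(Base):
    pass


class Y(Base):
    pass


# Folding the same two types in opposite orders picks different winners.
forward = reduce(coerce_pair, [MyFloatA, MyFloatB])
backward = reduce(coerce_pair, [MyFloatB, MyFloatA])
print(forward.__name__, backward.__name__)

# The "same base class" rule behaves the same way for non-float types.
print(coerce_pair(X, Y).__name__, coerce_pair(Y, X).__name__)
```

In both cases the second argument wins, so the resulting type depends on the order in which the types are encountered.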
Essentially, the fact that all current functions that rely on _sum do round imprecisely anyway was my motivation for suggesting the simple:

    def _coerce_types(types):
        if len(types) == 1:
            return next(iter(types))
        return float

because it certainly makes sense to return the type found in the input if there is only one, but with ambiguity, why make the effort of guessing when it does not help precision anyway. However, I realized that I probably rushed this because the implementation of functions that call _sum may change later to rely on an exact return value.

> The clever algorithm in the variance function (unless it changed since
> I last looked) is entirely unnecessary when all of the intensive
> computation is performed with exact arithmetic. In the absence of
> rounding error you could compute a perfectly good variance using the
> computational formula for variance in a single pass. Similarly
> although the _sum() function is correctly rounded, the mean() function
> calls _sum() and then rounds again so that the return value from
> mean() is rounded twice. _sum() computes an exact value as a fraction
> and then coerces it with
>
>     return T(total_numerator) / total_denominator
>
> so that the division causes it to be correctly rounded. However the
> mean function effectively ends up doing
>
>     return (T(total_numerator) / total_denominator) / num_items
>
> which uses 2 divisions and hence rounds twice. It's trivial to
> rearrange that so that you round once
>
>     return T(total_numerator) / (total_denominator * num_items)
>
> except that to do this the _sum function should be changed to return
> the exact result as a Fraction (and perhaps the type T). Similar
> changes would need to be made to the sum of squares function (_ss()
> IIRC). The double rounding in mean() isn't a big deal but the
> corresponding effect for the variance functions is significant. It was
> after realising this that the sum function was renamed _sum and made
> nominally private.
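[Illustration] The difference between the two-division and one-division expressions in the quoted text can be checked directly with floats. The values below (an exact sum of 1/5 over 7 items) are chosen purely for illustration and do not come from the thread; the variable names follow the quoted expressions.

```python
from fractions import Fraction

# Illustrative values only: an exact sum of 1/5 spread over 7 items.
total_numerator, total_denominator, num_items = 1, 5, 7
T = float

# Two divisions: the sum is rounded to a float, then the mean is rounded again.
twice = (T(total_numerator) / total_denominator) / num_items

# One division: the exact integer denominator absorbs num_items first.
once = T(total_numerator) / (total_denominator * num_items)

# Reference: the exact mean as a Fraction, converted to float in one step.
exact = Fraction(total_numerator, total_denominator * num_items)

print(twice == once)         # False: the two roundings land on a neighbouring float
print(once == float(exact))  # True: the single division is correctly rounded
```

The discrepancy is a single unit in the last place here, which matches the remark in the quoted text that the effect on mean() is small.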
>

I have been thinking about this solution as well, but I think you really have to return a tuple of the sum as a Fraction and the type (not perhaps) since it would be really weird if the public functions in statistics always return a Fraction even if the input sequence consisted of only one standard type like int, float or Decimal.

The obvious criticism then is that such a _sum is not really a sum function anymore like the existing ones. Then again, since this is a module private function it may be ok to do this?

Best,
Wolfgang
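[Illustration] A minimal sketch of the idea discussed in this message: a private sum helper that returns both the result type and the exact total as a Fraction, so that the caller rounds only once. The names `exact_sum` and `mean_once` are hypothetical and this is not the statistics module's implementation. The type selection uses the simple rule suggested earlier in the message (a single input type is kept, anything else falls back to float), and returning a float for all-int input is an assumption of this sketch.

```python
from decimal import Decimal
from fractions import Fraction


def exact_sum(data):
    """Hypothetical helper: return (T, total), where total is an exact Fraction."""
    types = set()
    total = Fraction(0)
    for x in data:
        types.add(type(x))
        total += Fraction(x)  # exact for int, float, Decimal and Fraction
    # Simple rule from the message: keep a single input type, else use float.
    T = next(iter(types)) if len(types) == 1 else float
    return T, total


def mean_once(data):
    """Hypothetical mean that converts to the result type in a single step."""
    data = list(data)
    if not data:
        raise ValueError("mean requires at least one data point")
    T, total = exact_sum(data)
    exact = total / len(data)  # Fraction / int stays exact
    if T is Fraction:
        return exact
    if T is Decimal:
        # One division, rounded once to the current decimal context.
        return Decimal(exact.numerator) / Decimal(exact.denominator)
    # float, int and mixed input: one correctly rounded conversion.
    return float(exact)


print(mean_once([1, 2, 3, 4]))
print(mean_once([Fraction(1, 35)] * 7))
print(mean_once([Decimal("0.1"), Decimal("0.2")]))
```

Because the helper returns the type alongside the exact total, the public function decides where the single rounding happens, which is the rearrangement proposed in the quoted text.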

We don't know each other's challenges (was Re: Iterative development)
by Nick Coghlan Feb. 1, 2014

On 1 February 2014 03:42, Chris Angelico <rosuav(a)gmail.com> wrote:
> I wouldn't withdraw my comment, because I still stand by it. If you
> genuinely meant no specifics, then when someone pointed out how they
> interpreted your statement, you would have apologized and made a
> correction: "I didn't mean anyone in particular, I meant the way
> there've been 50 issues reopened unnecessarily by 30 different people
> lately", or something. But that wouldn't be true, would it? You really
> did mean Anatoly, and that's why you said what you did. Believe you
> me, I know more than you think I do. Think of Emma from "Once Upon A
> Time" if you like - a strong ability to detect lying, based on a
> metric ton of experience with it.

Chris, while Mark's behaviour has been out of line recently, that isn't anywhere near adequate justification for suggesting (even by implication) that another list participant is lying about their health status or their motives.

It is impossible to diagnose *anyone* accurately over the internet - we can only give them the benefit of the doubt, take their word for it, and judge the outcome by whether they appear to be making genuine efforts to improve their behaviour, rather than assuming that everyone is starting from an identical baseline of expectations and capabilities in relation to civil discourse (especially once cultural variations are taken into account).

Mark hasn't been trying to use his diagnosis as a get out of jail free card - he has been working with other members of the community on his coping strategies for dealing with mailing list discussions, and curbing his impulse to respond to poorly thought out ideas with unconstructive sarcasm.

Now, I suggested to Mark that he consider asking the moderators to set his moderator flag for the time being, but he has instead chosen to step away from the core development lists entirely.
While we *do* try to be inclusive of everyone, the thing that *will* get someone moderated, suspended and perhaps eventually banned entirely, is a consistent *pattern* of inappropriate behaviour, with no indication of genuine attempts to eliminate that behaviour (or even to understand why it is inappropriate).

So if something seems out of line, *please* contact the list moderators (via python-ideas-owner(a)python.org), rather than retaliating directly on the list. If replying directly on the list, please try to assume temporary stress rather than persistent malice or obstinance on the part of the other poster in the absence of an extended history of interacting with them.

Regards,
Nick.

--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
