[SciPy-Dev] scipy.stats

Tue Jun 1 06:48:12 EDT 2010

On Tue, Jun 1, 2010 at 4:43 AM, Travis Oliphant <oliphant at enthought.com> wrote:
>
> On Jun 1, 2010, at 3:12 AM, josef.pktd at gmail.com wrote:
>
> On Tue, Jun 1, 2010 at 12:54 AM, Travis Oliphant <oliphant at enthought.com>
> wrote:
>
> On May 31, 2010, at 9:16 AM, josef.pktd at gmail.com wrote:
>
> Since Travis seems to want to take back control of scipy.stats, I am
>
> considering my role as unofficial maintainer as ended.
>
> Obviously I've offended you.   That has never been my intent.   I apologize
> if my enthusiasm for getting some changes that I wanted to see into SciPy
> stepped on an area you felt ownership of.     I do not mind if people add
> changes to code that I've written and I assume that others feel the same.
> That has always been the development mode of SciPy.   We clearly have
> different development styles.    I think we can find a way to work together.
>   I think the move to github will help.
>
> I did not understand that you felt such ownership of scipy.stats.  I have
> certainly appreciated your input.
>
> I do like a more "free-wheeling" style to code development than one that is
> bogged down with "rules" and "procedures".     This clearly is not your
> style.   For me, it comes down to time to spend.   I love working on SciPy
> and NumPy.    I don't have a lot of time to do it.   When I see quick
> changes I can make that add value I like to be able to do it.   I think we
> both want the same thing while we may disagree about the best way to get
> there.
>
> In my mind, discussion doesn't end when a check-in is made --- it just
> begins.   You should never interpret my checking something in as the final
> word.   We clearly have a different view of "trunk".
>
> I certainly don't want my approach to open source development to offend
> others or chase them away.  If I check in something you don't like, then
> tell me and let's talk about it.    If you need to vent and call me names, a
> private email to me or others can go a long way.
>
> What do we need to do to keep you around?   Is there specifically something
> you didn't like about my recent check-ins?
>
> In this case, the features added were not terribly extensive.   The current
> unit tests helped ferret out major problems.  Yes, I could write more tests
> and documentation, and you have been a model of writing tests and
> documentation.   I have been particularly impressed by the amount of quality
> documentation you have written.
>
> While you seem to dismiss the episode as problematic, I actually think
> curve_fit was a good example of how something very positive can emerge
> quickly when people are open and willing to work together.
>
> While formal, strict test-driven development is easy to point to for
> salvation -- it does have its costs.   I've always used informal test-driven
> development.   Just because I don't *always* add formal unit tests for every
> piece of code written does not mean the code that is currently in SciPy is
> un-tested and useless.   Such an approach leaves me open to criticism, which
> I acknowledge.  But, I think there have been far too many dismissive
> comments about the state of the code.
>
> I would argue that the problem with scipy.stats does not lie mainly in
> distributions.py or the lack of test-driven-development --- but in the lack
> of certain easy to use features.    Quality code comes out of people who
> care --- not out of procedure.
>
> I think you are someone who cares and your code reflects that.    We would
> all benefit from your staying part of the main development.
>
> (not answering inline to keep thoughts together)
>
> I think the main disagreements are about the quality control of the
> trunk and whether scipy development is a community effort or not.
>
> I certainly think scipy development is a community effort.   I'm very sorry
> for making you feel "dumped" on.   That has never been my intent.  I was
> simply hoping to contribute a little where I could.

I only feel "dumped" on because I want a tested and verified stage. I
could leave it to somebody else in five years to clean it up. And I
don't want to add lots of notes in docstrings, "use at your own risk,
this function hasn't been verified", as we sometimes do in our
(statsmodels) sandbox.

>
> As Skipper described, in statsmodels almost all development occurs in
> the sandbox and in branches, and it is only included in the "official"
> core of statsmodels after it has been verified and tests have been
> added. Sandbox code is everything from first-draft versions to almost
> finished code.
> And one of Skipper's tasks in his gsoc is to clean out the sandbox.
> Once it is in trunk (core) any further refactoring follows very strict
> rules.
>
> This has not been SciPy's process.   I can understand people may want it to
> become SciPy's process, but it has not been.  There are dangers to this
> process --- there is a reason for the mantra of "release early and release
> often".  It can also prevent progress when you are dealing with people's
> spare time because all of that process takes time and man-power and effort.
>   There is some value in it, I'm just not sure the extent of that value in
> contrast to other uses of that time.

I think that's another discussion I have seen already several times.

I think it's time that scipy moves to a "verified"-only stage, instead
of "this is a young project, still work in progress and use at your
own risk".

> For example, I would love to see statsmodels get more use.   I think there
> is much code there that is usable.  Yet, it remains outside of SciPy.
> If we agree to change the SciPy process will you agree to put statsmodels
> into SciPy?

I hope that statsmodels becomes too big for scipy, but I still would
like to see core models go into scipy. To quote myself from the
pystatsmodels mailing list:

"The way it looks like, I don't think statsmodels (as a whole) will go
back into scipy, the count of python lines of code of statsmodels is
already almost 20% of the one in scipy according to ohloh.
Large parts of the code are still in the sandbox but with another gsoc
and continued development we will have too much statistics coverage for
statsmodels to be absorbed by scipy."

There are now at least 3 very active scikits, image, learn and
statsmodels, and I think the model of developing and maturing code in
a scikit starts to work pretty well.

For me it's easier to develop and mature code inside a pure python
package, which is also more accessible for new contributors. One of my
hoped-for target audiences is contributors on Windows, for whom
contributing would become rather difficult as part of scipy and git.

>
> Specific to stats: I want a reference for any function where the
> explanation cannot be found with a Wikipedia search with one of the
> terms in the docstring. A week or a few weeks ago, scipy.stats gained a
> new function; my question on the mailing list about what it is supposed
> to be didn't receive any reply (besides the problem that the function had
> the same name as an existing function).
>
> I did not see your message.   I changed the name of the function and didn't
> know you were concerned about the addition.   It is a convenience function
> for bayes_mvs that returns the distribution objects from which the other
> numbers can be obtained instead of just the numbers.     The paper is
> already referenced in bayes_mvs.

This explanation would have made a good comment in the notes section
of the docstring, and I wouldn't have to try to remember and look up
whether this might be some posterior distribution for a diffuse prior
with normal likelihood.
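
To make the relationship between the two functions concrete, here is a
minimal sketch. It assumes the convenience function under discussion is
the one current SciPy exposes as scipy.stats.mvsdist (the name is not
given in this thread), and the sample data is made up for illustration.
It only shows that the numbers reported by bayes_mvs can be recovered
from the returned distribution objects; it says nothing about how either
function is implemented.

```python
from scipy import stats

# Hypothetical sample, chosen only for illustration.
data = [6.0, 9.0, 12.0, 7.0, 8.0, 8.0, 13.0]

# bayes_mvs reports, for the mean, variance and standard deviation,
# a center estimate plus an interval holding the given probability.
mean_res, var_res, std_res = stats.bayes_mvs(data, alpha=0.9)

# mvsdist (assumed here to be the function in question) returns the
# three frozen distribution objects instead of just the numbers.
mean_dist, var_dist, std_dist = stats.mvsdist(data)

# The same numbers can be read off the distribution object.
mean_center = mean_dist.mean()
mean_interval = mean_dist.interval(0.9)

print("bayes_mvs:", mean_res[0], mean_res[1])
print("mvsdist:  ", mean_center, mean_interval)
```

With the distribution objects in hand, other summaries (a different
interval probability, quantiles, random draws) are available through the
usual frozen-distribution methods without calling bayes_mvs again.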

>
> Dumping new code into scipy trunk, without any review and tests,
> hoping that someone else looks for the problems is not an approach
> that I find acceptable.
>
> That was never my "hope".  I planned to and have fixed all problems that I
> saw later and that others have pointed out.   You can never test for all
> possible failures.

In many cases, I haven't seen you commit to doing any maintenance on
it. At least, there are many functions that never got a test added
later on. You respond to bug reports, but that is after the fact, when
someone has already run into the problem.

What I think has to be required are basic tests. I'm not religious
about testing for all possible failures. Edge cases, numerical
precision problems, and problems with use cases not initially targeted
can and still need to be handled after the code is in trunk.

And as Skipper said, and I felt from the beginning about scipy, a
package where you cannot rely (to a high degree) on the correctness
of the results is pretty unattractive for serious work. Nobody wants
to retract a paper because there was a programming mistake somewhere.

So, verification of the code for the main use case(s) is the minimum
requirement that Skipper and I agreed upon last summer for any
statistics/econometrics in python development.

>
>
> Asking me if I have commit rights shows at least some disconnect from
> the development of scipy in the last three years, since I have been
> pretty (too) noisy about it on the mailing lists.
>
> I know you have been noisy on the lists --- that's why I spoke to you about
> _logpdf and friends.  It also appears that you don't commit that often.

After I had several crashes late last year, and because I'm now working
mostly on statsmodels, I haven't kept my scipy development setup
up to date very often. I'm usually pretty fast in responding to open
issues, and Stefan and Ralph made commits to scipy.stats that I
reviewed and that were discussed on the mailing list or in a ticket.

On the other hand, I'm not "pushing" my own code into scipy very fast,
although I push it to the mailing list. Mainly, because I'm reluctant
to commit my own code when I don't think it's perfect yet, and when
the response on the mailing list doesn't look like there is an urgent
demand for it.
I only see feedback when the code gets questions later on, on the
mailing list or on stackoverflow.
So this is maybe not the best approach.

> This is your process.   But, it made me wonder if permissions were an issue.
>    I was pretty sure you had been given commit rights, but I could not
> remember.   I'm sorry if that offended you.

I might have overreacted initially, but I would have expected you to
participate in the discussion or at least mention that you work on it,
instead of announcing it at almost (?) the same time as making the
commits.

Josef

> -Travis