[pypy-dev] certificate for accepting numpypy new funcs?

Maciej Fijalkowski fijall at gmail.com
Fri Jan 20 14:18:44 CET 2012


On Fri, Jan 20, 2012 at 2:47 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
> Alex Gaynor wrote:
>
>> On Thu, Jan 19, 2012 at 6:15 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>
>>> On Thu, Jan 19, 2012 at 2:49 PM, Dmitrey <dmitrey15 at ukr.net> wrote:
>>> > On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:
>>> >>
>>> >> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey<dmitrey15 at ukr.net>  wrote:
>>> >>>
>>> >>> Hi all,
>>> >>> could you provide clarification on accepting new numpypy funcs (not
>>> >>> only for me, but for any other possible volunteers)?
>>> >>> The doc I've been directed to says only "You have to test exhaustively
>>> >>> your module", while I would like to know more explicit rules.
>>> >>> For example, "at least 3 tests per func" (although I guess the number
>>> >>> of tests should be expected to differ for funcs of different
>>> >>> complexity and variability).
>>> >>> Also, are there any strict rules for the testcases to be submitted, or
>>> >>> can I, for example, merely write
>>> >>>
>>> >>> if __name__ == '__main__':
>>> >>>    assert array_equal(1, 1)
>>> >>>    assert array_equal([1, 2], [1, 2])
>>> >>>    assert array_equal(N.array([1, 2]), N.array([1, 2]))
>>> >>>    assert array_equal([1, 2], N.array([1, 2]))
>>> >>>    assert array_equal([1, 2], [1, 2, 3]) is False
>>> >>>    print('passed')
>>> >>
>>> >> We have pretty exhaustive automated testing suites. Look for example
>>> >> in pypy/module/micronumpy/test directory for the test file style.
>>> >> They're run with py.test and we require at the very least full code
>>> >> coverage (every line has to be executed, there are tools to check,
>>> >> like coverage). Also passing "unusual" input, like sys.maxint  etc. is
>>> >> usually recommended. With your example, you would check if it works
>>> >> for say views and multidimensional arrays. Also "is False" is not
>>> >> considered good style.
>>> >>
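To make the expectation concrete, a minimal sketch of how such tests are
usually shaped follows. The file name and the assumption that array_equal is
importable from numpypy are illustrative only; the real micronumpy tests are
written as app-level test classes under pypy/module/micronumpy/test:

    # test_array_equal.py: a sketch only; assumes array_equal has been added
    # to numpypy and is importable from it.  Run with py.test.
    from numpypy import array, array_equal

    def test_array_equal_basic():
        assert array_equal([1, 2], [1, 2])
        assert array_equal(array([1, 2]), array([1, 2]))
        assert not array_equal([1, 2], [1, 2, 3])

    def test_array_equal_multidim():
        a = array([[1, 2], [3, 4]])
        assert array_equal(a, array([[1, 2], [3, 4]]))
        assert not array_equal(a, array([[1, 2], [3, 5]]))
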
>>> >>> Or is there a certain rule for where the test files are stored?
>>> >>>
>>> >>> If I or someone else submits a func with some tests like in the
>>> >>> example above, will you put the func and tests into the proper files
>>> >>> yourselves? I'm not too lazy to do it myself, but I'm just not
>>> >>> familiar enough with the numpypy dev process, including mercurial
>>> >>> branches and the numpypy file structure, and can spend only quite
>>> >>> limited time diving into it in the near future.
>>> >>
>>> >> We generally require people to put their own tests in with the code
>>> >> (in the appropriate places) as they go, because you also should not
>>> >> break anything. The usefulness of a patch that has to be sliced and
>>> >> diced and put into place is very limited, and for straightforward,
>>> >> mostly-copied code like array_equal it's plain useless, since it's
>>> >> almost as much work to just do it ourselves.
>>> >
>>> > Well, for this func (array_equal) my docstrings really were copied from
>>> > CPython numpy (why wouldn't I do that to save some time, while the
>>> > license allows it?), but
>>> > * why wouldn't I go for it, while other programmers are busy with other
>>> > tasks?
>>> > * the engines of my func and the CPython numpy func differ completely.
>>> > First, in PyPy the CPython code just doesn't work at all (because of
>>> > the problem with ndarray.flat). Second, I have implemented a
>>> > workaround: I just replaced some code lines by
>>> >    Size = a1.size
>>> >    f1, f2 = a1.flat, a2.flat
>>> >    # TODO: replace xrange by range in Python3
>>> >    for i in xrange(Size):
>>> >        if f1.next() != f2.next(): return False
>>> >    return True
>>> >
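For context, a self-contained version of the variant benchmarked below might
look like this. Everything outside the flat-iterator loop is an assumption
here rather than the exact pastebin code:

    # Python 2 sketch of the short-circuiting array_equal variant; only the
    # flat-iterator loop comes from the snippet above, the rest is assumed.
    import numpy as N   # or numpypy under PyPy

    def array_equal(a1, a2):
        try:
            a1, a2 = N.asarray(a1), N.asarray(a2)
        except Exception:
            return False
        if a1.shape != a2.shape:
            return False
        f1, f2 = a1.flat, a2.flat
        for _ in xrange(a1.size):        # use range() on Python 3
            if f1.next() != f2.next():   # stop at the first mismatch
                return False
        return True
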
>>> > Here are some results in CPython for the following bench:
>>> >
>>> > from time import time
>>> > import numpy as N   # numpypy under PyPy
>>> >
>>> > n = 100000
>>> > m = 100
>>> > a = N.zeros(n)
>>> > b = N.ones(n)
>>> > t = time()
>>> > for i in range(m):
>>> >     N.array_equal(a, b)
>>> > print('classic numpy array_equal time elapsed (on different arrays): %0.5f'
>>> >       % (time()-t))
>>> >
>>> > t = time()
>>> > for i in range(m):
>>> >     array_equal(a, b)
>>> > print('Alternative array_equal time elapsed (on different arrays): %0.5f'
>>> >       % (time()-t))
>>> >
>>> > b = N.zeros(n)
>>> >
>>> > t = time()
>>> > for i in range(m):
>>> >     N.array_equal(a, b)
>>> > print('classic numpy array_equal time elapsed (on same arrays): %0.5f'
>>> >       % (time()-t))
>>> >
>>> > t = time()
>>> > for i in range(m):
>>> >     array_equal(a, b)
>>> > print('Alternative array_equal time elapsed (on same arrays): %0.5f'
>>> >       % (time()-t))
>>> >
>>> > CPython numpy results:
>>> > classic numpy array_equal time elapsed (on different arrays): 0.07728
>>> > Alternative array_equal time elapsed (on different arrays): 0.00056
>>> > classic numpy array_equal time elapsed (on same arrays): 0.11163
>>> > Alternative array_equal time elapsed (on same arrays): 9.09458
>>> >
>>> > PyPy results (cannot test the "classic" version because it depends on
>>> > some funcs that are unavailable yet):
>>> > Alternative array_equal time elapsed (on different arrays): 0.00133
>>> > Alternative array_equal time elapsed (on same arrays): 0.95038
>>> >
>>> >
>>> > So, as you see, even in CPython numpy my version is 138 times faster for
>>> > different arrays (yet 90 times slower for same arrays). However, in the
>>> > real world this func is usually called on different arrays, and only
>>> > sometimes on equal ones.
>>> > For my implementation the time elapsed in the equal-arrays case depends
>>> > essentially on their size, but either way I still think my version is
>>> > better than the CPython one: it's faster and doesn't require allocating
>>> > memory for the boolean array that gets passed to logical_and.
>>> >
>>> > I updated my array_equal implementation with the changes mentioned
>>> > above and the tests on multidimensional arrays you asked for, and put
>>> > it at http://pastebin.com/tg2aHE6x (now I'll update the bugs.pypy.org
>>> > entry with the link).
>>> >
>>> >
>>> > -----------------------
>>> > Regards, D.
>>> > http://openopt.org/Dmitrey
>>>
>>> Worth pointing out that the implementations of array_equal and
>>> array_equiv in NumPy are a bit embarrassing because they always perform
>>> all N comparisons instead of short-circuiting as soon as a mismatch is
>>> found. This is completely silly IMHO:
>>>
>>> In [34]: x = np.random.randn(100000)
>>>
>>> In [35]: y = np.random.randn(100000)
>>>
>>> In [36]: timeit np.array_equal(x, y)
>>> 1000 loops, best of 3: 349 us per loop
>>>
>>> - W
>>>
>>
>> The correct solution (IMO) is to reuse the original NumPy implementation,
>> but have logical_and.reduce short-circuit correctly.  This has the nice
>> side effect of allowing all() and any() to use logical_and.reduce and
>> logical_or.reduce.
>>
>> Alx
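
For reference, the reduce-based formulation being referred to looks roughly
like the sketch below. It is not the verbatim NumPy source, only an
illustration of where a short-circuiting logical_and.reduce would plug in:

    # Rough sketch of the reduce-based formulation; not the verbatim NumPy
    # source.  With a short-circuiting logical_and.reduce this would stop at
    # the first mismatch instead of materialising the full boolean
    # comparison first.
    import numpy as N

    def array_equal(a1, a2):
        try:
            a1, a2 = N.asarray(a1), N.asarray(a2)
        except Exception:
            return False
        if a1.shape != a2.shape:
            return False
        return bool(N.logical_and.reduce(N.equal(a1, a2).ravel()))
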
>>
>
>
> I complained on the numpy list about this issue about a year ago.  The
> usual numpy idiom is
>
> np.any(some comparison)
>
> which creates a full-size array, comparing every element, before the 'any'
> even runs, which is obviously wasteful.  Hope numpypy can do better.
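
A concrete instance of the idiom in question, together with a pure-Python
version that does short-circuit (shown only to illustrate the behaviour a
lazy any() could provide, not as a recommended replacement):

    from itertools import izip   # plain zip on Python 3
    import numpy as np

    a = np.zeros(1000000)
    b = np.ones(1000000)

    # the usual idiom: 'a != b' materialises a full million-element boolean
    # temporary before any() ever looks at it
    differ = np.any(a != b)

    # a short-circuiting pure-Python equivalent (slow per element), shown
    # only to illustrate the behaviour a lazy any() could provide
    differ_lazy = any(x != y for x, y in izip(a.flat, b.flat))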

It does better already FYI. It does not completely work with all kinds
of possible constructs (like .flat) but works in general.

