[pypy-dev] certificate for accepting numpypy new funcs?

Alex Gaynor alex.gaynor at gmail.com
Fri Jan 20 01:20:54 CET 2012


On Thu, Jan 19, 2012 at 6:15 PM, Wes McKinney <wesmckinn at gmail.com> wrote:

> On Thu, Jan 19, 2012 at 2:49 PM, Dmitrey <dmitrey15 at ukr.net> wrote:
> > On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:
> >>
> >> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey<dmitrey15 at ukr.net>  wrote:
> >>>
> >>> Hi all,
> >>> could you clarify the rules for accepting new numpypy funcs (not only
> >>> for me, but for any other possible volunteers)?
> >>> The doc I've been directed to says only "You have to test exhaustively
> >>> your module", while I would like to know more explicit rules.
> >>> For example, "at least 3 tests per func" (however, I guess funcs of
> >>> different complexity and variability should be expected to need
> >>> different numbers of tests).
> >>> Also, are there any strict rules for the test cases to be submitted, or
> >>> can I, for example, merely write
> >>>
> >>> if __name__ == '__main__':
> >>>    assert array_equal(1, 1)
> >>>    assert array_equal([1, 2], [1, 2])
> >>>    assert array_equal(N.array([1, 2]), N.array([1, 2]))
> >>>    assert array_equal([1, 2], N.array([1, 2]))
> >>>    assert array_equal([1, 2], [1, 2, 3]) is False
> >>>    print('passed')
> >>
> >> We have pretty exhaustive automated testing suites. Look for example
> >> in pypy/module/micronumpy/test directory for the test file style.
> >> They're run with py.test and we require at the very least full code
> >> coverage (every line has to be executed; there are tools to check this,
> >> like coverage). Passing "unusual" input, like sys.maxint etc., is also
> >> usually recommended. With your example, you would check whether it works
> >> for, say, views and multidimensional arrays. Also, "is False" is not
> >> considered good style.
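A minimal sketch of the test style described above, assuming py.test's convention of plain `test_*` functions with bare `assert`s. It does not reproduce the actual layout of `pypy/module/micronumpy/test`: the `array_equal` here is a hypothetical pure-Python stand-in over rectangular nested lists, so the example runs without numpy. Note `assert not ...` in place of `is False`, plus multidimensional, shape-mismatch and large-integer cases (`sys.maxsize` is the Python 3 counterpart of `sys.maxint`).

```python
import sys


def _shape(x):
    # Hypothetical helper: shape of a rectangular nested list, () for a scalar.
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None
    return tuple(shape)


def _flat(x):
    # Yield the scalars of a nested list in row-major order.
    if isinstance(x, list):
        for item in x:
            yield from _flat(item)
    else:
        yield x


def array_equal(a1, a2):
    # Hypothetical stand-in for the function under test.
    if _shape(a1) != _shape(a2):
        return False
    return all(x == y for x, y in zip(_flat(a1), _flat(a2)))


def test_scalars_and_1d():
    assert array_equal(1, 1)
    assert array_equal([1, 2], [1, 2])
    assert not array_equal([1, 2], [1, 3])


def test_shape_mismatch():
    # Prefer `assert not ...` over `... is False`.
    assert not array_equal([1, 2], [1, 2, 3])
    assert not array_equal([[1, 2]], [1, 2])


def test_multidimensional():
    a = [[1, 2, 3], [4, 5, 6]]
    assert array_equal(a, [[1, 2, 3], [4, 5, 6]])
    assert not array_equal(a, [[1, 2, 3], [4, 5, 7]])


def test_unusual_input():
    big = sys.maxsize
    assert array_equal([big, -big - 1], [big, -big - 1])
    assert not array_equal([big], [big - 1])
    assert array_equal([], [])
```

With real numpy-backed arrays one would also add cases for views (slices such as `a[::2]`, transposes), as suggested above.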
> >>
> >>> Or is there a certain rule for where to store the test files?
> >>>
> >>> If I or someone else submits a func with some tests like in the
> >>> example above, will you put the func and tests in the proper files
> >>> yourselves? I'm not too lazy to do it myself, but I'm not yet familiar
> >>> enough with the numpypy dev process, including Mercurial branches and
> >>> the numpypy file structure, and can spend only quite limited time
> >>> diving into it in the near future.
> >>
> >> We generally require people to add their own tests alongside the
> >> code (in the appropriate places), because you also must not break
> >> anything. The usefulness of a patch that has to be sliced, diced
> >> and put into place is very limited, and for straightforward,
> >> mostly-copied code like array_equal it is plain useless, since it's
> >> almost as much work as just doing it yourself.
> >
> > Well, for this func (array_equal) my docstrings really were copied from
> > CPython numpy (why not, to save some time, when the license allows
> > it?), but
> > * why wouldn't I go for this while other programmers are busy with
> > other tasks?
> > * the engines of my func and the CPython numpy func differ completely.
> > First, in PyPy the CPython code just doesn't work at all (because of
> > the problem with ndarray.flat). Second, I have implemented a workaround:
> > I just replaced some code lines with
> >    # Compare element by element, returning as soon as a mismatch is found
> >    size = a1.size
> >    f1, f2 = a1.flat, a2.flat
> >    # TODO: replace xrange by range in Python3
> >    for i in xrange(size):
> >        if f1.next() != f2.next():
> >            return False
> >    return True
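For reference, a Python 3 rendering of the same short-circuit idea (a sketch, not numpypy's code): `xrange` becomes `range`, the `.next()` calls disappear, and pairing the two flat iterators with `zip()` inside `all()` stops at the first mismatch. It assumes both arguments already expose `.shape` and `.flat`, as NumPy arrays do; the function name and the up-front shape check are assumptions added for illustration.

```python
def array_equal_short_circuit(a1, a2):
    """Hypothetical sketch: element-wise equality that stops at the first mismatch.

    Assumes a1 and a2 expose .shape and .flat (e.g. numpy.ndarray).
    """
    # Arrays of different shapes can never be equal; no element access needed.
    if a1.shape != a2.shape:
        return False
    # zip() pulls one pair at a time from the two flat iterators, and all()
    # returns False on the first unequal pair, so no boolean array is built.
    return all(x == y for x, y in zip(a1.flat, a2.flat))
```

With NumPy available, `array_equal_short_circuit(numpy.zeros(5), numpy.ones(5))` should return `False` after comparing only the first pair of elements.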
> >
> > Here are some results in CPython for the following benchmark:
> >
> > import numpy as N
> > from time import time
> > # array_equal below is the alternative implementation (see the pastebin
> > # link further down); N.array_equal is the classic NumPy one.
> > n = 100000
> > m = 100
> > a = N.zeros(n)
> > b = N.ones(n)
> > t = time()
> > for i in range(m):
> >    N.array_equal(a, b)
> > print('classic numpy array_equal time elapsed (on different arrays): %0.5f' % (time()-t))
> >
> > t = time()
> > for i in range(m):
> >    array_equal(a, b)
> > print('Alternative array_equal time elapsed (on different arrays): %0.5f' % (time()-t))
> >
> > b = N.zeros(n)
> >
> > t = time()
> > for i in range(m):
> >    N.array_equal(a, b)
> > print('classic numpy array_equal time elapsed (on same arrays): %0.5f' % (time()-t))
> >
> > t = time()
> > for i in range(m):
> >    array_equal(a, b)
> > print('Alternative array_equal time elapsed (on same arrays): %0.5f' % (time()-t))
> >
> > CPython numpy results:
> > classic numpy array_equal time elapsed (on different arrays): 0.07728
> > Alternative array_equal time elapsed (on different arrays): 0.00056
> > classic numpy array_equal time elapsed (on same arrays): 0.11163
> > Alternative array_equal time elapsed (on same arrays): 9.09458
> >
> > PyPy results (the "classic" version can't be tested because it depends
> > on some funcs that are not available yet):
> > Alternative array_equal time elapsed (on different arrays): 0.00133
> > Alternative array_equal time elapsed (on same arrays): 0.95038
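The hand-rolled `time()` loops above could also be written with the standard-library `timeit` module, which takes the best of several runs and is therefore less sensitive to noise. A minimal sketch: plain Python lists stand in for arrays (an assumption, to keep it numpy-free), and list `==` is used because it also stops at the first unequal pair, so it should show the same kind of different-vs-same asymmetry as the results above. No particular timings are implied.

```python
import timeit


def best_time(stmt, namespace, number=100, repeat=3):
    """Return the best total time in seconds for `number` runs of `stmt`."""
    return min(timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat))


n = 100000
a = [0.0] * n
different = [1.0] * n  # differs at index 0, so == can stop immediately
same = [0.0] * n       # equal everywhere, so == must scan every element

t_diff = best_time("a == different", {"a": a, "different": different})
t_same = best_time("a == same", {"a": a, "same": same})
print("different lists: %0.5f s" % t_diff)
print("same lists:      %0.5f s" % t_same)
```

Taking the minimum over repeats is the usual `timeit` convention: slower runs mostly reflect interference from the rest of the system rather than the code being measured.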
> >
> >
> > So, as you see, even in CPython numpy my version is 138 times faster for
> > different arrays (though roughly 81 times slower for same arrays).
> > However, in the real world this func is usually called on different
> > arrays, and only sometimes on equal ones.
> > For my implementation, the time elapsed on equal arrays essentially
> > depends on their size, but either way I still think my implementation
> > is better than the CPython one: it's faster and doesn't require
> > allocating memory for the boolean array that goes to logical_and.
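Working the factors out from the CPython timings reported above (a quick arithmetic check using only those numbers): classic divided by alternative for different arrays, alternative divided by classic for same arrays.

```python
# CPython timings (seconds, 100 calls each) copied from the results above.
classic_diff, alt_diff = 0.07728, 0.00056
classic_same, alt_same = 0.11163, 9.09458

speedup_diff = classic_diff / alt_diff    # alternative vs classic, different arrays
slowdown_same = alt_same / classic_same   # alternative vs classic, same arrays

print("different arrays: %.0fx faster" % speedup_diff)   # different arrays: 138x faster
print("same arrays:      %.1fx slower" % slowdown_same)  # same arrays:      81.5x slower
```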
> >
> > I updated my array_equal implementation with the changes mentioned above,
> > added the tests on multidimensional arrays you asked for, and put it at
> > http://pastebin.com/tg2aHE6x (I'll now update the bugs.pypy.org entry with
> > the link).
> >
> >
> > -----------------------
> > Regards, D.
> > http://openopt.org/Dmitrey
> > _______________________________________________
> > pypy-dev mailing list
> > pypy-dev at python.org
> > http://mail.python.org/mailman/listinfo/pypy-dev
>
> Worth pointing out that the implementations of array_equal and
> array_equiv in NumPy are a bit embarrassing, because they always perform
> all N comparisons instead of short-circuiting as soon as a False value
> is found. This is completely silly IMHO:
>
> In [34]: x = np.random.randn(100000)
>
> In [35]: y = np.random.randn(100000)
>
> In [36]: timeit np.array_equal(x, y)
> 1000 loops, best of 3: 349 us per loop
>
> - W
>
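The point about "a full N comparisons" can be made concrete by counting element comparisons. The sketch below is pure Python, with lists standing in for arrays and a hypothetical function name; it contrasts the classic strategy, which builds the complete boolean result before reducing it (like the boolean array handed to logical_and described earlier), with a short-circuiting one. It assumes both inputs have the same length.

```python
def compare_counting(a, b, short_circuit):
    """Return (equal?, number of element comparisons performed).

    Assumes a and b have the same length.
    """
    count = 0

    def eq(x, y):
        nonlocal count
        count += 1
        return x == y

    pairs = zip(a, b)
    if short_circuit:
        # Generator: all() stops pulling pairs at the first False.
        result = all(eq(x, y) for x, y in pairs)
    else:
        # List: every comparison runs first (the full boolean "array"),
        # and only then does all() reduce it.
        result = all([eq(x, y) for x, y in pairs])
    return result, count


x = [0.0] * 100000
y = [1.0] * 100000
print(compare_counting(x, y, short_circuit=False))  # (False, 100000)
print(compare_counting(x, y, short_circuit=True))   # (False, 1)
```

For equal inputs both strategies perform all N comparisons; the difference only shows up when a mismatch occurs early.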

The correct solution (IMO) is to reuse the original NumPy implementation,
but have logical_and.reduce short-circuit correctly. This has the nice
side effect of allowing all() and any() to use
logical_and/logical_or.reduce.
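A minimal sketch of that idea in plain Python: a boolean reduction that stops once it reaches the operation's absorbing value (False for AND, True for OR), with all()/any() expressed on top of it. The names `short_circuit_reduce`, `logical_and_reduce`, `logical_or_reduce`, `all_` and `any_` are hypothetical and are not numpypy's API.

```python
import operator


def short_circuit_reduce(op, values, identity, absorbing):
    """Fold a boolean op over values, stopping once `absorbing` is reached."""
    result = identity
    for v in values:
        result = op(result, bool(v))
        if result == absorbing:
            # Once AND hits False (or OR hits True), later values cannot
            # change the result, so the rest of the input is never read.
            break
    return result


def logical_and_reduce(values):
    return short_circuit_reduce(operator.and_, values, True, False)


def logical_or_reduce(values):
    return short_circuit_reduce(operator.or_, values, False, True)


# all()/any() built on the reductions, as suggested above.
def all_(values):
    return logical_and_reduce(values)


def any_(values):
    return logical_or_reduce(values)
```

For example, `all_([1, 2, 0, 3])` returns `False` after reading only the first three values; the identity values make `all_([])` return `True` and `any_([])` return `False`, matching the builtins.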

Alx

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero