[pypy-dev] certificate for accepting numpypy new funcs?

Thu Jan 19 20:49:40 CET 2012

On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:
> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey<dmitrey15 at ukr.net>  wrote:
>> Hi all,
>> could you clarify the criteria for accepting new numpypy functions (not only
>> for me, but for any other potential volunteers)?
>> The doc I've been pointed to says only "You have to test exhaustively your
>> module", while I would like to know more explicit rules,
>> for example, "at least 3 tests per func" (though I guess functions of
>> different complexity and variability should be expected to need different
>> numbers of tests).
>> Also, are there any strict rules for the test cases to be submitted, or can I,
>> for example, merely write
>>
>> if __name__ == '__main__':
>>     assert array_equal(1, 1)
>>     assert array_equal([1, 2], [1, 2])
>>     assert array_equal(N.array([1, 2]), N.array([1, 2]))
>>     assert array_equal([1, 2], N.array([1, 2]))
>>     assert array_equal([1, 2], [1, 2, 3]) is False
>>     print('passed')
> We have pretty exhaustive automated testing suites. Look, for example,
> in the pypy/module/micronumpy/test directory for the test file style.
> They're run with py.test and we require at the very least full code
> coverage (every line has to be executed; there are tools to check this,
> like coverage). Passing "unusual" input, like sys.maxint, is also
> usually recommended. With your example, you would check whether it works
> for, say, views and multidimensional arrays. Also, "is False" is not
> considered good style.
>
>> Or is there a specific rule for where test files are stored?
>>
>> If I or someone else submits a function with some tests like in the example
>> above, will you put the function and tests into the proper files yourselves?
>> I'm not too lazy to do it myself, but I'm not yet familiar enough with the
>> numpypy development process, including Mercurial branches and the numpypy
>> file structure, and can spend only limited time diving into it in the near
>> future.
> We generally require people to put their own tests as they go with the
> code (in appropriate places) because you also should not break
> anything. The usefulness of a patch that has to be sliced and diced
> and put into places is very limited and for straightforward
> mostly-copied code, like array_equal, plain useless, since it's almost
> as much work to just do it.
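Following the testing advice quoted above (views, multidimensional arrays, unusual values such as sys.maxint), a plain py.test-style set of tests for array_equal might look roughly like the sketch below. It is only an illustration. It tests numpy.array_equal as a stand-in for the implementation under test. The test function names are hypothetical, and it uses the Python 3 spelling sys.maxsize. It does not reproduce the exact conventions used in pypy/module/micronumpy/test.

```python
import sys

import numpy as np

# Stand-in for the implementation under test (an assumption); swap in
# your own array_equal to run these tests against it
array_equal = np.array_equal


def test_scalars_and_lists():
    assert array_equal(1, 1)
    assert array_equal([1, 2], [1, 2])
    assert array_equal([1, 2], np.array([1, 2]))


def test_shape_mismatch():
    # "assert not ..." rather than "... is False"
    assert not array_equal([1, 2], [1, 2, 3])
    assert not array_equal(np.zeros((2, 3)), np.zeros((3, 2)))


def test_multidimensional_and_views():
    a = np.arange(12).reshape(3, 4)
    # Non-contiguous views: a transpose and a strided column slice
    assert array_equal(a.T, a.T.copy())
    assert array_equal(a[:, ::2], np.array([[0, 2], [4, 6], [8, 10]]))
    # Same shape (2, 4), different contents
    assert not array_equal(a[:2], a[1:])


def test_unusual_values():
    big = sys.maxsize
    assert array_equal([big], np.array([big]))
    assert not array_equal([big], [big - 1])
    # Empty arrays of equal shape compare equal
    assert array_equal([], [])
    # NaN never compares equal to itself
    assert not array_equal([np.nan], [np.nan])
```

py.test collects the test_* functions automatically when the file name also starts with test_.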
Well, for this function (array_equal) my docstrings really were copied from
CPython numpy (why not do this to save some time, when the license
allows it?), but
* why wouldn't I go for this, while other programmers are busy with other
tasks?
* the internals of my function and the CPython numpy one differ completely.
First, in PyPy the CPython numpy code just doesn't work at all (because of
the problem with ndarray.flat). Second, I have implemented a workaround: I
just replaced some lines of code with
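     # (a1 and a2 are assumed to be already converted and shape-checked)
     size = a1.size
     f1, f2 = a1.flat, a2.flat
     # TODO: replace xrange by range in Python 3
     for i in xrange(size):
         # Stop at the first mismatching element
         if next(f1) != next(f2):
             return False
     return True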
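A self-contained Python 3 sketch of the same early-exit idea for current CPython and numpy is shown below. The function name array_equal_early_exit is hypothetical. The asarray and shape-check prologue is an assumption about the surrounding code that the fragment above omits. It mirrors what numpy's array_equal does before its final comparison.

```python
import numpy as np


def array_equal_early_exit(a1, a2):
    """Hypothetical sketch: compare two array-likes element by element,
    returning False at the first mismatch instead of first building a
    full boolean array of element-wise results."""
    try:
        a1, a2 = np.asarray(a1), np.asarray(a2)
    except Exception:
        return False
    # Arrays of different shapes can never be equal
    if a1.shape != a2.shape:
        return False
    # .flat iterates in C order regardless of memory layout, so
    # non-contiguous views work too; zip() is lazy in Python 3,
    # so the loop really stops at the first mismatch
    for x, y in zip(a1.flat, a2.flat):
        if x != y:
            return False
    return True
```

The trade-off is the one measured below. A mismatch near the start returns almost immediately. Equal arrays pay for a Python-level loop over every element.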

Here are some results in CPython for the following benchmark:

import numpy as N
from time import time
# array_equal below is the alternative implementation described above;
# it must be defined (or imported) in this module
n = 100000
m = 100
a = N.zeros(n)
b = N.ones(n)
t = time()
for i in range(m):
    N.array_equal(a, b)
print('classic numpy array_equal time elapsed (on different arrays): %0.5f' % (time() - t))

t = time()
for i in range(m):
    array_equal(a, b)
print('Alternative array_equal time elapsed (on different arrays): %0.5f' % (time() - t))

b = N.zeros(n)

t = time()
for i in range(m):
    N.array_equal(a, b)
print('classic numpy array_equal time elapsed (on same arrays): %0.5f' % (time() - t))

t = time()
for i in range(m):
    array_equal(a, b)
print('Alternative array_equal time elapsed (on same arrays): %0.5f' % (time() - t))
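Because zeros and ones already differ at index 0, the "different arrays" case above measures the best case for an early exit. The small timeit-based sketch below times any comparison function in three cases: on equal arrays, on arrays differing at the first element, and on arrays differing at the last element. The helper name benchmark_equality and its default sizes are hypothetical, and no timings are claimed here.

```python
import functools
import timeit

import numpy as np


def benchmark_equality(func, n=100000, number=100):
    """Time func(a, b) on three input pairs of length n and return a
    dict mapping case name to total seconds over `number` calls."""
    a = np.zeros(n)
    cases = {
        "equal": np.zeros(n),
        "first_differs": np.zeros(n),
        "last_differs": np.zeros(n),
    }
    # Place a single mismatch at the start or at the end
    cases["first_differs"][0] = 1.0
    cases["last_differs"][-1] = 1.0
    results = {}
    for name, b in cases.items():
        # timeit accepts a zero-argument callable as the statement
        results[name] = timeit.timeit(functools.partial(func, a, b), number=number)
    return results
```

For example, comparing print(benchmark_equality(np.array_equal)) with the same call on an early-exit implementation shows whether its advantage survives when the mismatch is at the end.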

CPython numpy results:
classic numpy array_equal time elapsed (on different arrays): 0.07728
Alternative array_equal time elapsed (on different arrays): 0.00056
classic numpy array_equal time elapsed (on same arrays): 0.11163
Alternative array_equal time elapsed (on same arrays): 9.09458

PyPy results (the "classic" version cannot be tested, because it depends on
some functions that are not available yet):
Alternative array_equal time elapsed (on different arrays): 0.00133
Alternative array_equal time elapsed (on same arrays): 0.95038

So, as you see, even in CPython numpy my version is about 138 times faster
on different arrays (0.07728 / 0.00056), though about 81 times slower on
equal arrays (9.09458 / 0.11163). However, in real-world use this function
usually receives different arrays, and equal ones are encountered only
occasionally.
For equal arrays, the time taken by my implementation depends mainly on
their size, but either way I still think my implementation is better than
the CPython numpy one: it's faster on differing arrays and doesn't need to
allocate the intermediate boolean array that is passed to logical_and.
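The trade-off above pits an early exit that is fast on differing arrays but loops in Python on equal ones against a vectorized comparison that allocates a full boolean array. One possible middle ground, not proposed in this thread, is to compare in fixed-size vectorized chunks. Below is a minimal sketch for CPython numpy. The function name and the default chunk size of 4096 are arbitrary assumptions, not tuned values.

```python
import numpy as np


def array_equal_chunked(a1, a2, chunk=4096):
    """Hypothetical sketch: compare flattened arrays chunk by chunk, so
    each step allocates a boolean array of at most `chunk` elements and
    the loop stops at the first chunk that contains a mismatch."""
    try:
        a1, a2 = np.asarray(a1), np.asarray(a2)
    except Exception:
        return False
    if a1.shape != a2.shape:
        return False
    # ravel() returns a view when possible; for non-contiguous inputs it
    # makes a C-order copy, which is an extra full-size allocation
    f1, f2 = a1.ravel(), a2.ravel()
    for start in range(0, f1.size, chunk):
        stop = start + chunk
        # Vectorized comparison of one chunk
        if not (f1[start:stop] == f2[start:stop]).all():
            return False
    return True
```

For contiguous inputs, temporary memory is then bounded by the chunk size rather than the array size. The Python-level loop runs size / chunk times instead of once per element.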

I have updated my array_equal implementation with the changes mentioned
above, added the tests on multidimensional arrays you asked for, and put it
at http://pastebin.com/tg2aHE6x (I'll now update the bugs.pypy.org entry
with the link).

-----------------------
Regards, D.
http://openopt.org/Dmitrey