<br><br><div class="gmail_quote">On Thu, Jan 19, 2012 at 6:15 PM, Wes McKinney <span dir="ltr"><<a href="mailto:wesmckinn@gmail.com">wesmckinn@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5">On Thu, Jan 19, 2012 at 2:49 PM, Dmitrey <<a href="mailto:dmitrey15@ukr.net">dmitrey15@ukr.net</a>> wrote:<br>
> On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:<br>
>><br>
>> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey<<a href="mailto:dmitrey15@ukr.net">dmitrey15@ukr.net</a>> wrote:<br>
>>><br>
>>> Hi all,<br>
>>> could you clarify the rules for accepting new numpypy funcs (not only for<br>
>>> me, but for any other possible volunteers)?<br>
>>> The doc I was directed to says only "You have to test exhaustively your<br>
>>> module", while I would like to know more explicit rules.<br>
>>> For example, "at least 3 tests per func" (however, I guess that for funcs of<br>
>>> different complexity and variability the number of tests should also be<br>
>>> expected to differ).<br>
>>> Also, are there any strict rules for the test cases to be submitted, or can I,<br>
>>> for example, merely write<br>
>>><br>
>>> if __name__ == '__main__':<br>
>>> &nbsp;&nbsp;&nbsp;&nbsp;assert array_equal(1, 1)<br>
>>> &nbsp;&nbsp;&nbsp;&nbsp;assert array_equal([1, 2], [1, 2])<br>
>>> &nbsp;&nbsp;&nbsp;&nbsp;assert array_equal(N.array([1, 2]), N.array([1, 2]))<br>
>>> &nbsp;&nbsp;&nbsp;&nbsp;assert array_equal([1, 2], N.array([1, 2]))<br>
>>> &nbsp;&nbsp;&nbsp;&nbsp;assert array_equal([1, 2], [1, 2, 3]) is False<br>
>>> &nbsp;&nbsp;&nbsp;&nbsp;print('passed')<br>
>><br>
>> We have pretty exhaustive automated testing suites. Look for example<br>
>> in pypy/module/micronumpy/test directory for the test file style.<br>
>> They're run with py.test and we require at the very least full code<br>
>> coverage (every line has to be executed, there are tools to check,<br>
>> like coverage). Also passing "unusual" input, like sys.maxint etc. is<br>
>> usually recommended. With your example, you would check if it works<br>
>> for say views and multidimensional arrays. Also "is False" is not<br>
>> considered good style.<br>
>><br>
>>> Or is there a certain rule for where to store files with tests?<br>
>>><br>
>>> If I or someone else submits a func with some tests like in the example<br>
>>> above, will you put the func and tests in the proper files yourselves?<br>
>>> I'm not too lazy to do it myself, but I am simply not familiar enough with<br>
>>> the numpypy dev process, including mercurial branches and the numpypy file<br>
>>> structure, and can spend only quite limited time diving into it in the<br>
>>> near future.<br>
>><br>
>> We generally require people to put their own tests as they go with the<br>
>> code (in appropriate places) because you also should not break<br>
>> anything. The usefulness of a patch that has to be sliced and diced<br>
>> and put into places is very limited and for straightforward<br>
>> mostly-copied code, like array_equal, plain useless, since it's almost<br>
>> as much work to just do it.<br>
><br>
> Well, for this func (array_equal) my docstrings really were copied from<br>
> CPython numpy (why not do this to save some time, when the license allows<br>
> it?), but<br>
> * why not go for this (), while other programmers are busy with other<br>
> tasks?<br>
> * the engines of my func and the CPython numpy func differ completely. First, in<br>
> PyPy the CPython code just doesn't work at all (because of the problem with<br>
> ndarray.flat). Second, I have implemented a workaround: I just replaced some<br>
> code lines with<br>
> Size = a1.size<br>
> f1, f2 = a1.flat, a2.flat<br>
> # TODO: replace xrange by range in Python3<br>
> for i in xrange(Size):<br>
> &nbsp;&nbsp;&nbsp;&nbsp;if f1.next() != f2.next(): return False<br>
> return True<br>
><br>
> Here are some results in CPython for the following bench:<br>
><br>
> from time import time<br>
> import numpy as N<br>
> n = 100000<br>
> m = 100<br>
> a = N.zeros(n)<br>
> b = N.ones(n)<br>
> t = time()<br>
> for i in range(m):<br>
> &nbsp;&nbsp;&nbsp;&nbsp;N.array_equal(a, b)<br>
> print('classic numpy array_equal time elapsed (on different arrays): %0.5f'<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;% (time()-t))<br>
><br>
><br>
> t = time()<br>
> for i in range(m):<br>
> &nbsp;&nbsp;&nbsp;&nbsp;array_equal(a, b)<br>
> print('Alternative array_equal time elapsed (on different arrays): %0.5f' %<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(time()-t))<br>
><br>
> b = N.zeros(n)<br>
><br>
> t = time()<br>
> for i in range(m):<br>
> &nbsp;&nbsp;&nbsp;&nbsp;N.array_equal(a, b)<br>
> print('classic numpy array_equal time elapsed (on same arrays): %0.5f' %<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(time()-t))<br>
><br>
> t = time()<br>
> for i in range(m):<br>
> &nbsp;&nbsp;&nbsp;&nbsp;array_equal(a, b)<br>
> print('Alternative array_equal time elapsed (on same arrays): %0.5f' %<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(time()-t))<br>
><br>
> CPython numpy results:<br>
> classic numpy array_equal time elapsed (on different arrays): 0.07728<br>
> Alternative array_equal time elapsed (on different arrays): 0.00056<br>
> classic numpy array_equal time elapsed (on same arrays): 0.11163<br>
> Alternative array_equal time elapsed (on same arrays): 9.09458<br>
><br>
> PyPy results (cannot test on "classic" version because it depends on some<br>
> funcs that are unavailable yet):<br>
> Alternative array_equal time elapsed (on different arrays): 0.00133<br>
> Alternative array_equal time elapsed (on same arrays): 0.95038<br>
><br>
><br>
> So, as you see, even in CPython numpy my version is 138 times faster for<br>
> different arrays (yet about 81 times slower for same arrays). However, in the real<br>
> world usually different arrays come to this func, and only sometimes equal<br>
> arrays are encountered.<br>
> Well, for my implementation, in the case of equal arrays the time elapsed<br>
> depends essentially on their size, but either way I still think my<br>
> implementation is better than CPython's: it's faster and doesn't require<br>
> allocating memory for the boolean array that is passed to logical_and.<br>
><br>
> I updated my array_equal implementation with the changes mentioned above,<br>
> added the tests on multidimensional arrays you asked for, and put it at<br>
> <a href="http://pastebin.com/tg2aHE6x" target="_blank">http://pastebin.com/tg2aHE6x</a> (now I'll update the <a href="http://bugs.pypy.org" target="_blank">bugs.pypy.org</a> entry with<br>
> the link).<br>
><br>
><br>
> -----------------------<br>
> Regards, D.<br>
> <a href="http://openopt.org/Dmitrey" target="_blank">http://openopt.org/Dmitrey</a><br>
> _______________________________________________<br>
> pypy-dev mailing list<br>
> <a href="mailto:pypy-dev@python.org">pypy-dev@python.org</a><br>
> <a href="http://mail.python.org/mailman/listinfo/pypy-dev" target="_blank">http://mail.python.org/mailman/listinfo/pypy-dev</a><br>
<br>
</div></div>Worth pointing out that the implementations of array_equal and<br>
array_equiv in NumPy are a bit embarrassing because they require all<br>
N comparisons instead of short-circuiting as soon as a False value<br>
is found. This is completely silly IMHO:<br>
<br>
In [34]: x = np.random.randn(100000)<br>
<br>
In [35]: y = np.random.randn(100000)<br>
<br>
In [36]: timeit np.array_equal(x, y)<br>
1000 loops, best of 3: 349 us per loop<br>
<span class="HOEnZb"><font color="#888888"><br>
- W<br>
</font></span><div class="HOEnZb"><div class="h5">
</div></div></blockquote></div><br>The correct solution (IMO) is to reuse the original NumPy implementation, but have logical_and.reduce short-circuit correctly. This has the nice side effect of allowing all() and any() to be built on logical_and.reduce and logical_or.reduce respectively.<div>
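<br>To make the idea concrete, here is a minimal pure-Python sketch of a reduce that stops early. It is only an illustration under stated assumptions, not the NumPy or numpypy code: short_circuit_reduce, logical_and_reduce, logical_or_reduce and the list-based array_equal are hypothetical names, and plain sequences stand in for arrays (so "shape" is just the length).<br>
<pre>
```python
import operator


def short_circuit_reduce(op, items, absorbing, initial):
    """Fold `items` with `op`, stopping once the result equals `absorbing`.

    `absorbing` is the value that can no longer change: False for a
    logical "and", True for a logical "or". Hypothetical helper.
    """
    result = initial
    for item in items:
        result = op(result, bool(item))
        if result == absorbing:
            break  # the remaining items cannot change the answer
    return result


def logical_and_reduce(items):
    # Behaves like all(): True for an empty input, stops at the first falsy item.
    return short_circuit_reduce(operator.and_, items, absorbing=False, initial=True)


def logical_or_reduce(items):
    # Behaves like any(): False for an empty input, stops at the first truthy item.
    return short_circuit_reduce(operator.or_, items, absorbing=True, initial=False)


def array_equal(a1, a2):
    """Flat-sequence stand-in for array_equal (shape check reduced to length)."""
    if len(a1) != len(a2):
        return False
    # The generator is lazy, so elements are compared only until the first mismatch.
    return logical_and_reduce(x == y for x, y in zip(a1, a2))
```
</pre>
With this shape, array_equal needs no temporary boolean array and stops at the first differing element, while equal inputs still cost one pass over the data.<br>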
<br></div><div>Alx<br clear="all"><div><br></div>-- <br>"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)<br>"The people's good is the highest law." -- Cicero<br>
<br>
</div>