On Thu, Jan 19, 2012 at 6:25 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
> On Thu, Jan 19, 2012 at 7:20 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:
> >
> >
> > On Thu, Jan 19, 2012 at 6:15 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
> >>
> >> On Thu, Jan 19, 2012 at 2:49 PM, Dmitrey <dmitrey15@ukr.net> wrote:
> >> > On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:
> >> >>
> >> >> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey <dmitrey15@ukr.net> wrote:
> >> >>>
> >> >>> Hi all,
> >> >>> could you provide clarification on how new numpypy funcs are accepted
> >> >>> (not only for me, but for any other possible volunteers)?
> >> >>> The doc I've been directed to says only "You have to test exhaustively
> >> >>> your module", while I would like to know more explicit rules.
> >> >>> For example, "at least 3 tests per func" (however, I guess the expected
> >> >>> number of tests differs for funcs of different complexity and
> >> >>> variability).
> >> >>> Also, are there any strict rules for the testcases to be submitted, or
> >> >>> can I, for example, merely write
> >> >>>
> >> >>> if __name__ == '__main__':
> >> >>>     assert array_equal(1, 1)
> >> >>>     assert array_equal([1, 2], [1, 2])
> >> >>>     assert array_equal(N.array([1, 2]), N.array([1, 2]))
> >> >>>     assert array_equal([1, 2], N.array([1, 2]))
> >> >>>     assert array_equal([1, 2], [1, 2, 3]) is False
> >> >>>     print('passed')
> >> >>
> >> >> We have pretty exhaustive automated testing suites. Look for example in
> >> >> the pypy/module/micronumpy/test directory for the test file style.
> >> >> They're run with py.test and we require at the very least full code
> >> >> coverage (every line has to be executed; there are tools to check this,
> >> >> like coverage). Also, passing "unusual" input, like sys.maxint etc., is
> >> >> usually recommended. With your example, you would check whether it works
> >> >> for, say, views and multidimensional arrays. Also, "is False" is not
> >> >> considered good style.
> >> >>
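> >> >> To make that concrete, a rough sketch of a test in that style (the
> >> >> class and helper names here are from memory and may differ in the
> >> >> actual tree, so treat this as approximate rather than copy-paste ready):
> >> >>
> >> >> class AppTestArrayEqual(BaseNumpyAppTest):
> >> >>     def test_array_equal(self):
> >> >>         from numpypy import array, array_equal
> >> >>         # basic cases, including plain lists and scalars
> >> >>         assert array_equal(1, 1)
> >> >>         assert array_equal(array([1, 2]), array([1, 2]))
> >> >>         assert not array_equal(array([1, 2]), array([1, 2, 3]))
> >> >>         # "unusual" inputs: a multidimensional array and a view of it
> >> >>         a = array([[1, 2], [3, 4]])
> >> >>         assert array_equal(a, a[:])
> >> >>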
> >> >>> Or is there a certain rule for storing files with tests?
> >> >>>
> >> >>> If I or someone else submits a func with some tests like in the example
> >> >>> above, will you put the func and tests in the proper files yourselves?
> >> >>> I'm not too lazy to do it myself, but I'm simply not immersed enough in
> >> >>> the numpypy dev process, including the mercurial branches and the
> >> >>> numpypy file structure, and can spend only quite limited time diving
> >> >>> into it in the near future.
> >> >>
> >> >> We generally require people to add their own tests along with the code
> >> >> (in the appropriate places), because you also should not break anything.
> >> >> The usefulness of a patch that has to be sliced and diced and put into
> >> >> place is very limited, and for straightforward mostly-copied code, like
> >> >> array_equal, it is plain useless, since it's almost as much work to just
> >> >> do it ourselves.
> >> >
> >> > Well, for this func (array_equal) my docstrings really were copied from
> >> > CPython numpy (why wouldn't I do that to save some time, while the
> >> > license allows it?), but
> >> > * why wouldn't I go for it, while other programmers are busy with other
> >> > tasks?
> >> > * the engines of my func and the CPython numpy one differ completely.
> >> > First, in PyPy the CPython code just doesn't work at all (because of the
> >> > problem with ndarray.flat). Second, I have implemented a workaround - I
> >> > just replaced some code lines by
> >> > Size = a1.size
> >> > f1, f2 = a1.flat, a2.flat
> >> > # TODO: replace xrange by range in Python3
> >> > for i in xrange(Size):
> >> >     if f1.next() != f2.next(): return False
> >> > return True
> >> >
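> >> > For context, a rough sketch of what a complete short-circuiting version
> >> > along those lines might look like (illustrative and untested against
> >> > numpypy; not the exact pastebin code):
> >> >
> >> > def array_equal(a1, a2):
> >> >     try:
> >> >         a1, a2 = N.asarray(a1), N.asarray(a2)
> >> >     except Exception:
> >> >         return False
> >> >     if a1.shape != a2.shape:
> >> >         return False
> >> >     f1, f2 = a1.flat, a2.flat
> >> >     for i in xrange(a1.size):
> >> >         # stop at the first mismatch instead of comparing all elements
> >> >         if f1.next() != f2.next():
> >> >             return False
> >> >     return True
> >> >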
> >> > Here are some results in CPython for the following bench:
> >> >
> >> > from time import time
> >> > n = 100000
> >> > m = 100
> >> > a = N.zeros(n)
> >> > b = N.ones(n)
> >> > t = time()
> >> > for i in range(m):
> >> >     N.array_equal(a, b)
> >> > print('classic numpy array_equal time elapsed (on different arrays): %0.5f'
> >> >       % (time()-t))
> >> >
> >> > t = time()
> >> > for i in range(m):
> >> >     array_equal(a, b)
> >> > print('Alternative array_equal time elapsed (on different arrays): %0.5f'
> >> >       % (time()-t))
> >> >
> >> > b = N.zeros(n)
> >> >
> >> > t = time()
> >> > for i in range(m):
> >> >     N.array_equal(a, b)
> >> > print('classic numpy array_equal time elapsed (on same arrays): %0.5f'
> >> >       % (time()-t))
> >> >
> >> > t = time()
> >> > for i in range(m):
> >> >     array_equal(a, b)
> >> > print('Alternative array_equal time elapsed (on same arrays): %0.5f'
> >> >       % (time()-t))
> >> >
> >> > CPython numpy results:
> >> > classic numpy array_equal time elapsed (on different arrays): 0.07728
> >> > Alternative array_equal time elapsed (on different arrays): 0.00056
> >> > classic numpy array_equal time elapsed (on same arrays): 0.11163
> >> > Alternative array_equal time elapsed (on same arrays): 9.09458
> >> >
> >> > PyPy results (cannot test the "classic" version because it depends on
> >> > some funcs that are unavailable yet):
> >> > Alternative array_equal time elapsed (on different arrays): 0.00133
> >> > Alternative array_equal time elapsed (on same arrays): 0.95038
> >> >
> >> > So, as you see, even in CPython numpy my version is 138 times faster for
> >> > different arrays (yet about 90 times slower for same arrays). However, in
> >> > the real world this func is usually called on different arrays, and only
> >> > sometimes on equal ones.
> >> > Well, for my implementation the elapsed time in the equal-arrays case
> >> > essentially depends on their size, but either way I still think my
> >> > implementation is better than CPython's - it's faster and doesn't require
> >> > allocating memory for the boolean array that is fed to logical_and.
> >> >
> >> > I updated my array_equal implementation with the changes mentioned above,
> >> > added some tests on multidimensional arrays as you asked, and put it at
> >> > http://pastebin.com/tg2aHE6x (now I'll update the bugs.pypy.org entry
> >> > with the link).
> >> >
> >> > -----------------------
> >> > Regards, D.
> >> > http://openopt.org/Dmitrey
> >>
> >> Worth pointing out that the implementations of array_equal and
> >> array_equiv in NumPy are a bit embarrassing because they require a full
> >> N comparisons instead of short-circuiting as soon as a False value is
> >> found. This is completely silly IMHO:
> >>
> >> In [34]: x = np.random.randn(100000)
> >>
> >> In [35]: y = np.random.randn(100000)
> >>
> >> In [36]: timeit np.array_equal(x, y)
> >> 1000 loops, best of 3: 349 us per loop
> >>
> >> - W
> >
> >
> > The correct solution (IMO) is to reuse the original NumPy implementation,
> > but have logical_and.reduce short-circuit correctly. This has the nice
> > side effect of allowing all() and any() to use
> > logical_and/logical_or.reduce.
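> >
> > For reference, the NumPy implementation in question is roughly the
> > following (paraphrased from numpy.core.numeric; details may vary by
> > version):
> >
> > def array_equal(a1, a2):
> >     try:
> >         a1, a2 = asarray(a1), asarray(a2)
> >     except Exception:
> >         return False
> >     if a1.shape != a2.shape:
> >         return False
> >     # builds the full boolean array, then reduces it - no short-circuiting
> >     return bool(logical_and.reduce(equal(a1, a2).ravel()))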
> >
> > Alex
>
> To do that, you're going to have to work around the eagerness of Python --
> it sort of makes me cringe to see you guys copying eager-beaver NumPy when
> you have a wonderful opportunity to do something better. Imagine if NumPy
> and APL/J/K had a lazy functional lovechild implemented in PyPy. Though
> maybe you're already 10 steps ahead of me.

Well, you're the first person to ever express the sentiment that we should do
something else :) But I think you'll be pleased, read on!
>
> Hopefully you could make the JIT automatically take a simple array
> expression like this:
>
> bool(logical_and.reduce(equal(a1, a2).ravel()))
>
> and examine the array expression and turn it into an ultra fast functional
> expression that short-circuits immediately:
>
> for x, y in zip(a1, a2):
>     if x != y:
>         return False
> return True
>
> To do that you would need to make all your ufuncs return generators
> instead of ndarrays. With the JIT infrastructure you could probably make
> this work. If every ufunc yields a generator you could build functional
> array pipelines (now I'm talking like Peter Wang). But if you insist on
> replicating C NumPy, well...
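>
> As a toy illustration of the kind of pipeline I mean (plain Python, not
> any actual numpypy API), elementwise ops could yield values on demand so
> that a reduction can stop at the first False:
>
> def lazy_equal(a1, a2):
>     # an elementwise "ufunc" that produces its results lazily
>     return (x == y for x, y in zip(a1, a2))
>
> def lazy_all(values):
>     for v in values:
>         if not v:
>             return False  # short-circuits immediately
>     return True
>
> # lazy_all(lazy_equal(a1, a2)) touches only as many elements as needed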
<div class="HOEnZb"><div class="h5"><br>
W<br>
<br>
These don't need to return generators, they just need to return things that
look like ndarrays, but are internally lazy. And that's exactly what we do.

Using .all() instead of logical_and.reduce() (since we don't have logical_and
yet, and even if we did, it wouldn't short-circuit without some extra work),
the JIT will generate almost exactly the code you posted (except in x86 :P).
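
For example, something along these lines (an illustrative sketch; the exact
numpypy surface available right now may differ):

import numpypy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 4.0])
# (a == b) builds a lazy array-like expression; .all() forces it, and the
# JIT-compiled loop can bail out at the first mismatching element.
equal = (a == b).all()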

Alex

--
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero