[pypy-dev] certificate for accepting numpypy new funcs?
Maciej Fijalkowski
fijall at gmail.com
Fri Jan 20 09:36:01 CET 2012
On Fri, Jan 20, 2012 at 2:31 AM, Alex Gaynor <alex.gaynor at gmail.com> wrote:
>
>
> On Thu, Jan 19, 2012 at 6:25 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>
>> On Thu, Jan 19, 2012 at 7:20 PM, Alex Gaynor <alex.gaynor at gmail.com>
>> wrote:
>> >
>> >
>> > On Thu, Jan 19, 2012 at 6:15 PM, Wes McKinney <wesmckinn at gmail.com>
>> > wrote:
>> >>
>> >> On Thu, Jan 19, 2012 at 2:49 PM, Dmitrey <dmitrey15 at ukr.net> wrote:
>> >> > On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:
>> >> >>
>> >> >> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey<dmitrey15 at ukr.net> wrote:
>> >> >>>
>> >> >>> Hi all,
>> >> >>> could you clarify the rules for accepting new numpypy funcs (not
>> >> >>> only for me, but for any other possible volunteers)?
>> >> >>> The doc I've been directed to says only "You have to test
>> >> >>> exhaustively your module", while I would like to know more
>> >> >>> explicit rules.
>> >> >>> For example, "at least 3 tests per func" (however, I guess the
>> >> >>> expected number of tests should differ for funcs of different
>> >> >>> complexity and variability).
>> >> >>> Also, are there any strict rules for the testcases to be submitted,
>> >> >>> or can I, for example, merely write
>> >> >>>
>> >> >>> if __name__ == '__main__':
>> >> >>>     assert array_equal(1, 1)
>> >> >>>     assert array_equal([1, 2], [1, 2])
>> >> >>>     assert array_equal(N.array([1, 2]), N.array([1, 2]))
>> >> >>>     assert array_equal([1, 2], N.array([1, 2]))
>> >> >>>     assert array_equal([1, 2], [1, 2, 3]) is False
>> >> >>>     print('passed')
>> >> >>
>> >> >> We have pretty exhaustive automated testing suites. Look for example
>> >> >> in pypy/module/micronumpy/test directory for the test file style.
>> >> >> They're run with py.test and we require at the very least full code
>> >> >> coverage (every line has to be executed, there are tools to check,
>> >> >> like coverage). Also passing "unusual" input, like sys.maxint etc.
>> >> >> is
>> >> >> usually recommended. With your example, you would check if it works
>> >> >> for say views and multidimensional arrays. Also "is False" is not
>> >> >> considered good style.
>> >> >>
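[The test style Maciej describes might look roughly like this; a minimal, self-contained sketch only. The class name is illustrative rather than the actual PyPy one, and a pure-Python array_equal over flat sequences stands in for the real func.]

```python
import sys

# Pure-Python stand-in for array_equal so the sketch is self-contained.
def array_equal(a1, a2):
    a1, a2 = list(a1), list(a2)
    return len(a1) == len(a2) and all(x == y for x, y in zip(a1, a2))

# py.test discovers Test* classes and test_* methods automatically;
# plain asserts are the house style in pypy/module/micronumpy/test.
class TestArrayEqual:
    def test_basic(self):
        assert array_equal([1, 2], [1, 2])
        assert not array_equal([1, 2], [1, 2, 3])

    def test_unusual_input(self):
        # "unusual" input such as sys.maxsize is recommended
        assert array_equal([sys.maxsize, 0], [sys.maxsize, 0])
        assert not array_equal([sys.maxsize], [-sys.maxsize])
```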
>> >> >>> Or is there a certain rule for storing files with tests?
>> >> >>>
>> >> >>> If I or someone else submits a func with some tests like in the
>> >> >>> example above, will you put the func and tests into the proper
>> >> >>> files yourselves? I'm not too lazy to do it myself, but I'm simply
>> >> >>> not immersed enough in the numpypy dev process, including the
>> >> >>> mercurial branches and the numpypy file structure, and can spend
>> >> >>> only quite limited time diving into it in the near future.
>> >> >>
>> >> >> We generally require people to put their own tests in with the
>> >> >> code (in appropriate places), because you also should not break
>> >> >> anything. The usefulness of a patch that has to be sliced and
>> >> >> diced and put into place is very limited, and for straightforward
>> >> >> mostly-copied code, like array_equal, plain useless, since it's
>> >> >> almost as much work to just do it.
>> >> >
>> >> > Well, for this func (array_equal) my docstrings really were copied
>> >> > from CPython numpy (why not do this to save some time, while the
>> >> > license allows it?), but
>> >> > * why not go for it, while other programmers are busy with other
>> >> > tasks?
>> >> > * the engines of my func and the CPython numpy func differ
>> >> > completely. First, in PyPy the CPython code just doesn't work at
>> >> > all (because of the problem with ndarray.flat). Second, I have
>> >> > implemented a workaround - I just replaced some code lines by
>> >> > Size = a1.size
>> >> > f1, f2 = a1.flat, a2.flat
>> >> > # TODO: replace xrange by range in Python3
>> >> > for i in xrange(Size):
>> >> >     if f1.next() != f2.next():
>> >> >         return False
>> >> > return True
>> >> >
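[Dmitrey's flat-iteration workaround, fleshed out as a self-contained Python 3 sketch. The _flat helper is an illustrative stand-in for ndarray.flat, zip_longest replaces the explicit size/index bookkeeping, and shape checking is omitted for brevity.]

```python
from itertools import zip_longest

def _flat(a):
    # Illustrative stand-in for ndarray.flat: yields the elements of a
    # possibly nested list in row-major order.
    if isinstance(a, list):
        for sub in a:
            yield from _flat(sub)
    else:
        yield a

def array_equal(a1, a2):
    _missing = object()
    # zip_longest with a sentinel catches length mismatches while still
    # short-circuiting on the first differing element.
    for x, y in zip_longest(_flat(a1), _flat(a2), fillvalue=_missing):
        if x != y:
            return False
    return True
```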
>> >> > Here are some results in CPython for the following bench:
>> >> >
>> >> > from time import time
>> >> > n = 100000
>> >> > m = 100
>> >> > a = N.zeros(n)
>> >> > b = N.ones(n)
>> >> >
>> >> > t = time()
>> >> > for i in range(m):
>> >> >     N.array_equal(a, b)
>> >> > print('classic numpy array_equal time elapsed (on different arrays): '
>> >> >       '%0.5f' % (time()-t))
>> >> >
>> >> > t = time()
>> >> > for i in range(m):
>> >> >     array_equal(a, b)
>> >> > print('Alternative array_equal time elapsed (on different arrays): '
>> >> >       '%0.5f' % (time()-t))
>> >> >
>> >> > b = N.zeros(n)
>> >> >
>> >> > t = time()
>> >> > for i in range(m):
>> >> >     N.array_equal(a, b)
>> >> > print('classic numpy array_equal time elapsed (on same arrays): '
>> >> >       '%0.5f' % (time()-t))
>> >> >
>> >> > t = time()
>> >> > for i in range(m):
>> >> >     array_equal(a, b)
>> >> > print('Alternative array_equal time elapsed (on same arrays): '
>> >> >       '%0.5f' % (time()-t))
>> >> >
>> >> > CPython numpy results:
>> >> > classic numpy array_equal time elapsed (on different arrays): 0.07728
>> >> > Alternative array_equal time elapsed (on different arrays): 0.00056
>> >> > classic numpy array_equal time elapsed (on same arrays): 0.11163
>> >> > Alternative array_equal time elapsed (on same arrays): 9.09458
>> >> >
>> >> > PyPy results (cannot test on "classic" version because it depends on
>> >> > some
>> >> > funcs that are unavailable yet):
>> >> > Alternative array_equal time elapsed (on different arrays): 0.00133
>> >> > Alternative array_equal time elapsed (on same arrays): 0.95038
>> >> >
>> >> >
>> >> > So, as you see, even in CPython numpy my version is 138 times
>> >> > faster for different arrays (yet about 90 times slower for same
>> >> > arrays). However, in the real world this func usually receives
>> >> > different arrays, and equal arrays are encountered only sometimes.
>> >> > Well, for my implementation the elapsed time in the case of equal
>> >> > arrays essentially depends on their size, but either way I still
>> >> > think my implementation is better than CPython's - it's faster and
>> >> > doesn't require allocating memory for the boolean array that goes
>> >> > to the logical_and.
>> >> >
>> >> > I updated my array_equal implementation with the changes mentioned
>> >> > above and the tests on multidimensional arrays you asked for, and
>> >> > put it at http://pastebin.com/tg2aHE6x (now I'll update the
>> >> > bugs.pypy.org entry with the link).
>> >> >
>> >> >
>> >> > -----------------------
>> >> > Regards, D.
>> >> > http://openopt.org/Dmitrey
>> >> > _______________________________________________
>> >> > pypy-dev mailing list
>> >> > pypy-dev at python.org
>> >> > http://mail.python.org/mailman/listinfo/pypy-dev
>> >>
>> >> Worth pointing out that the implementations of array_equal and
>> >> array_equiv in NumPy are a bit embarrassing because they require a
>> >> full N comparisons instead of short-circuiting whenever a False value
>> >> is found. This is completely silly IMHO:
>> >>
>> >> In [34]: x = np.random.randn(100000)
>> >>
>> >> In [35]: y = np.random.randn(100000)
>> >>
>> >> In [36]: timeit np.array_equal(x, y)
>> >> 1000 loops, best of 3: 349 us per loop
>> >>
>> >> - W
>> >
>> >
>> > The correct solution (IMO) is to reuse the original NumPy
>> > implementation, but have logical_and.reduce short-circuit correctly.
>> > This has the nice side effect of allowing all() and any() to use
>> > logical_and/logical_or.reduce.
>> >
>> > Alex
>>
>> To do that, you're going to have to work around the eagerness of
>> Python-- it sort of makes me cringe to see you guys copying
>> eager-beaver NumPy when you have a wonderful opportunity to do
>> something better. Imagine if NumPy and APL/J/K had a lazy functional
>> lovechild implemented in PyPy. Though maybe you're already 10 steps
>> ahead of me.
>
>
> Well, you're the first person to ever express the sentiment that we should
> do something else :) But I think you'll be pleased, read on!
>>
>>
>> Hopefully you could make the JIT automatically take a simple array
>> expression like this:
>>
>> bool(logical_and.reduce(equal(a1,a2).ravel()))
>>
>> and examine the array expression and turn it into an ultra fast
>> functional expression that short circuits immediately:
>>
>> for x, y in zip(a1, a2):
>>     if x != y:
>>         return False
>> return True
>>
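[Wes's generator idea can be sketched in plain Python. The names here are hypothetical, not numpypy APIs: a lazy elementwise equal plus a reduction that stops consuming at the first False.]

```python
def equal(a1, a2):
    # Hypothetical lazy ufunc: yields elementwise comparisons instead of
    # materializing a boolean array.
    return (x == y for x, y in zip(a1, a2))

def logical_and_reduce(values):
    # all() short-circuits: it stops consuming the generator at the
    # first False value.
    return all(values)

def counting(seq, counter):
    # Helper to observe how many elements a pipeline actually consumes.
    for x in seq:
        counter[0] += 1
        yield x

counter = [0]
# Mismatch at the very first element: only one element is consumed.
logical_and_reduce(equal(counting([0] * 1000, counter), [1] * 1000))
```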
>> To do that you would need to make all your ufuncs return generators
>> instead of ndarrays. With the JIT infrastructure you could probably
>> make this work. If every ufunc yields a generator you could build
>> functional array pipelines (now I'm talking like Peter Wang). But if
>> you insist on replicating C NumPy, well...
>>
>> W
>>
>> >
>> > --
>> > "I disapprove of what you say, but I will defend to the death your right
>> > to
>> > say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
>> > "The people's good is the highest law." -- Cicero
>> >
>
>
> These don't need to return generators, they just need to return things that
> look like ndarrays, but are internally lazy. And that's exactly what we do.
>
> Using .all() instead of logical_and.reduce() (since we don't have
> logical_and yet, and even if we did it wouldn't short circuit without some
> extra work), the JIT will generate almost exactly the code you posted
> (except in x86 :P).
>
> Alex
>
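[Alex's "looks like an ndarray, internally lazy" point can be illustrated with a toy class. This is purely schematic; numpypy's real machinery is JIT-compiled and far more involved, and all names here are invented.]

```python
class LazyArray:
    # Toy lazily-evaluated array: wraps a zero-arg callable that yields
    # elements on demand.
    def __init__(self, make_iter):
        self._make_iter = make_iter
        self._forced = None  # cached concrete list

    def force(self):
        # Accessing a single element forces the entire array.
        if self._forced is None:
            self._forced = list(self._make_iter())
        return self._forced

    def __getitem__(self, i):
        return self.force()[i]

    def all(self):
        # A reduction can consume the expression lazily and
        # short-circuit without ever building the full array.
        return all(self._make_iter())

def lazy_equal(a1, a2):
    # Hypothetical lazy elementwise comparison returning an
    # ndarray-lookalike rather than a generator.
    return LazyArray(lambda: (x == y for x, y in zip(a1, a2)))
```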
The main difference from a generator is that if you access an element, you
have to force the entire array. But this is not too bad, since you're
mostly interested in all the elements anyway. What's even more interesting
is that this gives (as yet untapped) potential for good vectorization.
Cheers,
fijal