[pypy-dev] certificate for accepting numpypy new funcs?

Wes McKinney wesmckinn at gmail.com
Fri Jan 20 01:25:11 CET 2012


On Thu, Jan 19, 2012 at 7:20 PM, Alex Gaynor <alex.gaynor at gmail.com> wrote:
>
>
> On Thu, Jan 19, 2012 at 6:15 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>
>> On Thu, Jan 19, 2012 at 2:49 PM, Dmitrey <dmitrey15 at ukr.net> wrote:
>> > On 01/19/2012 07:31 PM, Maciej Fijalkowski wrote:
>> >>
>> >> On Thu, Jan 19, 2012 at 6:46 PM, Dmitrey<dmitrey15 at ukr.net>  wrote:
>> >>>
>> >>> Hi all,
>> >>> could you clarify the rules for accepting new numpypy funcs (not only
>> >>> for me, but for any other possible volunteers)?
>> >>> The doc I've been directed to says only "You have to test exhaustively
>> >>> your module", while I would like to know more explicit rules.
>> >>> For example, "at least 3 tests per func" (however, I guess for funcs
>> >>> of different complexity and variability the number of tests should
>> >>> also be expected to differ).
>> >>> Also, are there any strict rules for the testcases to be submitted, or
>> >>> can I, for example, merely write
>> >>>
>> >>> if __name__ == '__main__':
>> >>>    assert array_equal(1, 1)
>> >>>    assert array_equal([1, 2], [1, 2])
>> >>>    assert array_equal(N.array([1, 2]), N.array([1, 2]))
>> >>>    assert array_equal([1, 2], N.array([1, 2]))
>> >>>    assert array_equal([1, 2], [1, 2, 3]) is False
>> >>>    print('passed')
>> >>
>> >> We have pretty exhaustive automated testing suites. Look for example
>> >> in pypy/module/micronumpy/test directory for the test file style.
>> >> They're run with py.test and we require at the very least full code
>> >> coverage (every line has to be executed, there are tools to check,
>> >> like coverage). Also passing "unusual" input, like sys.maxint  etc. is
>> >> usually recommended. With your example, you would check if it works
>> >> for say views and multidimensional arrays. Also "is False" is not
>> >> considered good style.
>> >>
>> >>> Or is there a certain rule for where to store files with tests?
>> >>>
>> >>> If I or someone else submits a func with some tests like in the
>> >>> example above, will you put the func and tests in the proper files
>> >>> yourselves? I'm not too lazy to do it myself, but I'm not yet
>> >>> familiar enough with the numpypy dev process, including mercurial
>> >>> branches and the numpypy file structure, and can spend only quite
>> >>> limited time on diving into it in the near future.
>> >>
>> >> We generally require people to put their own tests as they go with the
>> >> code (in appropriate places) because you also should not break
>> >> anything. The usefulness of a patch that has to be sliced and diced
>> >> and put into places is very limited and for straightforward
>> >> mostly-copied code, like array_equal, plain useless, since it's almost
>> >> as much work to just do it.
>> >
>> > Well, for this func (array_equal) my docstrings really were copied from
>> > CPython numpy (why wouldn't I do this to save some time, when the
>> > license allows it?), but
>> > * why wouldn't I go for this (), while other programmers are busy with
>> > other tasks?
>> > * the engines of my func and the CPython numpy func differ completely.
>> > First, in PyPy the CPython code just doesn't work at all (because of
>> > the problem with ndarray.flat). Second, I have implemented a
>> > workaround - just replaced some code lines by
>> >    Size = a1.size
>> >    f1, f2 = a1.flat, a2.flat
>> >    # TODO: replace xrange by range in Python3
>> >    for i in xrange(Size):
>> >        if f1.next() != f2.next(): return False
>> >    return True
>> >
>> > Here are some results in CPython for the following bench:
>> >
>> > from time import time
>> > n = 100000
>> > m = 100
>> > a = N.zeros(n)
>> > b = N.ones(n)
>> > t = time()
>> > for i in range(m):
>> >    N.array_equal(a, b)
>> > print('classic numpy array_equal time elapsed (on different arrays): %0.5f'
>> >       % (time()-t))
>> >
>> >
>> > t = time()
>> > for i in range(m):
>> >    array_equal(a, b)
>> > print('Alternative array_equal time elapsed (on different arrays): %0.5f'
>> >       % (time()-t))
>> >
>> > b = N.zeros(n)
>> >
>> > t = time()
>> > for i in range(m):
>> >    N.array_equal(a, b)
>> > print('classic numpy array_equal time elapsed (on same arrays): %0.5f' %
>> > (time()-t))
>> >
>> > t = time()
>> > for i in range(m):
>> >    array_equal(a, b)
>> > print('Alternative array_equal time elapsed (on same arrays): %0.5f' %
>> > (time()-t))
>> >
>> > CPython numpy results:
>> > classic numpy array_equal time elapsed (on different arrays): 0.07728
>> > Alternative array_equal time elapsed (on different arrays): 0.00056
>> > classic numpy array_equal time elapsed (on same arrays): 0.11163
>> > Alternative array_equal time elapsed (on same arrays): 9.09458
>> >
>> > PyPy results (cannot test on "classic" version because it depends on
>> > some
>> > funcs that are unavailable yet):
>> > Alternative array_equal time elapsed (on different arrays): 0.00133
>> > Alternative array_equal time elapsed (on same arrays): 0.95038
>> >
>> >
>> > So, as you see, even in CPython numpy my version is 138 times faster for
>> > different arrays (yet about 81 times slower for equal arrays). However,
>> > in the real world usually different arrays come to this func, and only
>> > sometimes are equal arrays encountered.
>> > Well, for my implementation the time elapsed in the equal-arrays case
>> > depends essentially on their size, but either way I still think my
>> > implementation is better than CPython's: it's faster and doesn't require
>> > allocating memory for the boolean array that goes to logical_and.
>> >
>> > I updated my array_equal implementation with the changes mentioned
>> > above and some tests on multidimensional arrays that you asked for, and
>> > put it at http://pastebin.com/tg2aHE6x (now I'll update the
>> > bugs.pypy.org entry with the link).
>> >
>> >
>> > -----------------------
>> > Regards, D.
>> > http://openopt.org/Dmitrey
>> > _______________________________________________
>> > pypy-dev mailing list
>> > pypy-dev at python.org
>> > http://mail.python.org/mailman/listinfo/pypy-dev
>>
>> Worth pointing out that the implementations of array_equal and
>> array_equiv in NumPy are a bit embarrassing because they require a
>> full N comparisons instead of short-circuiting whenever a False value
>> is found. This is completely silly IMHO:
>>
>> In [34]: x = np.random.randn(100000)
>>
>> In [35]: y = np.random.randn(100000)
>>
>> In [36]: timeit np.array_equal(x, y)
>> 1000 loops, best of 3: 349 us per loop
>>
>> - W
>
>
> The correct solution (IMO), is to reuse the original NumPy implementation,
> but have logical_and.reduce short circuit correctly.  This has the nice side
> effect of allowing all() and any() to use logical_and/logical_or.reduce.
>
> Alex

To do that, you're going to have to work around the eagerness of
Python-- it sort of makes me cringe to see you guys copying
eager-beaver NumPy when you have a wonderful opportunity to do
something better. Imagine if NumPy and APL/J/K had a lazy functional
lovechild implemented in PyPy. Though maybe you're already 10 steps
ahead of me.

Hopefully you could make the JIT automatically take a simple array
expression like this:

bool(logical_and.reduce(equal(a1,a2).ravel()))

and examine the array expression and turn it into an ultra fast
functional expression that short circuits immediately:

for x, y in zip(a1, a2):
    if x != y:
        return False
return True

To do that you would need to make all your ufuncs return generators
instead of ndarrays. With the JIT infrastructure you could probably
make this work. If every ufunc yields a generator you could build
functional array pipelines (now I'm talking like Peter Wang). But if
you insist on replicating C NumPy, well...
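
As a rough sketch of the generator idea, here is a plain Python 3 version
that works on flat sequences rather than real ndarrays. The names
lazy_equal, lazy_logical_and_reduce and array_equal_lazy are hypothetical
stand-ins I made up for illustration; they are not NumPy or numpypy APIs,
and a real version would also have to compare shapes and handle
broadcasting:

```python
def lazy_equal(a1, a2, stats=None):
    """Hypothetical lazy stand-in for equal(): yields one comparison at a time."""
    for x, y in zip(a1, a2):
        if stats is not None:
            stats["compared"] += 1  # count the elements actually touched
        yield x == y


def lazy_logical_and_reduce(values):
    """Hypothetical lazy stand-in for logical_and.reduce: stops at the first False."""
    for v in values:
        if not v:
            return False
    return True


def array_equal_lazy(a1, a2, stats=None):
    # Flat sequences only (assumption); lengths stand in for shapes here.
    if len(a1) != len(a2):
        return False
    return lazy_logical_and_reduce(lazy_equal(a1, a2, stats))


stats = {"compared": 0}
a = [0.0] * 100000
b = [1.0] * 100000
result = array_equal_lazy(a, b, stats)
```

Since a and b already differ in the first element, the pipeline pulls a
single comparison out of lazy_equal and stops, whatever the length of the
inputs; no intermediate boolean array is ever built.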

W

>
> --
> "I disapprove of what you say, but I will defend to the death your right to
> say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
> "The people's good is the highest law." -- Cicero
>

