[Numpy-discussion] improving arraysetops
Robert Cimrman
cimrman3 at ntc.zcu.cz
Mon Jun 15 05:55:11 EDT 2009
Neil Crighton wrote:
> Robert Cimrman <cimrman3 <at> ntc.zcu.cz> writes:
>
>> Hi,
>>
>> I am starting a new thread, so that it reaches the interested people.
>> Let us discuss improvements to arraysetops (array set operations) at [1]
>> (allowing non-unique arrays as function arguments, better naming
>> conventions and documentation).
>>
>> r.
>>
>> [1] http://projects.scipy.org/numpy/ticket/1133
>>
>
> Hi,
>
> These changes look good to me. For point (1) I think we should fold the
> unique and _nu code into a single function. For point (3) I like in1d - it's
> shorter than isin1d but is still clear.
Yes, the _nu functions will then be useless; their bodies can be moved
into the generic functions.
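As an illustration of the in1d idea (a membership test that accepts non-unique arrays directly, so no separate _nu helper is needed), here is a minimal sketch. It assumes a sort-plus-searchsorted approach; the name `in1d_sketch` is hypothetical and this is not the patch attached to ticket 1133. Current NumPy exposes this functionality as `np.isin`.

```python
import numpy as np

def in1d_sketch(ar1, ar2):
    """Hypothetical in1d: True where an element of ar1 also occurs in ar2.

    Both inputs may contain duplicates, so no *_nu variant is required.
    """
    ar1 = np.asarray(ar1).ravel()
    ar2 = np.unique(ar2)  # sorted, duplicates removed
    if ar2.size == 0:
        return np.zeros(ar1.shape, dtype=bool)
    # Position where each ar1 element would be inserted into sorted ar2.
    idx = np.searchsorted(ar2, ar1)
    # Values larger than max(ar2) get idx == ar2.size; clip to a valid index.
    idx = np.clip(idx, 0, ar2.size - 1)
    # Membership holds exactly when the element found there is equal.
    return ar2[idx] == ar1
```

Only `ar2` is de-duplicated, so the result keeps one entry per element of `ar1`, duplicates included.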
> What about merging unique and unique1d? They're essentially identical for an
> array input, but unique uses the builtin set() for non-array inputs and so is
> around 2x faster in this case - see below. Is it worth accepting a speed
> regression for unique to get rid of the function duplication? (Or can they be
> combined?)
unique1d can return the indices; can this be achieved using set(), too?
The implementation for arrays is already the same, IMHO, so I would
prefer adding return_index and return_inverse to unique (automatically
converting the input to an array if necessary) and deprecating unique1d.
We can also view this as adding the set() approach to unique1d when the
return_index and return_inverse arguments are not set, and renaming
unique1d -> unique.
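The merge proposed above can be sketched as a single function that takes the set() fast path only for non-array input when no index arrays are requested, and otherwise uses a stable sort. This is a hypothetical illustration (`unique_sketch` is not a NumPy function), not the code that was eventually committed.

```python
import numpy as np

def unique_sketch(ar, return_index=False, return_inverse=False):
    """Hypothetical merged unique/unique1d."""
    # Fast path: plain Python sequence and no index arrays requested.
    if not isinstance(ar, np.ndarray) and not (return_index or return_inverse):
        try:
            return np.asarray(sorted(set(ar)))
        except TypeError:
            pass  # unhashable or unorderable items: use the array path
    ar = np.asarray(ar).ravel()
    # Stable sort, so the first occurrence of each value comes first.
    perm = ar.argsort(kind="mergesort")
    aux = ar[perm]
    # flag[k] is True where aux[k] starts a new run of equal values.
    flag = np.ones(aux.shape, dtype=bool)
    flag[1:] = aux[1:] != aux[:-1]
    result = [aux[flag]]
    if return_index:
        # Original positions of the first occurrences.
        result.append(perm[flag])
    if return_inverse:
        # Map every original position to the index of its unique value.
        inv = np.empty(ar.shape, dtype=np.intp)
        inv[perm] = np.cumsum(flag) - 1
        result.append(inv)
    return result[0] if len(result) == 1 else tuple(result)
```

For example, `unique_sketch(np.array([3, 1, 3, 2]), return_index=True, return_inverse=True)` yields the unique values `[1, 2, 3]`, first-occurrence indices `[1, 3, 0]`, and an inverse `[2, 0, 2, 1]` such that `u[inv]` rebuilds the input.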
> Neil
>
>
> In [24]: l = list(np.random.randint(100, size=10000))
> In [25]: %timeit np.unique1d(l)
> 1000 loops, best of 3: 1.9 ms per loop
> In [26]: %timeit np.unique(l)
> 1000 loops, best of 3: 793 µs per loop
> In [27]: l = list(np.random.randint(100, size=1000000))
> In [28]: %timeit np.unique(l)
> 10 loops, best of 3: 78 ms per loop
> In [29]: %timeit np.unique1d(l)
> 10 loops, best of 3: 233 ms per loop
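The timings above compare a set()-based path (np.unique on a list) with a sort-based path (np.unique1d). Below is a minimal harness for repeating that kind of comparison. It assumes a recent NumPy, which no longer provides unique1d, so the hypothetical helpers `set_unique` and `sort_unique` stand in for the two approaches. No results are claimed here; numbers depend on the machine and the versions used.

```python
import timeit
import numpy as np

def set_unique(seq):
    # set()-based approach: hash the items, then sort the distinct ones.
    return np.array(sorted(set(seq)))

def sort_unique(seq):
    # Array-based approach: convert to an array, then sort and de-duplicate.
    return np.unique(seq)

def compare(size, high=100, number=3):
    """Return mean seconds per call for (set_unique, sort_unique) on a list."""
    data = list(np.random.randint(high, size=size))
    t_set = timeit.timeit(lambda: set_unique(data), number=number) / number
    t_sort = timeit.timeit(lambda: sort_unique(data), number=number) / number
    return t_set, t_sort

if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        t_set, t_sort = compare(n)
        # Report in ms with ASCII-only text.
        print(f"n={n}: set {t_set * 1e3:.2f} ms, np.unique {t_sort * 1e3:.2f} ms")
```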
I have found a strange bug in unique():
In [24]: l = list(np.random.randint(100, size=1000))
In [25]: %timeit np.unique(l)
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)

/usr/lib64/python2.5/site-packages/IPython/iplib.py in ipmagic(self, arg_s)
    951         else:
    952             magic_args = self.var_expand(magic_args,1)
--> 953         return fn(magic_args)
    954
    955     def ipalias(self,arg_s):

/usr/lib64/python2.5/site-packages/IPython/Magic.py in magic_timeit(self, parameter_s)
   1829                                           precision,
   1830                                           best * scaling[order],
-> 1831                                           units[order])
   1832         if tc > tc_min:
   1833             print "Compiler time: %.2f s" % tc

UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 28: ordinal not in range(128)
It disappears when I increase the array size or the integer size.
In [39]: np.__version__
Out[39]: '1.4.0.dev7047'
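The traceback points at `units[order]` and the character u'\xb5' (MICRO SIGN). That suggests the error comes from %timeit printing a microsecond unit to an ASCII-only output rather than from unique() itself. It would also explain why the error vanishes for larger inputs, whose timings are reported in ms. The Python 3 sketch below reproduces that failure mode. `format_time`, `try_ascii`, and the 1 ms threshold are hypothetical simplifications, not IPython's code.

```python
MICRO = "\u00b5"  # MICRO SIGN, shown as u'\xb5' in the Python 2 traceback

def format_time(seconds, ascii_only=False):
    """Hypothetical %timeit-style formatter: pick ms or microseconds.

    The 1 ms threshold is a simplifying assumption for illustration.
    """
    if seconds >= 1e-3:
        return f"{seconds * 1e3:.1f} ms"
    micro_unit = "us" if ascii_only else MICRO + "s"
    return f"{seconds * 1e6:.1f} {micro_unit}"

def try_ascii(text):
    """Encode as an ASCII-only terminal would; return None on failure."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None

print(try_ascii(format_time(0.078)))   # a millisecond timing encodes fine
print(try_ascii(format_time(793e-6)))  # a microsecond timing fails: None
```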
r.