[Numpy-discussion] improving arraysetops

Robert Cimrman cimrman3 at ntc.zcu.cz
Wed Jun 17 09:06:39 EDT 2009


Hi Neil,

Neil Crighton wrote:
>>> What about merging unique and unique1d?  They're essentially identical for an
>>> array input, but unique uses the builtin set() for non-array inputs and so is
>>> around 2x faster in this case - see below. Is it worth accepting a speed
>>> regression for unique to get rid of the function duplication?  (Or can they be
>>> combined?)
>> unique1d can return the indices - can this be achieved by using set(), too?
>>
> 
> No, set() can't return the indices as far as I know.
> 
>> The implementation for arrays is the same already, IMHO, so I would
>> prefer adding return_index, return_inverse to unique (automatically
>> converting input to array, if necessary), and deprecate unique1d.
>>
>> We can view it also as adding the set() approach to unique1d, when the
>> return_index, return_inverse arguments are not set, and renaming
>> unique1d -> unique.
>>
> 
> This sounds good. If you don't have time to do it, I don't mind having a go
> at writing a patch to implement these changes (deprecate the existing
> unique1d, rename unique1d to unique and add the set approach from the old
> unique, and the other changes mentioned in
> http://projects.scipy.org/numpy/ticket/1133).

That would be really great. I will be offline from tomorrow until about
the end of next week, so I can only really look at the issue after I
return.
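
To make the proposal concrete, here is a rough sketch of what a merged
function might look like. It is only an illustration under assumptions: the
name unique_sketch is hypothetical, the sort-based branch is a generic
stable-sort approach rather than the actual unique1d code, and the set()
shortcut assumes hashable, mutually comparable elements.

```python
import numpy as np


def unique_sketch(ar, return_index=False, return_inverse=False):
    """Hypothetical merged unique/unique1d; not the actual NumPy code."""
    if not (return_index or return_inverse) and not isinstance(ar, np.ndarray):
        # set() shortcut for non-array input when no indices are requested.
        return np.asarray(sorted(set(ar)))

    ar = np.asarray(ar).ravel()
    # A stable sort keeps equal elements in input order, so the first
    # element of each run is the first occurrence in the input.
    perm = ar.argsort(kind="mergesort")
    aux = ar[perm]

    # flag[i] is True where a new value starts in the sorted array.
    flag = np.ones(aux.shape, dtype=bool)
    flag[1:] = aux[1:] != aux[:-1]

    out = (aux[flag],)
    if return_index:
        # Positions in the input of the first occurrence of each value.
        out += (perm[flag],)
    if return_inverse:
        # inverse[i] is the position of ar[i] in the unique array.
        inverse = np.empty(ar.shape, dtype=np.intp)
        inverse[perm] = np.cumsum(flag) - 1
        out += (inverse,)
    return out[0] if len(out) == 1 else out
```

With return_inverse=True the input can be rebuilt as u[inv], which is the
part set() cannot provide on its own.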

[...]
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in
>> position 28: ordinal not in range(128)
>>
>> It disappears after increasing the array size, or the integer size.
>> In [39]: np.__version__
>> Out[39]: '1.4.0.dev7047'
>>
>> r.
> 
> Weird! From the error message, it looks like a problem with ipython's timeit
> function rather than unique. I can't reproduce it on my machine
> (numpy 1.4.0.dev, r7059; IPython 0.10.bzr.r1163).

True, I have IPython 0.9.1; that might be causing the problem.

cheers,
r.



