rgaddi at technologyhighland.invalid
Tue Sep 23 18:32:44 CEST 2014
On Tue, 23 Sep 2014 05:34:19 -0700 (PDT)
Miki Tebeka <miki.tebeka at gmail.com> wrote:
> Before I start writing my own: is there something like collections.Counter (for frequencies) that does "fuzzy" matching?
> Meaning x is considered equal to y if abs(x - y) < epsilon. (x and y in my case will be numpy.array.)
You'll probably have to write that yourself. While you're at it, think
long and hard about that definition of fuzziness. If you can make it
closer to the concept of histogram "bins" you'll get much better
performance. If, for instance, you can live with 1.anything being one
bin, and 2.anything being the next (even though this puts 1.999 and
2.000 into separate bins) then you can just floor() the number and use
the result as the key.
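
A minimal sketch of the binning idea, assuming scalar floats and a
hypothetical bin width parameter (the names bin_counts and width are
made up for illustration; for numpy arrays you would floor elementwise
instead):

```python
import math
from collections import Counter

def bin_counts(values, width=1.0):
    """Count values by the fixed-width bin they fall into."""
    # Every x in [k*width, (k+1)*width) maps to the same integer key k.
    return Counter(math.floor(x / width) for x in values)

counts = bin_counts([1.2, 1.999, 2.0, 2.7, 3.1])
print(counts[1], counts[2], counts[3])
```

Note that 1.999 and 2.0 land under different keys here, which is the
trade-off described above.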
If you really need the continuous fuzziness you'll probably have to
keep the potential keys sorted and run a binary search against the list
of them to find the nearest. This will still run you into some
problems based on ordering. For instance, with an epsilon of 0.1,
you'd put 2.0 into a bin, then 1.9 into the same bin, then 2.1 into the
same bin. If however you put 1.9 into that bin first, then 2.0 would
go into that bin, but 2.1 would go into a different one.
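
Here is a rough sketch of the sorted-keys approach using the standard
bisect module, mostly to show the ordering problem. The FuzzyCounter
class and count_in_order helper are hypothetical names, it assumes
scalar floats, and the first value to open a bin is assumed to stay
that bin's key. It uses an epsilon of 0.15 rather than 0.1 so that
differences of about 0.1 do not sit right on the boundary of the
strict "<" comparison.

```python
import bisect

class FuzzyCounter:
    """Count values, merging any value within epsilon of an existing key."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.keys = []    # sorted list of bin keys
        self.counts = {}  # bin key -> number of values merged into it

    def add(self, x):
        # Binary search for where x would sit among the sorted keys.
        i = bisect.bisect_left(self.keys, x)
        # The nearest existing key can only be the neighbor on either side.
        candidates = self.keys[max(i - 1, 0):i + 1]
        if candidates:
            nearest = min(candidates, key=lambda k: abs(k - x))
            if abs(nearest - x) < self.epsilon:
                self.counts[nearest] += 1
                return nearest
        # Nothing close enough: x opens a new bin and becomes its key.
        self.keys.insert(i, x)
        self.counts[x] = 1
        return x

def count_in_order(values, epsilon):
    counter = FuzzyCounter(epsilon)
    for v in values:
        counter.add(v)
    return counter.counts

middle_first = count_in_order([2.0, 1.9, 2.1], epsilon=0.15)
low_first = count_in_order([1.9, 2.0, 2.1], epsilon=0.15)
print(middle_first)  # one bin keyed on 2.0
print(low_first)     # two bins, keyed on 1.9 and 2.1
```

Same three values, different counts, purely because of insertion order.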
TL;DR you need to think very hard about your problem definition and
what you want to happen before you actually try to implement this.
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.