in place list modification necessary? What's a better idiom?

Peter Otten __peter__ at web.de
Tue Apr 7 03:38:02 EDT 2009


MooMaster wrote:

> Now we can't calculate a meaningful Euclidean distance for something
> like "Iris-setosa" and "Iris-versicolor" unless we use string-edit
> distance or something overly complicated, so instead we'll use a
> simple quantization scheme of enumerating the set of values within the
> column domain and replacing the strings with numbers (i.e. Iris-setosa
> = 1, iris-versicolor=2).

I'd calculate the distance as

def string_dist(x, y, weight=1):
    # A distance must be 0 for equal strings, so compare with !=
    return weight * (x != y)

You don't get a high resolution in that dimension, but you don't introduce
an element of randomness through an arbitrary numbering of the classes,
either.
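
To illustrate, here is a minimal sketch of how such a 0/1 mismatch term
could be folded into a Euclidean distance over rows that mix numbers and
strings. The name row_dist and the row layout (tuples of numeric fields plus
a string label) are assumptions made for the example, not something taken
from the original code.

```python
import math

def string_dist(x, y, weight=1):
    # 0 when the strings match, weight when they differ
    return weight * (x != y)

def row_dist(a, b, weight=1):
    # Hypothetical helper: numeric fields contribute squared differences,
    # string fields contribute the squared 0/1 mismatch term.
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str):
            total += string_dist(x, y, weight) ** 2
        else:
            total += (x - y) ** 2
    return math.sqrt(total)

# Example rows: two numeric fields followed by a class label
r1 = (5.0, 3.0, "Iris-setosa")
r2 = (2.0, 7.0, "Iris-versicolor")
d = row_dist(r1, r2)
```

The weight lets you decide how much a label mismatch should count relative
to the numeric columns.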

Peter



