need some kind of "coherence index" for a group of strings
duncan smith
duncan at invalid.invalid
Fri Nov 4 10:07:07 EDT 2016
On 03/11/16 16:18, Fillmore wrote:
>
> Hi there, apologies for the generic question. Here is my problem: let's
> say that I have a list of lists of strings.
>
> list1: #strings are sort of similar to one another
>
> my_nice_string_blabla
> my_nice_string_blqbli
> my_nice_string_bl0bla
> my_nice_string_aru
>
>
> list2: #strings are mostly different from one another
>
> my_nice_string_blabla
> some_other_string
> yet_another_unrelated string
> wow_totally_different_from_others_too
>
>
> I would like an algorithm that can look at the strings and determine
> that strings in list1 are sort of similar to one another, while the
> strings in list2 are all different.
> Ideally, it would be nice to have some kind of 'coherence index' that I
> can exploit to separate lists given a certain threshold.
>
> I was about to concoct something using Levenshtein distance, but then I
> figured that it would be expensive to compute and I may be reinventing
> the wheel.
>
> Thanks in advance to Python masters that may have suggestions...
>
>
>
https://pypi.python.org/pypi/jellyfish/
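
For what it's worth, here is a minimal sketch of one possible "coherence index": the mean similarity over all pairs of strings in a list. To stay runnable without extra installs it uses the standard library's difflib.SequenceMatcher rather than jellyfish; a normalised jellyfish.levenshtein_distance could presumably be swapped in as the similarity function. The names coherence_index and THRESHOLD, and the cut-off value 0.6, are made up for illustration and would need tuning on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def coherence_index(strings):
    """Return the mean pairwise similarity of `strings`, in [0, 1].

    Higher values mean the strings look more alike.
    """
    pairs = list(combinations(strings, 2))
    if not pairs:
        # Zero or one string: treated as trivially coherent (an assumption).
        return 1.0
    # SequenceMatcher.ratio() is 1.0 for identical strings, 0.0 for
    # strings with nothing in common.
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


list1 = [
    "my_nice_string_blabla",
    "my_nice_string_blqbli",
    "my_nice_string_bl0bla",
    "my_nice_string_aru",
]
list2 = [
    "my_nice_string_blabla",
    "some_other_string",
    "yet_another_unrelated string",
    "wow_totally_different_from_others_too",
]

THRESHOLD = 0.6  # hypothetical cut-off, not a recommended value

for name, strings in (("list1", list1), ("list2", list2)):
    score = coherence_index(strings)
    label = "coherent" if score >= THRESHOLD else "mixed"
    print(f"{name}: {score:.3f} ({label})")
```

Comparing every pair takes n*(n-1)/2 comparisons for a list of n strings, which is fine for short lists but may call for sampling pairs if the lists get long.
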
Duncan