Frequently used algorithms for string edit distance:

Levenshtein and Damerau-Levenshtein distance
Jaro and Jaro-Winkler distance
N-gram distance

--
Rain Chen
Sent with Airmail

On September 16, 2019 at 4:26:53 AM, Todorov Alexander (atodorov@mrsenko.com) wrote:

Hi folks,

I am looking for a tool (or, at worst, an algorithm I can implement myself) which calculates similarities between strings. I would turn this into a pylint plugin because that is how I would consume it in my projects.

My background is that we've identified *duplicate* or *similar* strings in our project which are marked for translation. Some of these differ by:

upper case vs. lower case and all of the variations in between (I can lower-case everything before sending it to the tool, of course),
variances in spelling, e.g. "test case" vs. "TestCase",
variations in how certain words or combinations of words are used together in a sentence, e.g. "user does not exist" vs. "the user specified was not found".

Ideally I'd like to consume this tool in CI and, based on the results, reduce the number of source strings needed for translation and make life easier for translators.

Feel free to propose anything; I have not done any research on this topic.

Thanks,
Alex

_______________________________________________
code-quality mailing list -- code-quality@python.org
To unsubscribe send an email to code-quality-leave@python.org
https://mail.python.org/mailman3/lists/code-quality.python.org/
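
As a rough illustration of the Levenshtein approach named at the top of this reply, here is a minimal sketch using only the Python standard library. The helper names (normalize, similarity, similar_pairs), the whitespace-stripping normalization and the 0.8 threshold are assumptions made for illustration; they come from neither message and would need tuning against real translation strings.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalize(s: str) -> str:
    """Assumed normalization: lower-case and remove all whitespace."""
    return "".join(s.lower().split())


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means identical after normalization."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similar_pairs(strings, threshold=0.8):
    """Return (a, b, score) for each pair at or above the assumed threshold."""
    pairs = []
    for a, b in combinations(strings, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


if __name__ == "__main__":
    candidates = ["test case", "TestCase", "user does not exist"]
    for a, b, score in similar_pairs(candidates):
        print(f"{a!r} ~ {b!r}: {score:.2f}")
```

Character-level edit distance of this kind handles the case and spelling variants ("test case" vs. "TestCase"), but it is unlikely to flag reworded sentences such as "user does not exist" vs. "the user specified was not found"; those probably need a word-level or n-gram comparison instead.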