I'd guess what you're looking for is Levenshtein distance.
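
In case it helps, here is a rough sketch of the idea in plain Python. The
function names (levenshtein, similarity) and the normalisation by the longer
string's length are my own choices for illustration, not anything taken from
pylint or an existing library, so treat it as a starting point only.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character edits needed to turn a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: scale the distance to 0.0 (different) .. 1.0 (same)."""
    # Lower-casing first, as suggested in the original question.
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(similarity("test case", "TestCase"))
    print(similarity("user does not exist", "the user specified was not found"))
```

This should score spelling variants like "test case" vs "TestCase" as very
close (one edit apart after lower-casing). It will probably not do well on
rephrased sentences such as "user does not exist" vs "the user specified was
not found", since edit distance only looks at characters, not meaning.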

On 15/09/2019 22:26:56, Alexander Todorov <atodorov@mrsenko.com> wrote:

Hi folks,
I am looking for a tool (or, at worst, an algorithm which I can implement)
which calculates similarity between strings. I would turn this into a pylint
plugin b/c this is how I would consume it in my projects.

My background is that we've identified *duplicate* or *similar* strings in our
project which are marked for translation. Some of these differ only in upper
vs. lower case and all of the variations in between (I can lower case
everything before sending to the tool of course), some are variances in
spelling, e.g. "test case" vs "TestCase", and some are variations in how
certain words/combinations of words are used together in a sentence, e.g.
"user does not exist" vs. "the user specified was not found".

Ideally I'd like to consume this tool in CI and, based on the results, reduce
the number of source strings needed for translation and make life easier for
translators.

Feel free to propose anything, I have not done any research on this topic.

code-quality mailing list -- code-quality@python.org