If this were for a school assignment, I'd probably look at edit distance and fuzzy string matching next:
https://en.wikipedia.org/wiki/Edit_distance
https://en.wikipedia.org/wiki/String-to-string_correction_problem

https://pypi.org/search/?q=Levenshtein
  - https://pypi.org/project/textdistance/
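
For reference, here is a minimal pure-Python sketch of Levenshtein edit distance, the measure the links above describe. The function name `levenshtein` is my own and is not taken from any of the packages listed; those packages will almost certainly be faster.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the already-processed prefix of a
    # and b[:j]; only two rows of the full table are kept in memory.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Keeping two rows instead of the whole table holds memory at O(len(b)); the running time is still O(len(a) * len(b)).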

As a bioinformatics program, this is a bit like CRISPR:
https://en.wikipedia.org/wiki/CRISPR

Biopython's Seq has a count_overlap method; Biopython is available under a BSD 3-Clause license:
https://github.com/biopython/biopython/blob/master/LICENSE.rst

Can it be made faster with e.g. itertools.count and a generator expression?

- Bio.Seq.Seq.count_overlap()
  http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html#count_overlap
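
As a starting point for that question, here is one way an overlapping count could be written with a generator expression on plain str. This is only a sketch: `count_overlapping` is a hypothetical name, it is not Biopython's implementation, and I have not benchmarked it against count_overlap. Since the upper bound is known, range() stands in for itertools.count here.

```python
def count_overlapping(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing matches to overlap."""
    # Test every possible start index. str.startswith(sub, i) compares in
    # place, so no slices are created along the way.
    last_start = len(text) - len(sub)
    return sum(text.startswith(sub, i) for i in range(last_start + 1))


print(count_overlapping("AAA", "AA"), "AAA".count("AA"))
```

This checks every index, so it is O(len(text) * len(sub)) in the worst case. A loop around str.find() skips ahead to the next candidate and may well be faster in practice.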

Are there any changes or features necessary in core Python in order to finish this application?
If not, the python-tutor mailing list or r/learnpython are set up to handle this sort of thing. 

It may or may not be appropriate for core Python to support all of these string algorithms:
http://rosalind.info/problems/topics/string-algorithms/

On Thursday, April 26, 2018, Julia Kim <julia.hiyeon.kim@gmail.com> wrote:
There are two occurrences of ‘AA’ in ‘AAA’, one starting at index 0 and the other starting at index 1.

If the ‘AA’ starting at 0 is replaced with ‘BANAN’, ‘AAA’ becomes ‘BANANA’.

If the ‘AA’ starting at 1 is replaced with ‘PPLE’, ‘AAA’ becomes ‘APPLE’.

Depending on which one is chosen, ‘AAA’ can be edited to ‘BANANA’ or ‘APPLE’, two different results.
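
The two edits described above can be reproduced with plain slicing. This is only an illustration; `replace_at` is a hypothetical helper, not part of the program mentioned below.

```python
def replace_at(text: str, start: int, old: str, new: str) -> str:
    """Replace the occurrence of old that begins at index start with new."""
    if not text.startswith(old, start):
        raise ValueError(f"{old!r} does not occur at index {start}")
    # Keep everything before the match, splice in new, keep everything after.
    return text[:start] + new + text[start + len(old):]


print(replace_at("AAA", 0, "AA", "BANAN"))
print(replace_at("AAA", 1, "AA", "PPLE"))
```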


I wrote a program which edits a part of a text. If the part to be edited occurs more than once, it presents the positions and asks the user to choose which one to edit.

I tried different algorithms. The best one so far just uses find() and collects the results in a list.
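
A sketch of what that find()-based approach might look like (this is a guess at the idea, not the actual program; `find_all` is a hypothetical name):

```python
def find_all(text: str, sub: str) -> list[int]:
    """Return the start index of every occurrence of sub, overlaps included."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one character past the last hit, not len(sub) past it,
        # so that overlapping occurrences are found as well.
        start = text.find(sub, start + 1)
    return positions


print(find_all("AAA", "AA"))
```

The overlapping count is then just len(find_all(text, sub)).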



On Apr 25, 2018, at 11:57 PM, Wes Turner <wes.turner@gmail.com> wrote:



On Wednesday, April 25, 2018, Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Apr 25, 2018 at 11:22:24AM -0700, Julia Kim wrote:
> Hi,
>
> There’s an error with the string method count().
>
> x = 'AAA'
> y = 'AA'
> print(x.count(y))
>
> The output is 1, instead of 2.

Are you proposing that there ought to be a version of count that looks
for *overlapping* substrings?

When will this be useful?

"Finding a motif in DNA"
 
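This is possible with re.finditer or re.findall and a lookahead pattern, with regex.findall(..., overlapped=True) from the third-party regex module, or with a sliding window.

n-grams can be collected by index or by value. Given the list of match indices:
count = len(indices)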
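
A minimal sketch of two of those options using only the standard library. The function names are my own; the lookahead trick is used because plain re.findall only reports non-overlapping matches.

```python
import re


def overlapping_indices(text: str, motif: str) -> list[int]:
    """Start indices of every (possibly overlapping) occurrence of motif."""
    # A lookahead matches without consuming characters, so the regex engine
    # retries at the very next position and overlapping hits are reported.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(text)]


def ngrams(text: str, n: int) -> list[str]:
    """Sliding window: every length-n substring of text, by value."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


indices = overlapping_indices("GATATATGCATATACTT", "ATAT")
count = len(indices)
print(indices, count)
print(ngrams("AAA", 2).count("AA"))
```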





--
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/