string method count()

Hi,

There's an error with the string method count():

    x = 'AAA'
    y = 'AA'
    print(x.count(y))

The output is 1 instead of 2. I write programs on the SoloLearn mobile app.

Warm regards,
Julia Kim

Hi,
From https://docs.python.org/3/library/stdtypes.html#str.count:

    str.count(*sub*[, *start*[, *end*]])
    Return the number of *non-overlapping* occurrences of substring *sub* in the range [*start*, *end*]. Optional arguments *start* and *end* are interpreted as in slice notation.

Best regards,
João Santos

On Wed, 25 Apr 2018 at 20:22 Julia Kim <julia.hiyeon.kim@gmail.com> wrote:
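A quick check of that rule. str.count resumes scanning after the end of each match, so an occurrence that overlaps a previous one is skipped. The extra 'AAAA' case is added here for illustration only.

```python
text = 'AAA'
sub = 'AA'

# The match at index 0 consumes indices 0-1; scanning resumes at index 2,
# so the 'AA' starting at index 1 is never seen.
print(text.count(sub))       # 1

# start/end act like slice bounds: this counts 'AA' inside text[1:] == 'AA'.
print(text.count(sub, 1))    # 1

# Same rule: matches at 0 and 2 are counted, the one at 1 is not.
print('AAAA'.count('AA'))    # 2
```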

On Wednesday, April 25, 2018, Steven D'Aprano <steve@pearwood.info> wrote:

"Finding a motif in DNA": http://rosalind.info/problems/subs/

This is possible with str.find, re.finditer, re.findall, regex.findall(..., overlapped=True), or a sliding window:
https://stackoverflow.com/questions/2970520/string-count-with-overlapping-oc...

n-grams can be taken by indices or by value; count = len(indices).
- https://en.wikipedia.org/wiki/N-gram#Examples
- https://en.wikipedia.org/wiki/String_(computer_science)#String_processing_al...
- https://en.wikipedia.org/wiki/Sequential_pattern_mining

Or build it yourself:

    def str_count(string, sub):
        count = 0
        # Try every start index where sub still fits (note the + 1).
        for i in range(len(string) - len(sub) + 1):
            if string.startswith(sub, i):
                count += 1
        return count

(Some optimizations are probably possible.) Or, in one line with a generator expression:

    def str_count(string, sub):
        return sum(string.startswith(sub, i)
                   for i in range(len(string) - len(sub) + 1))

Regular expressions would probably be at least an order of magnitude faster, if this is a bottleneck for you. But a pure-Python implementation of this is a lot easier to write than a change to the built-in str.count() would be.

2018-04-26 8:57 GMT+02:00 Wes Turner <wes.turner@gmail.com>:
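Two of the standard-library routes mentioned in this thread can be sketched briefly. The function names are illustrative, not from any library. `count_overlapping_re` wraps the pattern in a zero-width lookahead so re.finditer reports every start position, overlapping or not. `occurrence_indices` is a plain sliding window that returns the indices themselves, so count = len(indices). The third-party regex module's overlapped=True option is an alternative that is not shown here.

```python
import re


def count_overlapping_re(text, sub):
    """Count overlapping occurrences of sub with a zero-width lookahead.

    (?=...) matches without consuming characters, so finditer tries every
    start position and reports overlapping matches too.
    """
    # re.escape keeps characters such as '.' in sub from acting as regex syntax.
    pattern = re.compile('(?=' + re.escape(sub) + ')')
    return sum(1 for _ in pattern.finditer(text))


def occurrence_indices(text, sub):
    """Sliding window: start index of every (possibly overlapping) match."""
    n = len(sub)
    return [i for i in range(len(text) - n + 1) if text[i:i + n] == sub]


print(count_overlapping_re('AAA', 'AA'))   # 2
indices = occurrence_indices('GATATATGCATATACTT', 'ATAT')
print(indices, len(indices))               # [1, 3, 9] 3
```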

Regular expressions are not just "an order of magnitude better": they're asymptotically faster. See https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm for a non-regular-expression algorithm.

On Thursday, April 26, 2018 at 5:45:20 AM UTC-4, Jacco van Dorp wrote:
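For reference, a compact sketch of the Knuth-Morris-Pratt algorithm linked above, adapted to count overlapping matches. It runs in O(n + m) time for a text of length n and a pattern of length m, whereas the naive sliding window can take O(n·m) in the worst case. `kmp_count` is a hypothetical helper name, and mirroring str.count('') for an empty pattern is an assumption of this sketch.

```python
def kmp_count(text, pattern):
    """Count overlapping occurrences of pattern in text (Knuth-Morris-Pratt)."""
    m = len(pattern)
    if m == 0:
        return len(text) + 1  # assumption: mirror str.count('') semantics

    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    count = 0
    k = 0  # number of pattern characters currently matched
    for ch in text:
        while k and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == m:
            count += 1
            k = failure[k - 1]  # fall back instead of resetting: keeps overlaps
    return count


print(kmp_count('AAA', 'AA'))   # 2
```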

There are two 'AA' in 'AAA', one starting at index 0 and the other at index 1. If the 'AA' starting at 0 is replaced with 'BANAN', 'AAA' becomes 'BANANA'. If the 'AA' starting at 1 is replaced with 'PPLE', 'AAA' becomes 'APPLE'. Depending on which occurrence is chosen, 'AAA' can be edited into 'BANANA' or 'APPLE', two different results.

I wrote a program that edits part of a text. If the part to be edited occurs more than once, it shows the positions and asks the user to choose which one to edit. I tried different algorithms; the best one so far is simply calling find() repeatedly and collecting the positions in a list.
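The find()-based approach described above could look roughly like this. `find_all` and `replace_at` are hypothetical names, not the program from the thread. The key detail is that each new search starts one character after the previous hit rather than after the whole match, so overlapping occurrences such as the two 'AA' in 'AAA' are both reported.

```python
def find_all(text, sub):
    """Return the start index of every occurrence of sub, overlaps included."""
    positions = []
    pos = text.find(sub)
    while pos != -1:
        positions.append(pos)
        pos = text.find(sub, pos + 1)  # + 1, not + len(sub), keeps overlaps
    return positions


def replace_at(text, sub, new, pos):
    """Replace the single occurrence of sub that starts at index pos."""
    if not text.startswith(sub, pos):
        raise ValueError(f'{sub!r} does not occur at index {pos}')
    return text[:pos] + new + text[pos + len(sub):]


print(find_all('AAA', 'AA'))                 # [0, 1]
print(replace_at('AAA', 'AA', 'BANAN', 0))   # BANANA
print(replace_at('AAA', 'AA', 'PPLE', 1))    # APPLE
```

The program would show the list from find_all to the user and pass the chosen index to replace_at.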

If this were for a school assignment, I'd probably look at edit distance and fuzzy string matching next:
- https://en.wikipedia.org/wiki/Edit_distance
- https://en.wikipedia.org/wiki/String-to-string_correction_problem
- https://pypi.org/search/?q=Levenshtein
- https://pypi.org/project/textdistance/

As a bioinformatics program, this is a bit like CRISPR: https://en.wikipedia.org/wiki/CRISPR

BioPython's Seq has a count_overlap method, under a BSD 3-Clause license: https://github.com/biopython/biopython/blob/master/LICENSE.rst
Can it be made faster with, e.g., itertools.count and a generator expression?
- Bio.Seq.Seq.count_overlap(): http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html#count_overlap

Are there any changes or features necessary in core Python in order to finish this application? If not, the python-tutor mailing list or r/learnpython are set up to handle this sort of thing.

It may or may not be appropriate for core Python to support all of these string algorithms: http://rosalind.info/problems/topics/string-algorithms/

On Thursday, April 26, 2018, Julia Kim <julia.hiyeon.kim@gmail.com> wrote:
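Edit distance came up as a possible next step. As a rough illustration, here is the classic dynamic-programming (Wagner-Fischer) computation of Levenshtein distance in plain Python. `levenshtein` is a hypothetical helper, not the API of any of the PyPI packages linked above, which may differ in speed and features.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] is the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into '' takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein('kitten', 'sitting'))  # 3
print(levenshtein('AAA', 'APPLE'))       # 4
```

Only two rows are kept at a time, so memory grows with len(b) rather than len(a) * len(b).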

participants (7)
- Alexandre Brault
- Jacco van Dorp
- João Santos
- Julia Kim
- Neil Girdhar
- Steven D'Aprano
- Wes Turner