Mailman 3 string method count() - Python-ideas


string method count()


Julia Kim

April 25, 2018

11:22 a.m.

Hi,

There's an error with the string method count():

    x = 'AAA'
    y = 'AA'
    print(x.count(y))

The output is 1 instead of 2. I write programs on the SoloLearn mobile app.

Warm regards,
Julia Kim


Alexandre Brault

April 2018

11:27 a.m.

str.count counts non-overlapping instances of the substring. After counting the first 'AA', there is only one A left, so that isn't a second instance of 'AA'.

On 2018-04-25 02:22 PM, Julia Kim wrote:

...
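To make the explanation above concrete, here is a small sketch (not CPython's actual implementation) that counts by repeated str.find calls. The hypothetical `scan_count` helper skips past the whole match when `overlapping=False`, which reproduces what str.count returns, and advances by a single character when `overlapping=True`, which gives the count Julia expected.

```python
def scan_count(text: str, sub: str, overlapping: bool) -> int:
    """Count occurrences of sub in text by repeated str.find calls.

    overlapping=False models str.count: the next search starts after the
    end of the previous match. overlapping=True restarts one character
    after the previous match's start, so matches may share characters.
    """
    if not sub:
        # str.count reports len(text) + 1 empty matches; avoid a zero-width step.
        return len(text) + 1
    step = 1 if overlapping else len(sub)
    count = 0
    pos = text.find(sub)
    while pos != -1:
        count += 1
        pos = text.find(sub, pos + step)
    return count


print("AAA".count("AA"))                           # 1
print(scan_count("AAA", "AA", overlapping=False))  # 1
print(scan_count("AAA", "AA", overlapping=True))   # 2
```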

João Santos

11:31 a.m.

Hi,

...

From https://docs.python.org/3/library/stdtypes.html#str.count:

str.count(*sub*[, *start*[, *end*]])
Return the number of *non-overlapping* occurrences of substring *sub* in the range [*start*, *end*]. Optional arguments *start* and *end* are interpreted as in slice notation.

Best regards,
João Santos

On Wed, 25 Apr 2018 at 20:22 Julia Kim <julia.hiyeon.kim@gmail.com> wrote:

...
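The "interpreted as in slice notation" part of the documentation can be illustrated with a few calls. For a non-empty substring, `s.count(sub, start, end)` gives the same result as `s[start:end].count(sub)`. The sample string below is an arbitrary choice for illustration.

```python
text = "AAAA"

# No start/end: the whole string is searched, giving two non-overlapping "AA".
print(text.count("AA"))        # 2
# start=1 searches text[1:] == "AAA", which holds one non-overlapping "AA".
print(text.count("AA", 1))     # 1
# Negative indices work as in slices: text[-3:] is also "AAA".
print(text.count("AA", -3))    # 1
# end is exclusive, as in slicing: text[0:3] == "AAA".
print(text.count("AA", 0, 3))  # 1
# For a non-empty substring this equals counting on the slice itself.
print(text.count("AA", 1, 4) == text[1:4].count("AA"))  # True
```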

Steven D'Aprano

2:33 p.m.

On Wed, Apr 25, 2018 at 11:22:24AM -0700, Julia Kim wrote:

...

Are you proposing that there ought to be a version of count that looks for *overlapping* substrings? When will this be useful?

-- Steve

Wes Turner

11:57 p.m.

On Wednesday, April 25, 2018, Steven D'Aprano <steve@pearwood.info> wrote:

...

"Finding a motif in DNA" http://rosalind.info/problems/subs/

This is possible with re.search, re.finditer, or re.findall (using a zero-width lookahead such as (?=AA) so that matches can overlap), with the third-party regex module's regex.findall(..., overlapped=True), or with a sliding window: https://stackoverflow.com/questions/2970520/string-count-with-overlapping-occurrences

n-grams can be represented by indices or by value; count = len(indices): https://en.wikipedia.org/wiki/N-gram#Examples

https://en.wikipedia.org/wiki/String_(computer_science)#String_processing_algorithms

https://en.wikipedia.org/wiki/Sequential_pattern_mining

...
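A sketch of the standard-library regex approach mentioned above, assuming only the re module. Plain re.findall consumes each match, so it never reports overlaps; wrapping the escaped substring in a lookahead makes every match zero-width, so the engine retries at the next position. The helper name `overlapping_matches` is hypothetical. The third-party regex module's overlapped=True option is an alternative not shown here.

```python
import re
from typing import List


def overlapping_matches(text: str, sub: str) -> List[int]:
    """Return the start index of every occurrence of sub, overlaps included.

    The zero-width lookahead (?=...) consumes no characters, so after each
    match the regex engine tries again at the very next position.
    """
    pattern = re.compile("(?=" + re.escape(sub) + ")")
    return [m.start() for m in pattern.finditer(text)]


print(len(re.findall("AA", "AAA")))      # 1: consuming matches never overlap
print(overlapping_matches("AAA", "AA"))  # [0, 1]
```

The count is then simply `len(overlapping_matches(text, sub))`, and the positions themselves come for free.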

Jacco van Dorp

2:44 a.m.

or build it yourself...

    def str_count(string, sub):
        count = 0
        for i in range(len(string) - len(sub) + 1):
            if string.startswith(sub, i):
                count += 1
        return count

(probably some optimizations possible...)

Or in one line with a generator expression:

    def str_count(string, sub):
        return sum(string.startswith(sub, i) for i in range(len(string) - len(sub) + 1))

Regular expressions would probably be at least an order of magnitude better in speed, if it's a bottleneck for you. But a pure-Python implementation of this is a lot easier than it would be for the current str.count().

2018-04-26 8:57 GMT+02:00 Wes Turner <wes.turner@gmail.com>:

...

Neil Girdhar

May 2018

12:56 p.m.

Regular expressions are not just "an order of magnitude better"; they're asymptotically faster. See https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm for a non-regular-expression algorithm.

On Thursday, April 26, 2018 at 5:45:20 AM UTC-4, Jacco van Dorp wrote:

...
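For reference, a compact sketch of the Knuth-Morris-Pratt algorithm linked above, adapted to count overlapping occurrences in O(len(text) + len(pattern)) time. The function names are illustrative, not from any library. The key difference from a non-overlapping search is that after a full match the matcher falls back through the failure table instead of restarting from zero.

```python
from typing import List


def prefix_function(pattern: str) -> List[int]:
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_count_overlapping(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in text."""
    if not pattern:
        return len(text) + 1  # mirror str.count's convention for ''
    failure = prefix_function(pattern)
    count = 0
    k = 0  # number of pattern characters currently matched
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            # Fall back instead of resetting to 0 so overlaps are still found.
            k = failure[k - 1]
    return count


print(kmp_count_overlapping("AAA", "AA"))  # 2
```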

Julia Kim

April 2018

7:19 a.m.

There are two 'AA' in 'AAA', one starting at index 0 and the other starting at index 1.

If the 'AA' starting at 0 is deleted and 'BANAN' is inserted in its place, 'AAA' becomes 'BANANA'.

If the 'AA' starting at 1 is deleted and 'PPLE' is inserted in its place, 'AAA' becomes 'APPLE'.

Depending on which one is chosen, 'AAA' can be edited to 'BANANA' or 'APPLE', two different results.

I wrote a program which edits a part of a text. If the part to be edited occurs more than once, it presents the positions and asks the user to choose which one to edit.

I tried different algorithms. The best one so far is just using find() and collecting the results in a list.

...
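The find()-based approach described above might look roughly like this. Both function names are hypothetical, and the interactive prompt is left out; a real program would show the positions and read the user's choice, for example with input(). The search restarts at pos + 1 rather than pos + len(sub), so overlapping occurrences such as the two 'AA' in 'AAA' are both offered.

```python
from typing import List


def find_all(text: str, sub: str) -> List[int]:
    """Start index of every (possibly overlapping) occurrence of sub."""
    positions = []
    pos = text.find(sub)
    while pos != -1:
        positions.append(pos)
        pos = text.find(sub, pos + 1)  # +1, not +len(sub), so overlaps are kept
    return positions


def replace_at(text: str, sub: str, pos: int, replacement: str) -> str:
    """Replace only the occurrence of sub that starts at index pos."""
    if not text.startswith(sub, pos):
        raise ValueError(f"{sub!r} does not occur at index {pos}")
    return text[:pos] + replacement + text[pos + len(sub):]


print(find_all("AAA", "AA"))                # [0, 1]
print(replace_at("AAA", "AA", 0, "BANAN"))  # BANANA
print(replace_at("AAA", "AA", 1, "PPLE"))   # APPLE
```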

Wes Turner

12:13 p.m.

If this was for a school assignment, I'd probably go to edit distance and fuzzy string matching next:

https://en.wikipedia.org/wiki/Edit_distance
https://en.wikipedia.org/wiki/String-to-string_correction_problem

- https://pypi.org/search/?q=Levenshtein
- https://pypi.org/project/textdistance/

As a bioinformatics program, this is a bit like CRISPR: https://en.wikipedia.org/wiki/CRISPR

BioPython's Seq has a count_overlap method, under a BSD 3-Clause license: https://github.com/biopython/biopython/blob/master/LICENSE.rst

Can it be made faster with e.g. itertools.count and a generator expression?

- Bio.Seq.Seq.count_overlap(): http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html#count_overlap

Are there any changes or features necessary in core Python in order to finish this application? If not, the python-tutor mailing list or r/learnpython are set up to handle this sort of thing.

It may or may not be appropriate for core Python to support all of these string algorithms: http://rosalind.info/problems/topics/string-algorithms/

On Thursday, April 26, 2018, Julia Kim <julia.hiyeon.kim@gmail.com> wrote:

...

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/



8 comments

7 participants

participants (7)

Alexandre Brault
Jacco van Dorp
João Santos
Julia Kim
Neil Girdhar
Steven D'Aprano
Wes Turner
