[Python-ideas] string method count()

Wes Turner wes.turner at gmail.com
Thu Apr 26 15:13:10 EDT 2018


If this were for a school assignment, I'd probably look at edit distance and
fuzzy string matching next:
https://en.wikipedia.org/wiki/Edit_distance
https://en.wikipedia.org/wiki/String-to-string_correction_problem

- https://pypi.org/search/?q=Levenshtein
  - https://pypi.org/project/textdistance/
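For reference, here is a minimal sketch of edit distance using the classic Levenshtein dynamic-programming recurrence. The function name levenshtein is my own and is not an API from the packages linked above. Those packages provide tested and faster implementations, so this is only meant to show the idea.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev is the previous DP row: prev[j] = distance(a[:i-1], b[:j])
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


print(levenshtein("AAA", "BANANA"))  # 3
print(levenshtein("AAA", "APPLE"))   # 4
```

The function keeps only two rows of the table, so memory is O(len(b)) rather than O(len(a) * len(b)).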

As a bioinformatics problem, this is a bit like CRISPR (locate a target
sequence, then edit it):
https://en.wikipedia.org/wiki/CRISPR

Biopython's Seq class has a count_overlap() method, and Biopython is
available under a BSD 3-Clause license:
https://github.com/biopython/biopython/blob/master/LICENSE.rst

Can it be made faster with, e.g., itertools.count and a generator
expression?

- Bio.Seq.Seq.count_overlap()
  http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html#count_overlap
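A minimal sketch of an overlapping count with a generator expression follows. It turns out itertools.count is not needed, and count_overlapping is a hypothetical name, not a Biopython or str method. The sketch tests every start index with str.startswith(sub, i), which compares in place instead of building slices. It has not been benchmarked against Bio.Seq.Seq.count_overlap().

```python
def count_overlapping(text: str, sub: str) -> int:
    """Count occurrences of sub in text, allowing matches to overlap."""
    if not sub:
        # Match str.count's convention: '' occurs at every index, plus the end
        return len(text) + 1
    # Check each possible start index; startswith(sub, i) avoids slicing text
    return sum(1 for i in range(len(text) - len(sub) + 1) if text.startswith(sub, i))


print("AAA".count("AA"))               # 1 (str.count skips overlaps)
print(count_overlapping("AAA", "AA"))  # 2
```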

Are there any changes or features necessary in core Python in order to
finish this application?
If not, the python-tutor mailing list or r/learnpython are set up to handle
this sort of thing.

It may or may not be appropriate for core Python to support all of these
string algorithms:
http://rosalind.info/problems/topics/string-algorithms/

On Thursday, April 26, 2018, Julia Kim <julia.hiyeon.kim at gmail.com> wrote:

> There are two occurrences of 'AA' in 'AAA', one starting at index 0 and
> the other starting at index 1.
>
> If the 'AA' starting at 0 is replaced with 'BANAN', 'AAA' becomes
> 'BANANA'.
>
> If the 'AA' starting at 1 is replaced with 'PPLE', 'AAA' becomes
> 'APPLE'.
>
> Depending on which one is chosen, 'AAA' can be edited to 'BANANA' or
> 'APPLE': two different results.
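Both edits amount to slicing around the chosen start index. Here is a minimal sketch; replace_at is a hypothetical helper, not a str method. It checks that old really occurs at that index before splicing in new.

```python
def replace_at(text: str, start: int, old: str, new: str) -> str:
    """Replace the single occurrence of old that begins at index start."""
    if not text.startswith(old, start):
        raise ValueError(f"{old!r} does not occur at index {start}")
    # Keep everything before the match, insert new, keep everything after it
    return text[:start] + new + text[start + len(old):]


print(replace_at("AAA", 0, "AA", "BANAN"))  # BANANA
print(replace_at("AAA", 1, "AA", "PPLE"))   # APPLE
```

str.replace(old, new, 1) always picks the leftmost occurrence. For example, 'AAA'.replace('AA', 'PPLE', 1) gives 'PPLEA'. That is why a position-based helper is needed for the second case.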
>
>
> I wrote a program that edits part of a text. If the part to be edited
> occurs more than once, it presents the positions and asks the user to
> choose which one to edit.
>
> I tried different algorithms. The best one so far is simply calling
> find() repeatedly and collecting the results in a list.
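The find()-based approach might look like the following sketch; find_all is a hypothetical name. To keep overlapping matches, restart the search at pos + 1 rather than at pos + len(sub).

```python
def find_all(text: str, sub: str) -> list[int]:
    """Return every start index of sub in text, including overlapping ones."""
    positions = []
    pos = text.find(sub)
    while pos != -1:
        positions.append(pos)
        # Resume one character later (not len(sub) later) to keep overlaps
        pos = text.find(sub, pos + 1)
    return positions


positions = find_all("AAA", "AA")
print(positions)       # [0, 1]
print(len(positions))  # 2, the overlapping count
```

In an interactive program, the list could be shown to the user. The chosen index would then be passed to a position-based replacement.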
>
>
>
> On Apr 25, 2018, at 11:57 PM, Wes Turner <wes.turner at gmail.com> wrote:
>
>
>
> On Wednesday, April 25, 2018, Steven D'Aprano <steve at pearwood.info> wrote:
>
>> On Wed, Apr 25, 2018 at 11:22:24AM -0700, Julia Kim wrote:
>> > Hi,
>> >
>> > There’s an error with the string method count().
>> >
>> > x = 'AAA'
>> > y = 'AA'
>> > print(x.count(y))
>> >
>> > The output is 1, instead of 2.
>>
>> Are you proposing that there ought to be a version of count that looks
>> for *overlapping* substrings?
>>
>> When will this be useful?
>
>
> "Finding a motif in DNA"
> http://rosalind.info/problems/subs/
>
> This is possible with str.find, with re.finditer or re.findall using a
> lookahead pattern, with regex.findall(..., overlapped=True), or with a
> sliding window:
> https://stackoverflow.com/questions/2970520/string-count-with-overlapping-occurrences
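For the standard-library re route, a zero-width lookahead does the trick. It consumes no characters, so finditer can report a match at every index where the substring begins, including overlapping ones. This is a sketch with a hypothetical function name. The third-party regex module's overlapped=True flag is an alternative that this example does not use.

```python
import re


def overlapping_positions(text: str, sub: str) -> list[int]:
    """Start indices of sub in text, overlaps included, via a regex lookahead."""
    # re.escape makes sub match literally even if it contains '.', '*', etc.
    pattern = re.compile(f"(?={re.escape(sub)})")
    return [m.start() for m in pattern.finditer(text)]


print(overlapping_positions("AAA", "AA"))  # [0, 1]
print(len(re.findall("AA", "AAA")))        # 1: a plain pattern does not overlap
```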
>
> n-grams can be collected by index or by value; by index,
> count = len(indices):
> https://en.wikipedia.org/wiki/N-gram#Examples
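For the "by value" variant, one option is to slide a fixed-width window over the string and tally every n-gram with collections.Counter. The overlapping count for any particular substring can then be looked up directly. This is a sketch; ngram_counts is a hypothetical name.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every length-n substring (n-gram) of text by value."""
    # The window advances one character at a time, so n-grams overlap
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


counts = ngram_counts("AAA", 2)
print(counts["AA"])  # 2
```

This costs more memory than counting a single pattern. It pays off when many different substrings of the same length are queried against one text.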
>
> https://en.wikipedia.org/wiki/String_(computer_science)#String_processing_algorithms
>
> https://en.wikipedia.org/wiki/Sequential_pattern_mining
>
>
>>
>> --
>> Steve
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>

