[Tutor] How to find optimisations for code

Avi Gross avigross at verizon.net
Sat Oct 20 12:16:56 EDT 2018


There are many ways to do some things. I present a fairly straightforward
method for consideration. 

 

Synopsis. Use two dictionaries to keep track of how many times a letter can
be found in the source word and the target word.

 

Loop over the target letters and as soon as you find a letter that needs
more copies of that letter than are available in the source, return False.

 

Otherwise, return True.

 

Slight optimization, return False immediately if the target is too long.

 

Note: Remove the print statements used to show what it does when trying for
efficiency.

 

Note I do not claim this is fast. But for longer strings this requires one
pass over each string to set the dictionary counters and often less than one
pass over the target string to determine Truth. I can visualize many
variants but efficiency of each may vary with the data, version of Python
and so on. 

 

Python CODE using version 3.7.0 :

 

"""

Function to check if string str_target is a strict subset

of str_source - but with multiple copies of a letter being unique

in what is thus not a set. In English, can the letters in str_target

be found in the required quantities with possibly some left over

when examining str_source. For example, 'level' has two l's and two e's

and can be made from 'multilevel' but not from 'evil' which has the needed

e v and l but not two copies.

"""

 

def sub_word(str_source, str_target):

    """

    Return True/False if letters in source are enough to make target

    Method: make dictionaries of number of letters in each. See if enough.

    """

 

    # Check for trivial False condition: word too long

    if len(str_target) > len(str_source) : return False

 

    # Initialize empty dictionaries to hold leter counts.

    source = {}

    target = {}

 

    # Count letters in source thgen target, initializing

    # to zero if not yet present.

    for letter in str_source.lower():

           source[letter] = source.get(letter, 0) + 1

 

           

    for letter in str_target.lower():

           target[letter] = target.get(letter, 0) + 1

 

    # During bebug phase, show what the dictionaries contain

    print("SOURCE dict:", source)

    print("TARGET dict:", target)

 

    # Iterate over letters in target and return False

    # if the number of instances needed exceeds what

    # can be found in the source.

    for key in target:

        if target[key] > source.get(key, 0) : return False

 

    # if you reached here, return True as all letters needed

    # have been found.

    return True

 

Examples of use:

 

>>> sub_word("multilevel", "level")

SOURCE dict: {'m': 1, 'u': 1, 'l': 3, 't': 1, 'i': 1, 'e': 2, 'v': 1}

TARGET dict: {'l': 2, 'e': 2, 'v': 1}

True

 

>>> sub_word("shorT", "MuchTooLong")

False

>>> sub_word("supercalifragilisticexpialidocious", "california")

SOURCE dict: {'s': 3, 'u': 2, 'p': 2, 'e': 2, 'r': 2, 'c': 3, 'a': 3, 'l':
3, 'i': 7, 'f': 1, 'g': 1, 't': 1, 'x': 1, 'd': 1, 'o': 2}

TARGET dict: {'c': 1, 'a': 2, 'l': 1, 'i': 2, 'f': 1, 'o': 1, 'r': 1, 'n':
1}

False

 

>>> sub_word("supercalifragilisticexpialidocious", "fragile")

SOURCE dict: {'s': 3, 'u': 2, 'p': 2, 'e': 2, 'r': 2, 'c': 3, 'a': 3, 'l':
3, 'i': 7, 'f': 1, 'g': 1, 't': 1, 'x': 1, 'd': 1, 'o': 2}

TARGET dict: {'f': 1, 'r': 1, 'a': 1, 'g': 1, 'i': 1, 'l': 1, 'e': 1}

True

 

>>> sub_word("devil", "vile")

SOURCE dict: {'d': 1, 'e': 1, 'v': 1, 'i': 1, 'l': 1}

TARGET dict: {'v': 1, 'i': 1, 'l': 1, 'e': 1}

True

 

 

-----Original Message-----
From: Tutor <tutor-bounces+avigross=verizon.net at python.org> On Behalf Of
Peter Otten
Sent: Saturday, October 20, 2018 3:02 AM
To: tutor at python.org
Subject: Re: [Tutor] How to find optimisations for code

 

Steven D'Aprano wrote:

 

> We don't need to check that the individual letters are the same, 

> because checking the counts will suffice. If they are not the same, 

> one string will have (let's say) two A's while the other will have 

> none, and the counts will be different.

 

Another great optimisation is solving the actual problem ;)

 

The OP isn't looking for anagrams, he wants to know if string2 can be spelt
with the characters in string1:

 

abba, abbbbbacus --> True (don't care about the extra b, c, u, s) abba, baab
--> True (don't care about character order) abba, bab --> False (bab is
missing an a)

 

 

_______________________________________________

Tutor maillist  -   <mailto:Tutor at python.org> Tutor at python.org

To unsubscribe or change subscription options:

 <https://mail.python.org/mailman/listinfo/tutor>
https://mail.python.org/mailman/listinfo/tutor



More information about the Tutor mailing list