[Tutor] Messy - Very Messy string manipulation.

Wed Oct 28 12:09:24 EDT 2015

On Wed, Oct 28, 2015 at 10:09 AM, Vusa Moyo <soweto at gmail.com> wrote:
> Hi Guys,
>
> I've written a script to remove vowels from a string/sentence.
>
> the while loop I'm using below is to take care of duplicate vowels found in
> a sentence, ie
>
> anti_vowel('The cow moos louder than the frog')
>
> It works, but obviously its messy and n00by. Any suggestions on how I can
> write this code better?

First off, the code that works (by "works" I mean "passes its tests")
is far better than the code that doesn't, no matter what form that
code takes.  That said, there are several tricks you can use to
significantly shorten your function here.

> def anti_vowel(text):

I'd rather name this "remove_vowels", since that's what it *does*
rather than what it *is* is an English sense.

>     vowel = ['a', 'e', 'i', 'o', 'u']
>     VOWEL = ['A', 'E', 'I', 'O', 'U']

Strings can be thought of as tuples optimized for characters, with
some special methods.  Tuples, in turn, can be thought of as immutable
lists.  So the above can be shortened to:

   vowel = 'aeiou'
   VOWEL = 'AEIOU'

Using one of those special methods I mentioned:

   vowel = 'aeiou'
   VOWEL = vowel.upper()

But do we really need two separate vowel containers?

   vowels = 'aeiou'
   all_vowels = vowels + vowels.upper()

Or to save a name, and some typing:

   vowels = 'aeiou'
   vowels += vowels.upper()

>     manip = []
>
>     for i in text:
>         manip.append(i)

Any time you have the pattern "create an empty list, then append to it
in a for loop", think "comprehension".  The above could be shortened
to:

   manip = [i for i in text]

Or, since the list constructor takes an iterable argument and strings
give characters upon iteration:

   manip = list(text)

But we don't really need manip at all, as I'll show further down.

>     fufu = 0
>     #
>     while fufu < 16:

This is a fairly weird way to spell "loop 8 times", if I'm honest :).
A more idiomatic approach would be to get rid of 'fufu' and do this
instead:

   for each in range(8):

Which loops 8 times and assigns the number of the loop (0-7) to the
name 'each' each time around.  'each' could be any valid identifier
('_' is commonly used for this), 'each' just makes sense reading the
line as English.

>         for x in vowel:
>             if x in manip:
>                 manip.remove(x)
>
>         for y in VOWEL:
>             if y in manip:
>                 manip.remove(y)

Above, we combined 'vowel' and 'VOWEL' into 'vowels', so these can be
shortened into a single loop by removing the second loop and changing
the first to iterate over 'vowels' instead of 'vowel'.  But instead of
removing values from a list we just built, it's easier to build the
list without those values in the first place, see below.

>         fufu = fufu + 2

This line goes away with the change from 'while' to 'for each in range'.

>     strong = ''.join(manip)
>     return strong

I suspect this was meant to be 'string' :).  Anyway, 'return' can
return any expression not just a name, so this could be just:

   return ''.join(manip)

So, what was I talking about with not needing 'manip' and building the
list without the values in the first place?  Combining some other
stuff mentioned above, we can build a list of the characters of the
given text, minus vowels, using a comprehension:

   list_of_chars_without_vowels = [c for c in text if c not in vowels]

To better understand what's going on here, you can "unroll" the
comprehension by initializing the name to an empty list, then moving
the initial expression (the "c" in "c for ...") to an append call all
the way inside:

   list_of_chars_without_vowels = []
   for c in text:
       if c not in vowels:
           list_of_chars_without_vowels.append(c)

'list_of_chars_without_vowels' is then in the same state your 'manip'
was in after the loops, so the loops go away entirely.
'list_of_chars_without_vowels' is an unwieldy name, but we don't
actually need to name it:

   return ''.join([c for c in text if c not in vowels])

And one final optimization (which in this case is more of a finger
optimization than a performance optimization), we can drop the square
brackets to turn the list comprehension into a generator expression:

   return ''.join(c for c in text if c not in vowels)

The difference between the two above statements is that the first
calculates the entire list of characters, then passes it to ''.join(),
which iterates through the list to create the final string, while the
second creates a generator and passes it to ''.join(), and the
generator calculates and yields the next character each time ''.join()
calls next() on it.

To put everything together, here's my final version, with a docstring,
a punny Easter egg, and a very simplistic test that is not adequate
testing:

   import random

   def remove_vowels(text):
       """Remove all vowels from 'text' and return the result."""
       vowels = 'aeiou' + random.choice(['y', ''])  # "and sometimes y"
       vowels += vowels.upper()
       return ''.join(c for c in text if c not in vowels

   assert remove_vowels('Did It work? Looks like.') == 'Dd t wrk? Lks Lke.'

Hoping I may have taught you something,
-- 
Zach