[Tutor] string case manipulation in python 3.x

Steven D'Aprano steve at pearwood.info
Sun Nov 28 14:55:32 CET 2010


Rance Hall wrote:
> I need to do some case manipulation that I don't see in the documented
> string functions.
> 
> I want to make sure that user input meets a certain capitalization
> scheme, for example, if user input is a name, then the first letter of
> each word in the name is upper case, and the rest are lower.
> 
> I know how to force the upper and lower cases with string.lower() and friends.

Then you should know about str.capitalize and str.title:

 >>> s = "joHN clEeSE"
 >>> s.capitalize()
'John cleese'
 >>> s.title()
'John Cleese'

It looks like you want str.title, with the exception that numeric 
suffixes will need to be handled specially:

 >>> "fred smith III".title()
'Fred Smith Iii'

But see below.


> and I could even do a string.split() on the spaces in the names to
> break the name into pieces.
> 
> I don't see an obvious way to do this.
> 
> what I've come up with so far is to do something like this.
> 
> break the name into pieces
> force each piece to be lower case
> replace the first letter in each word with a uppercase version of
> whats there already.
> 
> Problems with this approach as I see them:
> The built in split function will create undesirable results for names
> that contain suffixes like Jr.  etc.

Why? I don't see how it would, unless people leave out the space between 
their name and suffix. But then, if they leave out the space between 
their first name and surname, you have the same problem.
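To illustrate the point, here is a minimal sketch of splitting a name that carries a "Jr." suffix. The sample name and the variable names are only for illustration; the suffix comes out as its own piece and is capitalized like any other word:

```python
name = "martin luther king jr."

# split() with no arguments breaks on any run of whitespace
pieces = name.split()
print(pieces)   # ['martin', 'luther', 'king', 'jr.']

# capitalize each piece and glue them back together with single spaces
fixed = " ".join(piece.capitalize() for piece in pieces)
print(fixed)    # Martin Luther King Jr.
```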


> I'm not entirely sure how to replace the string with an uppercase
> first letter on a per word basis.

The right way is with the capitalize or title methods, but the generic 
way is illustrated by this similar function:

def reverse_capitalize(s):
     # "abcd" -> "aBCD"
     if s:
         return s[0].lower() + s[1:].upper()
     else:  # empty string
         return s
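Flipping that around gives the per-word capitalization you asked about. This is only a sketch; the names capitalize_word and capitalize_words are made up for this example and are not string methods:

```python
def capitalize_word(word):
    # "abCD" -> "Abcd"
    if word:
        return word[0].upper() + word[1:].lower()
    else:  # empty string
        return word


def capitalize_words(name):
    # Apply capitalize_word to each whitespace-separated piece.
    # Note that runs of whitespace collapse to a single space.
    return " ".join(capitalize_word(w) for w in name.split())


print(capitalize_words("joHN clEeSE"))   # John Cleese
```

Unlike str.title, this only treats whitespace as a word boundary, so "it's" becomes "It's" rather than the "It'S" that title() produces. The flip side is that "o'brien" becomes "O'brien", so pick whichever behaviour suits your data.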


Which brings us back to the issue of Roman numeral suffixes. You can deal 
with them like this:

def capitalize(s):
     s = s.title()
     words = s.split()
     suffixes = 'ii iii iv v vi vii viii ix x xi xii'.split()
     # Twelve generations should be enough. If not, add more.
     # The "words and" test guards against empty or all-whitespace input.
     if words and words[-1].lower() in suffixes:
         words[-1] = words[-1].upper()
         return " ".join(words)
     return s


> Whats the "right" way to do something like this?

Another approach would be the old-fashioned way: change the string 
character by character.

# change in place
letters = list(name)
for i, c in enumerate(letters):
     letters[i] = do_something_with(c)
name = ''.join(letters)
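As a concrete case of the character-by-character approach, here is a rough sketch that title-cases a name by hand. The function name title_by_hand and the start_of_word flag are assumptions made for this example; the rule it applies (upper-case the first character after whitespace, lower-case the rest) is just one possible choice:

```python
def title_by_hand(name):
    letters = list(name)
    start_of_word = True  # the first non-space character starts a word
    for i, c in enumerate(letters):
        if c.isspace():
            # whitespace is left alone; the next character starts a word
            start_of_word = True
        elif start_of_word:
            letters[i] = c.upper()
            start_of_word = False
        else:
            letters[i] = c.lower()
    return ''.join(letters)


print(title_by_hand("joHN  clEeSE"))   # John  Cleese
```

Because nothing is split and rejoined, the original spacing between words is preserved exactly.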



-- 
Steven

