[Tutor] sentence case module for comments and possible cookbook submission

Brian van den Broek bvande at po-box.mcgill.ca
Fri Sep 10 11:05:25 CEST 2004


Hi all,

Earlier today, I needed to change some ALL CAPS text to sentence case. To
my surprise, searching the docs and the Cookbook, and browsing the 70+
Google hits for Python "sentence case", turned up nothing.

I've produced something that I think works. But, as regular readers of the
list might know, I'm still learning, so I'd appreciate any comments on it. I
realize it is a bit long, but I intend to refine it and submit it to the
Cookbook (unless the likely "But what about using X instead?" comments
are forthcoming :-). If you comment but would prefer not to be mentioned
in a Cookbook submission, please let me know.

Also, I still can't believe I'm not reinventing the wheel. If something
already exists, I couldn't find it, so if you know of anything, I'd be happy
to hear about it.

Thanks and best to all,

Brian vdB


#! /usr/bin/env python
# sentence_caser.py
# Version 0.1
# Brian van den Broek
# bvande at po-box.mcgill.ca
# This module is released under the Python License. (See www.python.org.)

punctuation_indexes = {}
punctuation = ['!', '?']

def punctuation_stripper(data_string):
    '''punctuation_stripper(data_string) -> data_string

    Stores the indexes of each type of punctuation (other than '.') in the
    punctuation_indexes dict and replaces them with '.'s. (This makes
    splitting the string easier, and, thanks to the dict, is reversible.)'''

    for mark in punctuation:
        punctuation_indexes[mark] = []
        offset = 0
        while True:
            try:
                i = data_string.index(mark, offset)
                punctuation_indexes[mark].append(i)
                offset = i + 1
            except ValueError:
                break
        data_string = data_string.replace(mark, '.')
    return data_string

def change_to_sentence_case(sentence_list):
    '''change_to_sentence_case(sentence_list) -> cap_sentences_list

    Takes a list of sentence strings and returns a new list in which the
    first (and only the first) letter of each sentence is capitalized and
    the '.' removed by split() is appended again. It is a bit more
    complicated than just calling the capitalize string method, as the
    strings in the sentence list may well start with ' ', '(', '[', etc. The
    while loop travels the string, looking for the first letter, and calls
    capitalize on the substring that letter begins. restore_Is() is also
    called, in an attempt to undo the lower-casing of the pronoun "I".'''

    cap_sentences_list = []
    for s in sentence_list:
        offset = 0
        while offset < len(s):
            if s[offset].isalpha():
                s = s[:offset] + s[offset:].capitalize()
                break
            offset += 1
        s += '.'
        s = restore_Is(s)
        cap_sentences_list.append(s)
    return cap_sentences_list

def restore_Is(sentence):
    '''restore_Is(sentence) -> sentence

    Takes a sentence string and tries to restore any "I"s incorrectly changed
    to "i"s by change_to_sentence_case()'s use of .capitalize().'''

    sentence = sentence.replace(' i ', ' I ')
    sentence = sentence.replace(' i,', ' I,')
    sentence = sentence.replace(' i.', ' I.')
    return sentence

def restore_punctuation(data_sentences):
    '''restore_punctuation(data_sentences) -> data_sentences

    Consulting the punctuation_indexes dict, restore_punctuation() reverts
    non-'.' punctuation that was changed to '.' to facilitate splitting the
    string.'''
    for mark in punctuation:
        for i in punctuation_indexes[mark]:
            data_sentences = data_sentences[:i] + mark + data_sentences[i + 1:]
    return data_sentences

def sentence_caser(data_string):
    '''sentence_caser(data_string) -> data_string

    Takes a string and returns it in sentence case (it is hoped). To do so,
    it runs it through various helper functions; sentence_caser() does
    almost no work on its own. Consult punctuation_stripper(),
    change_to_sentence_case(), and restore_punctuation() for details of the
    processing.'''

    working_data = punctuation_stripper(data_string)
    data_sentences_list = working_data.split('.')
    data_sentences_list = change_to_sentence_case(data_sentences_list)
    data_sentences = ''.join(data_sentences_list)
    data_sentences = restore_punctuation(data_sentences)

    # change_to_sentence_case() appends a '.' to every piece, including the
    # last one, so the joined string is one character longer than the
    # original. Truncating to the original length removes that spurious
    # trailing '.' (visible when the original string does not end with '.',
    # as with the data below).
    data_sentences = data_sentences[:len(data_string)]

    return data_sentences

if __name__ == '__main__':
    data = '''STRINGS IN ALL CAPS ARE HARD TO READ! SOME PEOPLE THINK THEY ARE
LIKE SHOUTING. DO YOU THINK SO? I ONLY WRITE THEM WHEN I HAVE A CAPS-LOCK
ACCIDENT. (OR WHEN CREATING TEST DATA.) THEY ARE NO FUN. (OK, ENOUGH NOW.)'''
    # print() with a single argument behaves the same in Python 2 and 3.
    print(data)
    print('')
    print(sentence_caser(data))
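One point reviewers may raise: punctuation_indexes is module-level state.
Each call to punctuation_stripper() resets it, so ordinary one-at-a-time use
works, but restore_punctuation() silently relies on whatever the most recent
strip recorded, and two threads calling sentence_caser() at once could
clobber each other's indexes. Below is a minimal sketch of a variant that
keeps the same record-then-replace technique but passes the indexes around
explicitly. The names strip_punctuation, restore_marks and PUNCTUATION are
hypothetical, not part of the module.

```python
# Hypothetical refactoring: same technique as punctuation_stripper() and
# restore_punctuation(), but the recorded indexes are returned to the
# caller instead of being stored in a module-level dict.
PUNCTUATION = ('!', '?')


def strip_punctuation(data_string):
    '''Return (stripped, indexes): stripped has every '!' and '?' replaced
    by '.', and indexes maps each mark to the positions it occupied.'''
    indexes = {}
    for mark in PUNCTUATION:
        # enumerate() finds every position in one pass, replacing the
        # while/try/str.index() loop.
        indexes[mark] = [i for i, char in enumerate(data_string) if char == mark]
        data_string = data_string.replace(mark, '.')
    return data_string, indexes


def restore_marks(data_string, indexes):
    '''Put back the marks recorded by strip_punctuation(). Assumes the
    string still has its original length, as the index approach requires.'''
    chars = list(data_string)
    for mark, positions in indexes.items():
        for i in positions:
            chars[i] = mark
    return ''.join(chars)


stripped, marks = strip_punctuation("HI! OK? YES.")
print(stripped)                                # HI. OK. YES.
print(restore_marks(stripped.lower(), marks))  # hi! ok? yes.
```

Because nothing is stored between calls, the caller decides which indexes
go with which string.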

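As for the "But what about using X instead?" question, one candidate X is
re.split() with a capturing group: when the pattern contains a group,
re.split() also returns the matched delimiters, so the '!' and '?' marks
never have to be recorded and restored. This is a swapped-in technique, not
the approach of sentence_caser.py, and sentence_case() and
capitalize_first_letter() are hypothetical names. Like the module, it
lower-cases a mid-sentence pronoun "I", so something like restore_Is() would
still be needed afterwards.

```python
import re

# A pattern with a capturing group makes re.split() keep the delimiters.
SENTENCE_END = re.compile(r'([.!?])')


def capitalize_first_letter(text):
    '''Lower-case text, then upper-case its first alphabetic character,
    skipping leading spaces, '(' and the like.'''
    lowered = text.lower()
    for offset, char in enumerate(lowered):
        if char.isalpha():
            return lowered[:offset] + char.upper() + lowered[offset + 1:]
    return lowered  # no letters at all, e.g. ')'


def sentence_case(text):
    '''Return text with the first letter of each sentence capitalized.'''
    pieces = SENTENCE_END.split(text)
    # Even indexes hold sentence bodies; odd indexes hold the '.', '!' or
    # '?' that ended them, which are left untouched.
    for i in range(0, len(pieces), 2):
        pieces[i] = capitalize_first_letter(pieces[i])
    return ''.join(pieces)


print(sentence_case("DO YOU THINK SO? I ONLY WRITE THEM. (OK, ENOUGH NOW.)"))
# Do you think so? I only write them. (Ok, enough now.)
```

The design choice here is to let the regex engine carry the punctuation
along, trading the index bookkeeping (and the trailing-'.' truncation) for a
dependency on the re module.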

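restore_Is() only catches an "i" followed by a space, a comma or a period
(the period also standing in for '!' and '?' at that stage). An "I" just
before a line break, or in a contraction such as "I'M", stays lower case. A
possible refinement, sketched here under the hypothetical name
restore_Is_regex(), uses a regular expression with word boundaries (\b) so
that a lone "i" matches whatever surrounds it. The trade-off is that it also
capitalizes an "i" that is not the pronoun, such as a roman-numeral list
marker "(i)".

```python
import re

# Matches an "i" with a word boundary on each side, i.e. not inside a word.
LONE_I = re.compile(r'\bi\b')


def restore_Is_regex(sentence):
    '''restore_Is_regex(sentence) -> sentence

    Hypothetical variant of restore_Is(): upper-cases every standalone
    lower-case "i", whatever character precedes or follows it.'''
    return LONE_I.sub('I', sentence)


print(restore_Is_regex("when i\nhave a caps-lock accident, i'm sorry."))
```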

