[Tutor] Re: A Demolished Function

Christopher Smith csmith@blakeschool.org
Sun, 21 Apr 2002 16:21:55 -0500


>Danny Yoo wrote:
>>After writing this function, though, I still feel tense and
>>apprehensive.
>>Does anyone see any improvements one could make to make the function 
>>easier to read?  Any criticism or dissension would be great.  Thanks! 
>
Kirby replied:
>Some inconsequential coding changes -- not necessarily easier
>to read:
>
>def conservativeSplit(regex, stuff):
>     """Split 'stuff' along 'regex' seams."""
>     fragments = []
>     while 1:
>         match = regex.search(stuff)
>         if not match: break
>         begin, end = match.span()
>         if not begin == 0:
>             fragments.append(stuff[0 : begin])
>         fragments.append(stuff[begin : end])
>         stuff = stuff[end :]
>     if stuff: fragments.append(stuff)
>     return fragments
>
>Kirby

I was revisiting string splitting and the sorting of titles that contain
numbers, and I realized that the "conservative split" is already a
regex option:  just enclose your pattern in parentheses and the
matches will be retained in the split:

>>> import re
>>> t="five sir four sir three sir two sir one sir"
>>> sir=re.compile(r'(sir)')
>>> sir.split(t)
['five ', 'sir', ' four ', 'sir', ' three ', 'sir', ' two ', 'sir',
 ' one ', 'sir', '']
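
One difference from conservativeSplit is worth noting: when the pattern
matches at the very start or end of the string (or twice in a row), the
regex split returns empty strings at those positions, as with the
trailing '' above, whereas Kirby's loop never appends an empty fragment.
A minimal sketch of one way to get the same fragments, assuming the
pattern is wrapped in a single capturing group and never matches the
empty string, is to filter the empties out.  The name split_keep is made
up for this illustration:

```python
import re

def split_keep(regex, stuff):
    """Split 'stuff' on 'regex', keeping the matches and dropping empties.

    Assumes 'regex' is a compiled pattern wrapped in one capturing group
    and that it cannot match the empty string.
    """
    return [piece for piece in regex.split(stuff) if piece]

sir = re.compile(r'(sir)')
pieces = split_keep(sir, "sir five sir four sir")
# pieces == ['sir', ' five ', 'sir', ' four ', 'sir']
```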

Another way to sort titles containing numbers is to split out (and retain)
the numbers, convert the numbers to integers, and then sort the split-up
titles.  This will cause the numbers to be treated as numbers instead of
strings.  Joining the split titles back together again brings you back to
the original title.
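
To see why the conversion matters, here is a small sketch on two
shortened titles (made up for illustration, not Danny's data).  Lists
compare element by element, so once the digit runs are ints the
comparison at that position is numeric.  It assumes a Python that has
the sorted() builtin:

```python
import re

integer = re.compile(r'(\d+)')

as_text = ['The 10th', 'The 9th']
as_parts = [integer.split(s) for s in as_text]
# as_parts == [['The ', '10', 'th'], ['The ', '9', 'th']]

# Odd positions hold the digit runs; turn them into ints.
for parts in as_parts:
    for i in range(1, len(parts), 2):
        parts[i] = int(parts[i])

# Plain string comparison puts '1' before '9' ...
text_order = sorted(as_text)      # ['The 10th', 'The 9th']
# ... while the split lists compare 9 < 10 as numbers.
parts_order = sorted(as_parts)    # [['The ', 9, 'th'], ['The ', 10, 'th']]
```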

Here's a demo using Danny's original data:

import re

def sortTitles(b):
    """Sort the list of strings b in place, treating runs of digits
    as numbers rather than text.

    Each string is split apart on integer runs before sorting, so
    that 2 follows 1, for example, instead of 11 following 1.
    """

    # the () around the pattern preserves the pattern in the split
    integer = re.compile(r'(\d+)')
    for j in range(len(b)):
        title = integer.split(b[j])
        #
        # Every *other* element is a run of digits that was found;
        # convert it to an int
        #
        for i in range(1, len(title), 2):
            title[i] = int(title[i])
        b[j] = title
    #
    # Now sort, then turn the ints back into strings and rejoin
    # the elements of each title
    #
    b.sort()
    for i in range(len(b)):
        for j in range(1, len(b[i]), 2):
            b[i][j] = str(b[i][j])
        b[i] = ''.join(b[i])
if __name__ == '__main__':
    book_titles = ['The 10th Annual Python Proceedings',
                   'The 9th Annual Python Proceedings',
                   'The 40th Annual Python Proceedings',
                   '3.1415... A Beautiful Number',
                   '3.14... The Complexity of Infinity',
                   'TAOCP Volume 3: Sorting and Searching',
                   'TAOCP Volume 2: Seminumerical Algorithms',
                   'The Hitchhiker\'s Guide to the Galaxy']

    print("Here is a list of my books, in sorted order:")
    sortTitles(book_titles)
    print(book_titles)
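
A variation on the same idea, offered only as a sketch: on a Python whose
list.sort accepts a key argument (2.4 or later), the split-and-convert
step can be a key function, so the titles themselves are never taken
apart and rejoined.  That also sidesteps a wrinkle in the round trip
through int() and str(), which would turn a title containing '007' into
one containing '7'.  The names INTEGER and natural_key are made up for
this illustration:

```python
import re

INTEGER = re.compile(r'(\d+)')

def natural_key(title):
    """Return the title split on digit runs, with the runs as ints."""
    parts = INTEGER.split(title)
    # Odd positions hold the digit runs captured by the parentheses.
    parts[1::2] = [int(digits) for digits in parts[1::2]]
    return parts

titles = ['The 10th Annual Python Proceedings',
          'The 9th Annual Python Proceedings',
          'TAOCP Volume 3: Sorting and Searching',
          'TAOCP Volume 2: Seminumerical Algorithms']
titles.sort(key=natural_key)
# titles is now ordered: TAOCP Volume 2, TAOCP Volume 3, The 9th, The 10th
```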

/c