To re or not to re ... (word wrap function?)

Chris Barker chrishbarker at
Sat Sep 22 00:23:54 CEST 2001

Hi all,

I was just asked if it would be hard to write a script that would take
the output from an MS Word "save as text" operation and re-format it
so it was wrapped to 80-character lines. I said it would be easy, then I
thought about it and realised it was not quite as trivial as I thought.
First I came up with a function that used a few string methods, but was
mostly "by hand". Then I tried an re version. It turned out to be not
much easier or shorter, though probably a little faster. I have not
benchmarked it, and frankly, speed is of little concern here: I'm after

Anyway, I figured that:
A) someone else must have done this already

B) there should be a cleaner and more elegant way to do this.

C) I probably missed some special cases and haven't gotten it quite right
anyway. (I have a little more faith in the re version: I did that
second, and thought of a few more special cases to handle.)
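On point A: Python's standard library has since gained a `textwrap` module (added in Python 2.3, after this post was written) that handles this case. A minimal sketch, assuming one Word paragraph per input string; `wrap_paragraph` is a hypothetical helper name, not part of any library:

```python
import textwrap

def wrap_paragraph(text, maxchar=80):
    # textwrap.fill breaks at whitespace, keeps every line at most `width`
    # characters, and by default splits words longer than `width`
    return textwrap.fill(text, width=maxchar)
```

For example, `wrap_paragraph("the quick brown fox jumps over the lazy dog", 10)` gives the lines "the quick", "brown fox", "jumps over", "the lazy", "dog". Note that `textwrap` also turns tabs and other whitespace characters into spaces by default.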

Anyone have any suggestions?

note: each function just wraps a single line (or "paragraph" in
Word-speak); it would be part of a script that would do a whole file.
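A sketch of that whole-file script, assuming Word's "save as text" output puts each paragraph on its own line and that blank lines between paragraphs should be kept. `wrap_file` is a hypothetical name, and `textwrap.fill` stands in for whichever single-paragraph wrapper ends up being used:

```python
import textwrap

def wrap_file(infile, outfile, maxchar=80):
    # each input line is one Word "paragraph"; wrap it independently
    for line in infile:
        paragraph = line.rstrip("\r\n")
        if paragraph.strip():
            outfile.write(textwrap.fill(paragraph, width=maxchar) + "\n")
        else:
            outfile.write("\n")  # keep blank lines between paragraphs
```

Any file-like objects work, e.g. `wrap_file(open("in.txt"), open("out.txt", "w"))`.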

Here is the non-re version:

import string

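def WordWrap(text, maxchar=80):
    """
    A function that formats a single long line into lines that are a
    max of maxchar long.
    """
    if len(text) <= maxchar:
        return text

    new_text = []
    begin = 0
    while begin < len(text):
        # first skip any whitespace at the beginning of the line:
        while begin < len(text) and text[begin] in string.whitespace:
            begin += 1
        if len(text) - begin <= maxchar:  # the rest fits on one line
            if begin < len(text):
                new_text.append(text[begin:])
            break
        # look back for whitespace to break at; start one character past
        # the limit, because if it's a space, it will be removed.
        end = begin + maxchar
        while end > begin and text[end] not in string.whitespace:
            end -= 1
        if end == begin:  # no whitespace at all: break in mid-word
            end = begin + maxchar
        new_text.append(text[begin:end].rstrip())
        begin = end
    return "\n".join(new_text)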

# Here is the re version

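def WordWrap2(text, maxchar=80):
    """
    A function that formats a single long line into lines that are a
    max of maxchar long.
    """
    import re
    # skip leading whitespace, then take a non-space character plus up to
    # maxchar-1 more, ending just before whitespace or the end of the text
    pattern = r"\s*(\S.{0,%d})(?:\s+|$)" % (int(maxchar) - 1)
    p = re.compile(pattern)

    new_text = []
    start = 0
    while start < len(text):
        match = p.match(text, start)
        if match:
            new_text.append(match.group(1).rstrip())
            start = match.end()
        else:
            rest = text[start:].lstrip()
            if not rest:  # only trailing whitespace is left
                break
            # there is no whitespace in maxchar characters: break the word
            start = len(text) - len(rest)
            new_text.append(text[start:start + maxchar])
            start += maxchar

    return "\n".join(new_text)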
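For point B, a common and arguably cleaner pattern is to split the paragraph into words and fill lines greedily. This is a different technique from the two functions above: it assumes runs of whitespace inside a line may be collapsed to single spaces, and it chops words longer than `maxchar`. `word_wrap_split` is a hypothetical name:

```python
def word_wrap_split(text, maxchar=80):
    lines = []
    current = ""
    for word in text.split():
        # a word longer than maxchar is chopped into maxchar-sized pieces
        while len(word) > maxchar:
            if current:
                lines.append(current)
                current = ""
            lines.append(word[:maxchar])
            word = word[maxchar:]
        if not word:
            continue
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= maxchar:
            current += " " + word  # the word fits on the current line
        else:
            lines.append(current)  # start a new line with this word
            current = word
    if current:
        lines.append(current)
    return "\n".join(lines)
```

Because `str.split()` with no argument already discards leading, trailing, and repeated whitespace, most of the special cases handled by hand above disappear.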


Christopher Barker,
ChrisHBarker at
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics

More information about the Python-list mailing list