[Tutor] Pythonic way to normalize vertical whitespace

Kent Johnson kent37 at tds.net
Sat May 9 03:54:48 CEST 2009


On Fri, May 8, 2009 at 1:03 PM,  <python at bdurham.com> wrote:
> Note: Following cross-posted to python-list where it got queued due to
> suspicious subject line.
>
> I'm looking for suggestions on technique (not necessarily code) about the
> most pythonic way to normalize vertical whitespace in blocks of text so that
> there is never more than 1 blank line between paragraphs. Our source text
> has newlines normalized to single newlines (\n vs. combinations of \r and
> \n), but there may be leading and trailing whitespace around each newline.
>
> Approaches:
>
> 1. split text to list of lines that get stripped then:
>
> a. walk this list building a new list of lines that track and ignore extra
> blank lines
>
> -OR-
>
> b. re-join lines and replace '\n\n\n' wth' \n\n' until no more '\n\n\n'
> matches exist
>
> 2. use regular expressions to match and replace whitespace pattern of 3 or
> more adjacent \n's with surrounding whitespace
>
> 3. a 3rd party text processing library designed for efficiently cleaning up
> text

1 sounds simple enough. That is the first thing I thought of. 2 should
also be easy if you have any regex-fu and might turn out to be faster
if you have a lot of text.

Kent


More information about the Tutor mailing list