compressing consecutive spaces

attn.steven.kuo at attn.steven.kuo at
Mon Jul 9 20:09:06 CEST 2007

On Jul 9, 7:38 am, Beliavsky <beliav... at> wrote:
> How can I replace multiple consecutive spaces in a file with a single
> character (usually a space, but maybe a comma if converting to a CSV
> file)? Ideally, the Python program would not compress consecutive
> spaces inside single or double quotes. An inelegant method is to
> repeatedly replace two consecutive spaces with one.

One can try mx.TextTools.  E.g.,

from mx.TextTools import *
import re


def advance_position(text, position, len_text, sre):
    mobj = sre.match(text[position:])
    if mobj:
        incr = len(
        incr = 0
    return position + incr

table = ('try_again',
        ('quoted_string', CallArg,
            (advance_position, string_inside_quotes), +1,
        ('nonspace', AllNotIn, ' ', +1, 'try_again'),
        ('space', AllIn, ' ', +1, 'try_again'),
        (None, EOF, Here, +1, MatchOk),
        (None, Fail, Here),)

for target_string in (
"     Try    using mx.TextTools 'for parsing    strings'",
"'It might    be' just what you needed",
'I find   "it    worthwhile"',
    print "BEFORE:%s" % target_string
    _, taglist, _ = tag(target_string, table)
    if taglist:
        tokens = []
        for t in taglist:
            tagobj, left_index, right_index = t[0:3]
            if tagobj == 'space':
                tokens.append(' ')
        print "AFTER:%s" % ''.join(tokens)
        print "Something went horribly wrong"

Hope this helps,

More information about the Python-list mailing list