compressing consecutive spaces
attn.steven.kuo at gmail.com
Mon Jul 9 14:09:06 EDT 2007
On Jul 9, 7:38 am, Beliavsky <beliav... at aol.com> wrote:
> How can I replace multiple consecutive spaces in a file with a single
> character (usually a space, but maybe a comma if converting to a CSV
> file)? Ideally, the Python program would not compress consecutive
> spaces inside single or double quotes. An inelegant method is to
> repeatedly replace two consecutive spaces with one.
One can try mx.TextTools. E.g.,
from mx.TextTools import *
import re

# Match a single- or double-quoted string; a backslash-escaped
# quote character does not end it.
string_inside_quotes = re.compile(
    r'(?P<quote>["\']).*?(?<!\\)(?P=quote)',
    re.MULTILINE)

def advance_position(text, position, len_text, sre):
    # Return the position just past a quoted string that starts at
    # `position`, or `position` itself if none starts there.
    mobj = sre.match(text[position:])
    if mobj:
        incr = len(mobj.group(0))
    else:
        incr = 0
    return position + incr

table = ('try_again',
    ('quoted_string', CallArg,
        (advance_position, string_inside_quotes), +1,
        'try_again'),
    ('nonspace', AllNotIn, ' ', +1, 'try_again'),
    ('space', AllIn, ' ', +1, 'try_again'),
    (None, EOF, Here, +1, MatchOk),
    (None, Fail, Here),)

for target_string in (
    " Try using mx.TextTools 'for parsing strings'",
    "'It might be' just what you needed",
    'I find "it worthwhile"',
    ):
    print "BEFORE:%s" % target_string
    _, taglist, _ = tag(target_string, table)
    if taglist:
        # Rebuild the string: each run of spaces becomes one space,
        # every other token is copied verbatim.
        tokens = []
        for t in taglist:
            tagobj, left_index, right_index = t[0:3]
            if tagobj == 'space':
                tokens.append(' ')
            else:
                tokens.append(target_string[left_index:right_index])
        print "AFTER:%s" % ''.join(tokens)
    else:
        print "Something went horribly wrong"
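
If installing mx.TextTools is not an option, roughly the same idea can be sketched with the standard re module alone. This is only an illustration, not a drop-in equivalent of the tag table above: it assumes Python 3, and the names compress_spaces and _TOKEN are made up for this example. It reuses the quoted-string pattern from above and replaces each run of spaces outside quotes with a separator of your choice.

```python
import re

# Hypothetical helper (not from mx.TextTools): one pattern that matches
# either a quoted string (a backslash-escaped quote does not end it)
# or a run of one or more spaces.
_TOKEN = re.compile(
    r'''(?P<quoted>(?P<q>["']).*?(?<!\\)(?P=q))|(?P<spaces> +)''')

def compress_spaces(text, sep=' '):
    """Replace each run of spaces outside quotes with `sep`."""
    def repl(mobj):
        if mobj.group('quoted') is not None:
            # Quoted strings are copied through unchanged.
            return mobj.group('quoted')
        # Otherwise the match is a run of spaces.
        return sep
    return _TOKEN.sub(repl, text)

if __name__ == '__main__':
    print(compress_spaces("a   b  'c   d'  e"))
    print(compress_spaces("a   b  'c   d'  e", sep=','))
```

Passing sep=',' covers the CSV case from the question. Like the tag table, this treats any apostrophe as the start of a quoted string, so text such as "don't" followed later by another apostrophe will be read as quoted.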
--
Hope this helps,
Steven