string substitutions

John Machin sjmachin at lexicon.net
Sat Feb 23 20:10:08 EST 2002


bobnotbob at byu.edu (Bob Roberts) wrote in message news:<c4e6b17d.0202231152.5f87765d at posting.google.com>...
> What would be a good way to replace every one or more spaces (" ") in
> a string with just one space?  Or replace any number of newlines with
> just one?

Same answer as in most other languages: What is a "good" way depends
on where you want to be on the speed curve -- fast development or fast
running? Here are some points on the curve:

1. One way is to use regular expression substitution: replace all
occurrences of "  +" (*two* or more spaces) with " " (a single
space), and likewise for newlines. Look up re.sub() in the Python
manuals.

Matching *one* or more gives the same result but will be slower,
because every lone space gets uselessly replaced with itself. The
effect would be much worse on a version of Python before 2.2, where
the substitution methods were coded in Python (they are now in C).
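
For what it's worth, here is a minimal sketch of option 1 using the standard re module. The function names collapse_spaces and collapse_newlines are made up for illustration, and " {2,}" is just another way of spelling "two or more spaces":

```python
import re

def collapse_spaces(text):
    # Replace each run of two or more spaces with a single space.
    # Lone spaces never match, so they are left untouched.
    return re.sub(r" {2,}", " ", text)

def collapse_newlines(text):
    # Same idea for runs of two or more newlines.
    return re.sub(r"\n{2,}", "\n", text)

sample = "a  b   c d\n\n\ne"
result = collapse_newlines(collapse_spaces(sample))
```

Swapping the pattern for " +" (one or more) should give the same string; it just does extra work on every single space.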

2. If what you want is to replace any contiguous chunk of whitespace
(including newlines, see example below) with a single space (and trim
off *any* leading and trailing whitespace), you could use the
split-join hack:

>>> " ".join(" b    b   \n\n    x ".split())
'b b x'

3. The collapse() function in the mxTextTools package *may* be what
you want. You'd have to find it (google), RTFM, download it, install
it, ...

4. In extremis, read the Extending/Embedding manual and reach for your
C compiler -- not such a silly idea if you are habitually reading
million-line files which need their non-newline whitespace normalised
in a fashion similar to (2) above.

Once you are over the learning curve/hump and have a C extension
module with useful fast gadgets in it, adding another is minimal
bother.
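
Before reaching for C, it may help to pin down the intended behaviour in pure Python first. The following is only a sketch of one reading of "non-newline whitespace normalised in a fashion similar to (2)": each line is treated separately, so the newlines survive. The name normalise_line_whitespace is hypothetical, and a C version would presumably compute the same thing, only faster:

```python
def normalise_line_whitespace(text):
    # Split on "\n" only, so line boundaries are preserved.
    # Within each line, split() with no argument breaks on any run of
    # whitespace and drops leading/trailing whitespace; joining with
    # " " leaves exactly one space between words.
    return "\n".join(" ".join(line.split()) for line in text.split("\n"))

cleaned = normalise_line_whitespace("  a \t b  \n\n c   d \n")
```

Unlike the plain split-join in (2), blank lines and the trailing newline are kept; only the whitespace inside each line is collapsed.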


