Removing ^M
Nicola Larosa
nico at tekNico.net
Sat Jun 8 16:55:55 EDT 2002
> I am trying to remove ^M characters (some kind of newline character) from an
> HTML file. I've tried all sorts of string.replace and sed possibilities
> but the things just won't go away. Does anyone have a way of removing such
> characters?
The Python source distribution includes, at least since 1.5.2, the files
crlf.py and lfcr.py in Tools/scripts/. Here is crlf.py:
#!/usr/bin/env python
"Replace CRLF with LF in argument files. Print names of changed files."
import sys, re, os
for file in sys.argv[1:]:
    if os.path.isdir(file):
        print file, "Directory!"
        continue
    data = open(file, "rb").read()
    if '\0' in data:
        print file, "Binary!"
        continue
    newdata = re.sub("\r\n", "\n", data)
    if newdata != data:
        print file
        f = open(file, "wb")
        f.write(newdata)
        f.close()
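The script above is Python 2: it uses print statements and applies str patterns to the file contents. Under Python 3, open(..., "rb").read() returns bytes, so both the '\0' test and the re.sub call need bytes literals or they raise TypeError. Below is a minimal sketch of the same logic, assuming Python 3; the helper names crlf_to_lf and convert_file are hypothetical and are not part of the distribution.

```python
#!/usr/bin/env python3
"""Replace CRLF with LF in argument files. Print names of changed files.

A Python 3 sketch of the crlf.py logic (hypothetical helper names);
files are handled as bytes, so the patterns are bytes literals too.
"""
import os
import re
import sys


def crlf_to_lf(data: bytes) -> bytes:
    """Return data with every CRLF pair replaced by a bare LF."""
    return re.sub(rb"\r\n", b"\n", data)


def convert_file(path: str) -> bool:
    """Rewrite path in place; return True if its contents changed."""
    if os.path.isdir(path):
        print(path, "Directory!")
        return False
    with open(path, "rb") as f:
        data = f.read()
    if b"\0" in data:
        # A NUL byte suggests a binary file, so leave it alone.
        print(path, "Binary!")
        return False
    newdata = crlf_to_lf(data)
    if newdata == data:
        return False
    with open(path, "wb") as f:
        f.write(newdata)
    return True


def main(argv):
    # Print the name of each file that was actually changed.
    for path in argv:
        if convert_file(path):
            print(path)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Like the original, this only touches CRLF pairs. If the ^M characters are lone CRs (old Mac-style line endings), a pattern such as rb"\r\n?" replaced by b"\n" would normalize both cases.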
The only differences in lfcr.py are the docstring and the re.sub line; they
become:
"Replace LF with CRLF in argument files. Print names of changed files."
...
    newdata = re.sub("\r?\n", "\r\n", data)
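The optional \r in lfcr.py's pattern is what makes it safe on files that already contain some CRLF endings: an existing CRLF is matched as a whole and rewritten to itself. A small Python 3 sketch contrasting it with a plain bytes.replace (both function names are hypothetical):

```python
import re


def lf_to_crlf(data: bytes) -> bytes:
    """Turn every line ending into CRLF; existing CRLFs are left as they are."""
    return re.sub(rb"\r?\n", b"\r\n", data)


def naive_lf_to_crlf(data: bytes) -> bytes:
    """For contrast: blindly replacing LF doubles the CR of existing CRLFs."""
    return data.replace(b"\n", b"\r\n")


mixed = b"unix\nwindows\r\n"
print(lf_to_crlf(mixed))        # b'unix\r\nwindows\r\n'
print(naive_lf_to_crlf(mixed))  # b'unix\r\nwindows\r\r\n'
```

Because of the optional CR, running lf_to_crlf twice gives the same result as running it once.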
There are lots of interesting nuggets in the Demo/ and Tools/ directories of
the Python source distribution.
--
"Too much cleverness in the parser can turn against you."
Guido van Rossum
Nicola Larosa - nico at tekNico.net