Removing ^M

Nicola Larosa nico at tekNico.net
Sat Jun 8 16:55:55 EDT 2002


> I am trying to remove ^M characters (some kind of newline character) from an
> HTML file. I've tried all sorts of string.replace and sed possibilities
> but the things just won't go away. Does anyone have a way of removing such
> characters?

The Python source distribution has included, at least since 1.5.2, the files
crlf.py and lfcr.py in Tools/scripts/. Here is crlf.py:


#!/usr/bin/env python

"Replace CRLF with LF in argument files.  Print names of changed files."

import sys, re, os
for file in sys.argv[1:]:
     if os.path.isdir(file):
         print file, "Directory!"
         continue
     data = open(file, "rb").read()
     if '\0' in data:
         print file, "Binary!"
         continue
     newdata = re.sub("\r\n", "\n", data)
     if newdata != data:
         print file
         f = open(file, "wb")
         f.write(newdata)
         f.close()
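
The script above is written for Python 2 (print statements, str data from
"rb" files). As a rough sketch of the same idea for Python 3, where binary
reads return bytes, something like the following should work. The names
crlf_to_lf and convert_file are invented for this illustration and are not
part of Tools/scripts/; unlike the original, binary files are skipped
silently here.

```python
import os
import sys


def crlf_to_lf(data: bytes) -> bytes:
    """Return data with every CRLF pair replaced by a bare LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite path in place; return True if the file was changed."""
    with open(path, "rb") as f:
        data = f.read()
    if b"\0" in data:
        # Same heuristic as the original script: a NUL byte means binary.
        return False
    newdata = crlf_to_lf(data)
    if newdata == data:
        return False
    with open(path, "wb") as f:
        f.write(newdata)
    return True


if __name__ == "__main__":
    for name in sys.argv[1:]:
        if os.path.isdir(name):
            print(name, "Directory!")
        elif convert_file(name):
            print(name)
```

The ^M shown by editors is the carriage return byte (\r), so a file
processed this way should no longer display it at line ends. A lone \r that
is not followed by \n is left untouched, as in the original script.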


The only differences in lfcr.py are the description and the re.sub line,
which become:

"Replace LF with CRLF in argument files.  Print names of changed files."
...
     newdata = re.sub("\r?\n", "\r\n", data)
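
The optional \r in that pattern is what keeps existing CRLF endings from
being doubled into \r\r\n. A small Python 3 sketch of the same substitution
on bytes (lf_to_crlf is a made-up name for this example, not something from
the distribution):

```python
import re


def lf_to_crlf(data: bytes) -> bytes:
    """Return data with every line ending (LF or CRLF) written as CRLF."""
    # "\r?\n" matches an existing CRLF as a whole, so it is rewritten
    # as CRLF rather than gaining a second carriage return.
    return re.sub(rb"\r?\n", b"\r\n", data)
```

Because already-converted endings are matched whole, running the function
twice on the same data gives the same result as running it once.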


There are lots of interesting nuggets in the Demo/ and Tools/ directories of 
the Python source distribution.


-- 
"Too much cleverness in the parser can turn against you."
   Guido van Rossum

Nicola Larosa - nico at tekNico.net






