[Tutor] removing line ends from Word text files

Michael Janssen Janssen at rz.uni-frankfurt.de
Sat Jul 17 15:55:50 CEST 2004


On Fri, 9 Jul 2004, Christian Meesters wrote:

> Right now I have the problem that I want to remove the MS Word line end
> token from text files: When saving a text file as 'text only' line ends
> are displayed as '^M' in a shell (SGI IRIX (tcsh) and Mac (tcsh or
> bash)). I want to get rid of these elements for further processing of
> the file and have no idea how to access them in a Python script. Any
> idea how to replace the '^M' against a simple '\n'? (I already tried
> '\r\n' and various other combinations of characters, but apparently all
> aren't '^M'.) '^M' is one character.

You can always ask Python how it represents this character: read one
line with "readline" and print its repr string:

fo = open("filename", "rb")  # binary mode, so line ends are not translated
line = fo.readline()
print(repr(line))
fo.close()

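repr gives you an unambiguous string representation of any object. For
strings it shows control characters as escape sequences like \n or \r
instead of emitting them raw, so you can see exactly what is there. As
you are on a Mac, I would guess your newline character is a single "\r"
(carriage return, which a shell displays as ^M).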
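
If that guess is right, the conversion itself could look roughly like the
sketch below. It assumes Python 3 and a file small enough to read into
memory. The names normalize_newlines and convert_file are made up for
illustration. It works on bytes so that Python does not translate the line
ends on its own while reading.

```python
def normalize_newlines(data):
    # Hypothetical helper: takes bytes and returns bytes with Unix line ends.
    # Convert Windows-style "\r\n" first so it does not turn into two
    # newlines, then convert any remaining bare "\r" (classic Mac).
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src_path, dst_path):
    # Hypothetical helper: read the raw bytes of src_path and write a
    # newline-normalized copy to dst_path.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(normalize_newlines(data))


# Quick check on an in-memory sample instead of a real file.
print(repr(normalize_newlines(b"first line\rsecond line\r")))
```

Replacing "\r\n" before "\r" matters: done the other way round, a file
with Windows line ends would come out with blank lines between all lines.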

You can also ask Python for the character's ordinal:
print(ord(line[-2:-1])) # just in case one newline consists of two chars
print(ord(line[-1:]))   # one-element slices work for str and bytes alike
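
For orientation, here is a small sketch of the ordinals to expect,
assuming Python 3. The sample string is made up and only stands in for a
line read from the real file.

```python
# Made-up sample standing in for a line read from the file (assumption).
sample = "some text\r"

# ord() maps a one-character string to its integer code.
print(ord("\r"))        # carriage return, displayed as ^M in a shell
print(ord("\n"))        # line feed, the Unix newline
print(ord(sample[-1]))  # code of the last character of the sample
```

So a 13 at the end of the line means "\r", a 10 means "\n", and a 13
followed by a 10 means the two-character "\r\n".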

It's probably best to do such investigations in an interactive Python
session. But now that I've realized that the readline module (the one
for line editing, not the file method) is Unix-only, I don't think
interactive mode is that much fun on Mac/Win: without readline you can't
recall earlier commands and have to type them again and again, and you
can't use the cursor keys. Perhaps IDLE offers more elaborate line
editing even on those systems.


regards

Michael


