problem's with strings

Alex Martelli aleax at aleax.it
Mon Nov 4 11:01:19 EST 2002


Richard Mertens wrote:

> first problem:
> 
> I'd like to convert a text file from Unix to Windows and vice versa
> 
> Do you know a solution for this?
> 
> 
> When the Python script reads the text file, I always have the newline
> character at the end. How do I remove it?

When you open a file without specifying whether it's text or
binary, Python considers it text by default.  "text" means
that: when reading, the platform-typical line termination
combination (\n on Unix, \r on Mac, \r\n on Windows) gets
changed into a single line-termination character \n -- when
writing, each line-termination character \n gets changed into
the platform-typical line-termination combination, as above.

Use 'rb' (for reading) or 'wb' (for writing) as the second
argument to built-in function open if you want to open a file
as binary, i.e., without any translation being performed, on
either reading or writing, of any line-termination character
or combination of characters.

Note therefore that:
-- opening with options 'rb' or 'wb' is the only way to get
   identical behavior, for a file seen as a given sequence
   of bytes, independently of what platform you're running on;
-- on Unix, there is no difference in behavior whether you
   open a file as text or binary (because \n is both the
   conventional line-termination character, AND the way in
   which line termination is indicated in Unix text files).
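
The first point can be checked with a small round trip.  This is
only a sketch, assuming a current Python 3 interpreter (where binary
mode hands back bytes objects rather than strings) and a throwaway
temporary file; the sample data is made up for illustration:

```python
import os
import tempfile

# Sample bytes with one Windows-style and one Unix-style line ending.
data = b'first\r\nsecond\n'

# Scratch file used only for this demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    # 'wb' writes the bytes exactly as given, with no newline translation.
    with open(path, 'wb') as f:
        f.write(data)

    # 'rb' reads them back exactly as stored, whatever the platform.
    with open(path, 'rb') as f:
        roundtrip = f.read()
finally:
    os.remove(path)

print(roundtrip == data)  # True
```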

You don't mention the Macintosh, therefore I assume you are
only interested in Unix and Windows platforms.

One way to read a file (no matter whether it's using Unix
or Windows conventions) into a list of strings, one per line,
is for example as follows:

lines = open('thefile','rb').read().replace('\r','').splitlines()

the .replace part removes any carriage returns that might be
in the file, leaving \n as the separator between lines; the
.splitlines then forms a list of strings, one per line (without
the line-separators; if you want the \n separator to be left
at the end of each string, call .splitlines(1) instead of just
.splitlines() at the end).
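
Under Python 3, a file opened with 'rb' yields bytes rather than a
string, so the arguments to .replace must be bytes literals as well.
A minimal sketch of the same idea, where split_lines and read_lines
are hypothetical helper names chosen for this example:

```python
def split_lines(raw, keepends=False):
    # Drop every carriage return so that only \n separates lines,
    # then split; keepends=True leaves the \n on each line.
    return raw.replace(b'\r', b'').splitlines(keepends)

def read_lines(path, keepends=False):
    # Binary mode: no platform-dependent newline translation on reading.
    with open(path, 'rb') as f:
        return split_lines(f.read(), keepends)

sample = b'one\r\ntwo\nthree\r\n'   # mixed Windows and Unix endings
print(split_lines(sample))          # [b'one', b'two', b'three']
print(split_lines(sample, True))    # [b'one\n', b'two\n', b'three\n']
```

If text is wanted rather than bytes, each line can be decoded
afterwards with the file's encoding, which has to be known from
elsewhere.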

This is only suitable if your file is small enough to fit quite
comfortably in memory, of course.
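
For a file too large to read at once, one possible alternative is to
convert it line by line.  The following is a sketch for Python 3, not
the only way to do it; convert_newlines and the file names are
hypothetical, and only Unix and Windows conventions are handled:

```python
import os
import tempfile

def convert_newlines(src_path, dst_path, newline=b'\n'):
    # Iterating over a binary file yields one \n-terminated line at a
    # time, so only a single line is held in memory.
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        for line in src:
            line = line.replace(b'\r', b'')      # drop carriage returns
            if line.endswith(b'\n'):
                line = line[:-1] + newline       # use the target ending
            dst.write(line)                      # unterminated last line is kept as is

# Demonstration on a small scratch file in Windows format.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'windows.txt')
    dst = os.path.join(tmp, 'unix.txt')
    with open(src, 'wb') as f:
        f.write(b'a\r\nb\r\nc')
    convert_newlines(src, dst)
    with open(dst, 'rb') as f:
        converted = f.read()

print(converted)  # b'a\nb\nc'
```

Passing newline=b'\r\n' instead converts in the other direction.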


Similarly, given a list of lines, one string per line and without
line-termination characters, to write them out as a Unix-format
text file, whatever platform you're running on:

open('thefile','wb').write('\n'.join(lines))

and to write them as a Windows-format text file instead:

open('thefile','wb').write('\r\n'.join(lines))
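
Under Python 3 the same two calls need bytes, since the file is
opened in binary mode.  A sketch, with write_lines as a hypothetical
helper name and a temporary file standing in for 'thefile':

```python
import os
import tempfile

def write_lines(path, lines, newline=b'\n'):
    # Join with the chosen separator; 'wb' prevents any further
    # translation of \n by the platform.
    with open(path, 'wb') as f:
        f.write(newline.join(lines))

lines = [b'one', b'two', b'three']

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    write_lines(path, lines, b'\r\n')      # Windows format
    with open(path, 'rb') as f:
        windows_bytes = f.read()
    write_lines(path, lines)               # Unix format
    with open(path, 'rb') as f:
        unix_bytes = f.read()
finally:
    os.remove(path)

print(windows_bytes)  # b'one\r\ntwo\r\nthree'
print(unix_bytes)     # b'one\ntwo\nthree'
```

Note that join puts the separator only between lines, so the last
line gets no terminator; write one more separator afterwards if the
file should end with a line ending.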


Many other approaches can be preferable depending on several
parameters of which you don't inform us, such as -- what
platform[s] is/are your script[s] running on, are the files
too huge to fit in memory comfortably all at once, do you
know when you're reading/writing a textfile in Windows/Unix
format and running on what platform[s] or must you be
prepared for any eventuality, and so on.  I trust that the
information in this post can be of help to you in any case.


Alex




