[Tutor] Python 3.2: processing text files in binary mode, because I want to remove carriage returns and line feeds...

Steven D'Aprano steve at pearwood.info
Thu Aug 23 18:46:23 CEST 2012


On 24/08/12 00:42, Flynn, Stephen (L & P - IT) wrote:
> Python 3.2, as in the subject, although I also have 2.7 on this machine
> too.
>
>
>
> I have some data which contains text separated with field delimiters
> (|~) and a record terminator (||)

[trim well over 50 lines of explanation]

> Is there a method of writing out a binary mode file via print() and
> making use of the end keyword?


Please try to simplify your examples when posting them! You give an
enormous amount of detail that simply isn't relevant to the question
you end up asking. I acknowledge that you did attempt to trim the
extraneous detail in your post, e.g.:

|~Some Split text over serveral lines|~Aldxxxxe, Mxxxx|~||
123456009999993|~59999999|~2999999|~8253265|~5|~11|~2011-07-11|~15:06:53

but still, your lead-up to the question is intimidatingly large.

You may find it useful to read this website:

http://sscce.org/

for some ideas on how to ask short, to-the-point questions that will
get good responses.


In this case, to answer your question, no, I don't believe that you
can write binary data to a file using print in Python 3, or at least,
if you can, it isn't obvious and you probably shouldn't do so.

print generates strings, and tries to write the string to the file.
But a file in binary mode cannot take strings. It needs binary data.
So if you try, you just get an error:


py> data = b"abc1234"  # A bytes object, b"...", is binary data.
py> f = open("/tmp/rubbish.data", "wb")
py> print(data, file=f)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface


A slightly cryptic error message, but it becomes clearer when I try
to write a string directly to the open file:

py> f.write("hello")
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface


Since print automatically converts its argument into a string, this
cannot work.
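
To see that conversion on its own, here is a minimal sketch that prints a
bytes object to an in-memory text stream instead of a binary file. The use
of io.StringIO is my own choice for illustration, not something from your
script. What print writes is the repr-style text of the bytes object, not
the raw bytes:

```python
import io

data = b"abc1234"

# A text-mode stream accepts the string that print produces.
buf = io.StringIO()
print(data, file=buf, end="")

# print called str() on the bytes object, so we get its textual form,
# including the b'...' wrapper, rather than the seven raw bytes.
text = buf.getvalue()
print(repr(text))
```

So even where print "works" with bytes, the result is almost never what you
want in a data file.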

If you have binary data, the best way to write it out to a file is
by writing it directly to the file:


py> data = b"abc1234"
py> f = open("/tmp/rubbish.data", "wb")
py> f.write(data)
7
py> f.close()  # Flush the data to disk and release the file.
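
If what you miss is print's end keyword, you can get much the same effect by
appending the terminator yourself before writing. The following is only a
sketch: the helper name write_record, the UTF-8 encoding and the temporary
file path are all assumptions of mine, not anything taken from your code.

```python
import os
import tempfile

def write_record(f, data, end=b"\n"):
    """Write data followed by end to a file opened in binary mode.

    Hypothetical helper that loosely mimics print(..., end=...).
    """
    if isinstance(data, str):
        # Assumption: text should be encoded as UTF-8.
        data = data.encode("utf-8")
    f.write(data + end)

# Assumption: a throwaway file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "rubbish.data")

with open(path, "wb") as f:
    write_record(f, b"abc1234", end=b"||")
    write_record(f, "hello", end=b"||")

with open(path, "rb") as f:
    contents = f.read()

print(contents)
```

The with statement closes the file for you, even if an error occurs part of
the way through.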



> If there's not, I presume I'll need to remove the \r\n from "line" in my
> else: section and push the amended data out via an out.write(line). How
> does one amend bytes in a "line" object

line = line.rstrip()


will remove any whitespace, including \r and \n, from the right-hand side
of the line.
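
Note that with no argument, rstrip also removes trailing spaces and tabs.
If you want to drop only the carriage return and line feed, pass the
characters to strip explicitly; for a bytes object the argument must itself
be bytes. A small sketch, using a made-up line rather than your real data:

```python
# Made-up sample: a trailing space followed by a Windows line ending.
line = b"123456009999993|~59999999 \r\n"

# No argument: strips all trailing ASCII whitespace, including the space.
stripped_all = line.rstrip()

# Explicit argument: strips only trailing \r and \n bytes, keeps the space.
stripped_eol = line.rstrip(b"\r\n")

print(stripped_all)
print(stripped_eol)
```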




-- 
Steven

