program to generate data helpful in finding duplicate large files

Chris Kaynor ckaynor at zindagigames.com
Mon Sep 22 21:47:26 CEST 2014


I went and looked up the PEPs regarding universal newlines, and it seems
the behavior is platform-independent: when reading in text mode, all of
"\r\n", "\r", and "\n" will always be converted to "\n" in Python, unless
the newline handling is explicitly changed on the file object (or
universal newlines are disabled).
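
To illustrate the conversion described above, here is a minimal sketch. It assumes Python 3, where text mode enables universal newlines by default. The helper name read_both_ways and the sample bytes are made up for this example.

```python
import os
import tempfile


def read_both_ways(raw):
    """Write raw bytes to a temp file, then read it back as text and as bytes."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Text mode: universal newlines translate "\r\n" and "\r" to "\n".
        with open(path, "r", encoding="ascii") as f:
            as_text = f.read()
        # Binary mode: the bytes come back exactly as stored on disk.
        with open(path, "rb") as f:
            as_bytes = f.read()
    finally:
        os.remove(path)
    return as_text, as_bytes


# One file containing all three line-ending styles.
text, data = read_both_ways(b"one\r\ntwo\rthree\n")
print(repr(text))  # 'one\ntwo\nthree\n'
print(repr(data))  # b'one\r\ntwo\rthree\n'
```

The text-mode result no longer shows which line endings the file actually contained; the binary-mode result does.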

It still stands that for platform independence, "rb" should be used due to
some platforms treating binary files differently.

Additionally, and most importantly, for what the OP was doing (determining
if files are the same), you almost certainly want to read the raw bytes -
especially if some of the files are truly binary (such as *.exe on Windows).
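
As a rough sketch of reading raw bytes for comparison: the function below hashes a file opened with 'rb', in chunks so large files do not have to fit in memory. The name file_digest, the choice of SHA-256, and the chunk size are assumptions for illustration, not what the OP's program does.

```python
import hashlib
import os
import tempfile


def file_digest(file_path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    digest = hashlib.sha256()
    # 'rb' ensures no newline translation or decoding touches the data.
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Two files that differ only in their line endings.
with tempfile.TemporaryDirectory() as tmp:
    path_crlf = os.path.join(tmp, "crlf.bin")
    path_lf = os.path.join(tmp, "lf.bin")
    with open(path_crlf, "wb") as f:
        f.write(b"line\r\n")
    with open(path_lf, "wb") as f:
        f.write(b"line\n")
    same = file_digest(path_crlf) == file_digest(path_lf)
    repeat = file_digest(path_lf) == file_digest(path_lf)

print(same)  # False
```

Read in text mode, those two files would yield identical strings and be reported as duplicates even though their bytes differ.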

Chris

On Mon, Sep 22, 2014 at 12:34 PM, <random832 at fastmail.us> wrote:

> On Thu, Sep 18, 2014, at 14:45, Chris Kaynor wrote:
> > Additionally, you may want to specify binary mode by using
> > open(file_path, 'rb') to ensure platform-independence ('r' uses
> > Universal newlines, which means on Windows, Python will convert "\r\n"
> > to "\n" while reading the file). Additionally, some platforms will
> > treat binary files differently.
>
> Does 'r' not use universal newlines on unix? If not, why not? The only
> purpose seems to be to allow people to write bad programs and have them
> work on unix. It makes even less sense in python 3 where opening a file
> in text mode results in a TextIOWrapper with utf-8 encoding, and
> therefore can't be used on arbitrary binary files.