doctest.testfile fails on text files with Windows line endings

Sun Apr 11 00:01:41 EDT 2010

On Apr 10, 10:16 pm, Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
> After converting a text file containing doctests to use Windows line
> endings, I'm getting spurious errors:
>
> ValueError: line 19 of the docstring for examples.txt has inconsistent
> leading whitespace: '\r'
>
> I don't believe that doctest.testfile is documented as requiring Unix
> line endings, and the line endings in the file are okay. I've checked in
> a hex editor, and they are valid \r\n line endings.
>
> In doctest._load_testfile, I find this comment and code:
>
>     # get_data() opens files as 'rb', so one must do the equivalent
>     # conversion as universal newlines would do.
>     return file_contents.replace(os.linesep, '\n'), filename
>
> which I read as an attempt to normalise line endings in the file to \n.
>
> (But surely this will fail? If you're running, say, Linux or MacOS,
> linesep will already be '\n' not '\r\n', and consequently the replace
> does nothing, any Windows line endings aren't normalised, and doctest
> will choke on the \r characters. It's only useful if running on Windows.)
>
> But the above only occurs when using a package loader. Otherwise,
> _load_testfile executes:
>
>     return open(filename).read(), filename
>
> which doesn't do any line ending normalisation at all.
>
> To my mind, this is a bug in doctest. Does anyone disagree? I think the
> simplest fix is to change it to:
>
>     return open(filename, 'rU').read(), filename
>
> Comments?
>
> --
> Steven
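
The os.linesep point can be checked on any platform. Here is a minimal
sketch; replace_linesep is a hypothetical helper, not part of doctest,
that takes the separator as an argument so both cases can be simulated
without switching machines:

```python
def replace_linesep(file_contents, linesep):
    # Same operation as the quoted _load_testfile line, but with the
    # separator passed in explicitly instead of read from os.linesep.
    # (Hypothetical helper, for illustration only.)
    return file_contents.replace(linesep, '\n')

crlf_text = 'first line\r\nsecond line\r\n'

# What os.linesep would be on Linux or Mac OS X: nothing is replaced.
on_unix = replace_linesep(crlf_text, '\n')

# What os.linesep would be on Windows: the \r\n pairs are normalised.
on_windows = replace_linesep(crlf_text, '\r\n')

print(repr(on_unix))     # 'first line\r\nsecond line\r\n'
print(repr(on_windows))  # 'first line\nsecond line\n'
```

So with '\n' as the separator the replace is a no-op and the \r
characters survive, which matches the reading above.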

Seems like a bug to me.  I often assume that I don't know where a
string is coming from, so one of the first steps I usually take when
parsing a string is:

s = s.replace('\r\n', '\n').replace('\r', '\n')

And, out of long-standing pre-Python habit, I always open files in
binary mode and then have my way with them.  I know universal newlines
mode ('rU') is available, but honestly, I don't care for all the
bookkeeping on what kinds of line endings have been seen -- I just want
to normalize the data.
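
As a rough sketch of that habit applied to this problem (written for
Python 3; the helper names and the utf-8 default are my assumptions,
not anything doctest provides), one could read the file in binary mode,
normalise it, and hand the text to doctest's parser and runner directly
instead of going through doctest.testfile:

```python
import doctest

def normalise_newlines(s):
    # Replace \r\n first; otherwise a Windows ending would turn into
    # two newlines when the lone-\r replacement runs.
    return s.replace('\r\n', '\n').replace('\r', '\n')

def read_normalised(path, encoding='utf-8'):
    # Binary mode: no platform-dependent newline translation happens.
    # The encoding default is an assumption; adjust for your files.
    with open(path, 'rb') as f:
        raw = f.read()
    return normalise_newlines(raw.decode(encoding))

def run_doctest_text(text, name='examples.txt'):
    # Parse and run already-normalised text with the public
    # DocTestParser / DocTestRunner classes.
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {}, name, name, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults with .failed and .attempted
```

Usage would be something like
run_doctest_text(read_normalised('examples.txt')), then checking that
.failed is 0.  This only works around the issue on the caller's side;
it doesn't change what _load_testfile does.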

Regards,
Pat