[Numpy-discussion] ANN: Numpy 1.6.0 beta 2

Matthew Brett matthew.brett at gmail.com
Tue Apr 5 17:46:18 EDT 2011


Hi,

On Tue, Apr 5, 2011 at 10:56 AM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Tue, Apr 5, 2011 at 11:45 AM, <josef.pktd at gmail.com> wrote:
>>
>> On Tue, Apr 5, 2011 at 1:20 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> >
>> >
>> > On Tue, Apr 5, 2011 at 10:46 AM, Christopher Barker
>> > <Chris.Barker at noaa.gov>
>> > wrote:
>> >>
>> >> On 4/4/11 10:35 PM, Charles R Harris wrote:
>> >> >     IIUC, "Ub" is undefined -- "U" means universal newlines, which
>> >> > makes
>> >> > no
>> >> >     sense when used with "b" for binary. I looked at the code a ways
>> >> > back,
>> >> >     and I can't remember the resolution order, but there isn't any
>> >> > checking
>> >> >     for incompatible flags.
>> >> >
>> >> >     I'd expect that genfromtxt, being txt, and line oriented, should
>> >> > use
>> >> >     'rU'. but if it wants the raw line endings (why would it?) then
>> >> > rb
>> >> >     should be fine.
>> >> >
>> >> >
>> >> > "U" has been kept around for backwards compatibility, the python
>> >> > documentation recommends that it not be used for new code.
>> >>
>> >> That is for  3.*  -- the 2.7.* docs say:
>> >>
>> >> """
>> >> In addition to the standard fopen() values mode may be 'U' or 'rU'.
>> >> Python is usually built with universal newline support; supplying 'U'
>> >> opens the file as a text file, but lines may be terminated by any of
>> >> the
>> >> following: the Unix end-of-line convention '\n', the Macintosh
>> >> convention '\r', or the Windows convention '\r\n'. All of these
>> >> external
>> >> representations are seen as '\n' by the Python program. If Python is
>> >> built without universal newline support a mode with 'U' is the same as
>> >> normal text mode. Note that file objects so opened also have an
>> >> attribute called newlines which has a value of None (if no newlines
>> >> have
>> >> yet been seen), '\n', '\r', '\r\n', or a tuple containing all the
>> >> newline types seen.
>> >>
>> >> Python enforces that the mode, after stripping 'U', begins with 'r',
>> >> 'w'
>> >> or 'a'.
>> >> """
>> >>
>> >> which does, in fact, indicate that 'Ub' is NOT allowed. We should be
>> >> using 'Ur', I think. Maybe the "Python enforces" is what we saw the
>> >> error from -- it didn't use to enforce anything.
>> >>
>> >
>> > 'rbU' works and I put that in as a quick fix.
>> >>
>> >> On 4/5/11 7:12 AM, Charles R Harris wrote:
>> >>
>> >> > The 'Ub' mode doesn't work for '\r' on python 3. This may be a bug in
>> >> > python, as it works just fine on python 2.7.
>> >>
>> >> "Ub" never made any sense anywhere -- "U" means universal newline text
>> >> file. "b" means binary -- combining them makes no sense. On older
>> >> pythons, the behaviour of 'Ub' was undefined -- now, it looks like it
>> >> is
>> >> supposed to raise an error.
>> >>
>> >> Does 'Ur' work with \r line endings on Python 3?
>> >
>> > Yes.
>> >
>> >>
>> >> According to my read of the docs, 'U' does nothing -- "universal"
>> >> newline support is supposed to be the default:
>> >>
>> >> """
>> >> On input, if newline is None, universal newlines mode is enabled. Lines
>> >> in the input can end in '\n', '\r', or '\r\n', and these are translated
>> >> into '\n' before being returned to the caller.
>> >> """
>> >>
>> >> > It may indeed be desirable
>> >> > to read the files as text, but that would require more work on both
>> >> > loadtxt and genfromtxt.
>> >>
>> >> Why can't we just open the file with mode 'Ur'? text is text, messing
>> >> with line endings shouldn't hurt anything, and it might help.
>> >>
>> >
>> > Well, text in the files then gets the numpy 'U' type instead of 'S', and
>> > there are places where byte streams are assumed for stripping and such.
>> > Which is to say that changing to text mode requires some work. Another
>> > possibility is to use a generator:
>> >
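>> > from numpy.compat import asbytes
>> >
>> > def usetext(fname):
>> >     # Read in text mode (universal newlines), hand bytes to the parser.
>> >     with open(fname, 'rt') as f:
>> >         for line in f:
>> >             yield asbytes(line)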
>> >
>> > I think genfromtxt could use a refactoring and cleanup, but probably not
>> > for
>> > 1.6.
>>
>> I think it should also be possible to read "rb" and strip any \r, \r\n
>> in _iotools.py,
>> that's where the bytes are used, from my reading and the initial error
>> message.
>>
>
> Doesn't work for \r, you get the whole file at once instead of line by line.

Thanks for trying to sort out this ugliness.  I've added another pull request:

https://github.com/numpy/numpy/pull/71

 - tests for \n, \r\n and \r files, raising SkipTest for the \r mode
that currently fails on Python 3.2.
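
For anyone following along, here is a minimal sketch of the behaviour under discussion. It is not the code in the pull request: the helper name read_lines and the sample data are made up for illustration, and it assumes Python 3, where text mode uses universal newlines by default. It writes the same two rows with each line-ending convention and reads them back in text and in binary mode.

```python
import os
import tempfile


def read_lines(data, mode):
    """Write `data` (bytes) to a temporary file and return the lines
    obtained by iterating over it when opened with `mode`.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, mode) as f:
            return list(f)
    finally:
        os.remove(path)


# The same two rows with Unix, Windows and old Mac line endings.
samples = {
    "unix": b"1 2\n3 4\n",
    "windows": b"1 2\r\n3 4\r\n",
    "old_mac": b"1 2\r3 4\r",
}

for name, data in samples.items():
    text_lines = read_lines(data, "r")     # universal newlines (Python 3 default)
    binary_lines = read_lines(data, "rb")  # binary iteration splits on b"\n" only
    print(name, text_lines, binary_lines)
```

In text mode all three samples come back as ['1 2\n', '3 4\n']. In binary mode the \r-only sample comes back as a single line, [b'1 2\r3 4\r'], which is the "whole file at once" behaviour mentioned above.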

Matthew


