[Numpy-discussion] ANN: Numpy 1.6.0 beta 2

Tue Apr 5 13:45:34 EDT 2011

On Tue, Apr 5, 2011 at 1:20 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Tue, Apr 5, 2011 at 10:46 AM, Christopher Barker <Chris.Barker at noaa.gov>
> wrote:
>>
>> On 4/4/11 10:35 PM, Charles R Harris wrote:
>> >     IIUC, "Ub" is undefined -- "U" means universal newlines, which
>> >     makes no sense when used with "b" for binary. I looked at the code
>> >     a ways back, and I can't remember the resolution order, but there
>> >     isn't any checking for incompatible flags.
>> >
>> >     I'd expect that genfromtxt, being txt, and line oriented, should use
>> >     'rU'. but if it wants the raw line endings (why would it?) then rb
>> >     should be fine.
>> >
>> >
>> > "U" has been kept around for backwards compatibility, the python
>> > documentation recommends that it not be used for new code.
>>
>> That is for  3.*  -- the 2.7.* docs say:
>>
>> """
>> In addition to the standard fopen() values mode may be 'U' or 'rU'.
>> Python is usually built with universal newline support; supplying 'U'
>> opens the file as a text file, but lines may be terminated by any of the
>> following: the Unix end-of-line convention '\n', the Macintosh
>> convention '\r', or the Windows convention '\r\n'. All of these external
>> representations are seen as '\n' by the Python program. If Python is
>> built without universal newline support a mode with 'U' is the same as
>> normal text mode. Note that file objects so opened also have an
>> attribute called newlines which has a value of None (if no newlines have
>> yet been seen), '\n', '\r', '\r\n', or a tuple containing all the
>> newline types seen.
>>
>> Python enforces that the mode, after stripping 'U', begins with 'r', 'w'
>> or 'a'.
>> """
>>
>> which does, in fact, indicate that 'Ub' is NOT allowed. We should be
>> using 'Ur', I think. Maybe the "Python enforces" check is what produced
>> the error we saw -- it didn't use to enforce anything.
>>
>
> 'rbU' works and I put that in as a quick fix.
>>
>> On 4/5/11 7:12 AM, Charles R Harris wrote:
>>
>> > The 'Ub' mode doesn't work for '\r' on python 3. This may be a bug in
>> > python, as it works just fine on python 2.7.
>>
>> "Ub" never made any sense anywhere -- "U" means universal newline text
>> file. "b" means binary -- combining them makes no sense. On older
>> pythons, the behaviour of 'Ub' was undefined -- now, it looks like it is
>> supposed to raise an error.
>>
>> does 'Ur' work with \r line endings on Python 3?
>
> Yes.
>
>>
>> According to my read of the docs, 'U' does nothing -- "universal"
>> newline support is supposed to be the default:
>>
>> """
>> On input, if newline is None, universal newlines mode is enabled. Lines
>> in the input can end in '\n', '\r', or '\r\n', and these are translated
>> into '\n' before being returned to the caller.
>> """
>>
>> > It may indeed be desirable
>> > to read the files as text, but that would require more work on both
>> > loadtxt and genfromtxt.
>>
>> Why can't we just open the file with mode 'Ur'? text is text, messing
>> with line endings shouldn't hurt anything, and it might help.
>>
>
> Well, text in the files then gets the numpy 'U' type instead of 'S', and
> there are places where byte streams are assumed for stripping and such.
> Which is to say that changing to text mode requires some work. Another
> possibility is to use a generator:
>
> from numpy.compat import asbytes
>
> def usetext(fname):
>     f = open(fname, 'rt')
>     for l in f:
>         # yield the current line; calling f.next() here would skip lines
>         yield asbytes(l)
>
> I think genfromtxt could use a refactoring and cleanup, but probably not for
> 1.6.

I think it should also be possible to read with "rb" and strip any \r or
\r\n in _iotools.py. From my reading of the code and the initial error
message, that's where the bytes are used.
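
A minimal sketch of that idea, to show what "read rb and strip the line
endings" could look like. The name iter_byte_lines is hypothetical and is
not an existing function in _iotools.py; the sketch also assumes the file
is small enough to read in one go.

```python
import os
import tempfile


def iter_byte_lines(fname):
    """Yield the lines of fname as bytes, with line endings removed.

    Hypothetical helper, not part of numpy. It reads the whole file at
    once, which is fine for a sketch but not for very large files.
    """
    with open(fname, 'rb') as f:
        data = f.read()
    # bytes.splitlines() treats LF, CR and CRLF as line boundaries,
    # so old Mac, Windows and Unix files all split the same way.
    for line in data.splitlines():
        yield line


# Small demonstration: one file mixing CR, CRLF and LF line endings.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b"1 2\r3 4\r\n5 6\n")
lines = list(iter_byte_lines(path))
os.remove(path)
print(lines)
```

The data stays as bytes throughout, so the stripping code that assumes
byte strings would not need to change, unlike opening in text mode.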

Josef

>
> Chuck
>