[Numpy-discussion] ANN: Numpy 1.6.0 beta 2

Christopher Barker Chris.Barker at noaa.gov
Wed Apr 6 11:56:27 EDT 2011


On 4/5/11 10:33 PM, Matthew Brett wrote:
> Did you mean to send this just to me?  It seems like the whole is
> generally interesting and helpful, at least to me...

I did mean to send to the list -- I've done that now.

> Well, the current code doesn't split on \r in py3k, admittedly that
> must be a rare case.

I guess that's a key question here -- it really *should* split on \r,
but maybe it's rare enough to be unimportant.
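If the file is read in binary mode, one possible way to split on all three conventions in Python 3 is `bytes.splitlines()`. Unlike `bytes.split(b'\n')`, it treats CR, LF and CRLF all as line boundaries. `str.splitlines()` differs: it also splits on a few other characters, such as form feed. This is a minimal sketch, not what the NumPy loader currently does, and the helper name `split_binary_lines` is hypothetical:

```python
def split_binary_lines(data):
    """Split raw bytes on CR, LF or CRLF line endings (hypothetical helper)."""
    # bytes.splitlines() recognises exactly these three endings and
    # drops the terminators from the returned pieces.
    return data.splitlines()


raw = b"1 2\r\n3 4\r5 6\n"
print(split_binary_lines(raw))  # [b'1 2', b'3 4', b'5 6']
print(raw.split(b"\n"))         # [b'1 2\r', b'3 4\r5 6', b'']
```

The second call shows the failure mode under discussion: splitting only on `\n` leaves a stray `\r` on the first line and glues the CR-terminated lines together.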

> The general point about whether to read binary or text is also in
> play.   I agree with you, reading text sounds like a better idea to
> me, but I don't know the performance hit.   Pauli had it as reading
> binary in the original conversion and was defending that in an earlier
> email...

The thing is -- we're talking about parsing text here, so we really are 
reading text files, NOT binary files.

So the question really is: do we want py3's file-reading code to
handle encoding issues for us, or do we want to handle them ourselves?
If we only want to support ANSI encodings, then handling them ourselves
may well be easier and faster. If we go that way, we need to handle
line endings, too. The easy way is to only support line endings that
contain a '\n' -- that works out of the box. But it's not that hard to
support '\r' as well, depending on how you want to do it. Before the 'U'
(universal newlines) mode existed, I did that all the time, something like:

some_text = file.read(some_size_buffer)
# str.replace returns a new string, so the result must be reassigned
some_text = some_text.replace('\r\n', '\n')
some_text = some_text.replace('\r', '\n')
lines = some_text.split('\n')
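The same idea can be written in Python 3 as a binary read followed by an explicit decode. Note that `str.replace` returns a new string, so each result has to be reassigned. This is a minimal sketch under a few assumptions: the whole file fits in memory, the encoding is ASCII-compatible (latin-1 is used here as a default), and the function name `read_lines` is hypothetical rather than anything NumPy provides:

```python
import io


def read_lines(fileobj, encoding="latin-1"):
    """Read a binary file object and return its lines, accepting
    CRLF, CR and LF endings, freely mixed (hypothetical helper)."""
    text = fileobj.read().decode(encoding)
    # Order matters: collapse CRLF first so it doesn't become two newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = text.split("\n")
    # A trailing newline leaves an empty last element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


print(read_lines(io.BytesIO(b"1 2\r\n3 4\r5 6\n")))  # ['1 2', '3 4', '5 6']
```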

(By the way, if you're going to support this, it's really nice to
support mixed line endings, as this approach does -- there are a lot
of editors that can make a mess of line endings.)
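For comparison, Python 3's own text mode already handles mixed endings when `newline=None` (the default). With that setting, universal-newline translation turns CRLF, CR and LF into `'\n'` as the text is read. This is a minimal sketch that wraps an in-memory buffer with `io.TextIOWrapper` in place of a real file, and the latin-1 encoding is an assumption:

```python
import io

raw = io.BytesIO(b"1 2\r\n3 4\r5 6\n")
# newline=None enables universal newlines: CRLF, CR and LF are all
# translated to '\n', so mixed endings come back uniform.
text = io.TextIOWrapper(raw, encoding="latin-1", newline=None)
lines = [line.rstrip("\n") for line in text]
print(lines)  # ['1 2', '3 4', '5 6']
```

This is the "let py3 handle it" side of the trade-off. The decoding and newline translation happen inside the io layer, and that is where any performance cost would come from.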

If you can read the entire file into memory at once, this is almost
trivial; if you can't, there is a bit more bookkeeping code to be written.
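When the file can't be read in one go, most of the bookkeeping comes down to three things:

- carrying the partial last line over to the next chunk;
- not splitting a CRLF pair that straddles a chunk boundary;
- not cutting a multi-byte character in half.

This is a minimal sketch, assuming a binary file object. The name `iter_lines`, the chunk size and the latin-1 default are hypothetical choices for illustration. `codecs.getincrementaldecoder` is the standard-library way to decode piecewise:

```python
import codecs
import io


def iter_lines(fileobj, chunk_size=8192, encoding="latin-1"):
    """Yield lines (without terminators) from a binary file object read
    in fixed-size chunks, accepting CRLF, CR and LF endings."""
    # An incremental decoder copes with multi-byte characters that
    # straddle a chunk boundary (irrelevant for latin-1, needed for UTF-8).
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""  # partial line carried over between chunks
    while True:
        chunk = fileobj.read(chunk_size)
        final = not chunk
        pending += decoder.decode(chunk, final=final)
        # A trailing CR may be the first half of a CRLF split across
        # chunks, so hold it back until the next chunk has been seen.
        hold_cr = pending.endswith("\r") and not final
        if hold_cr:
            pending = pending[:-1]
        pending = pending.replace("\r\n", "\n").replace("\r", "\n")
        *complete, pending = pending.split("\n")
        yield from complete
        if hold_cr:
            pending += "\r"
        if final:
            break
    if pending:
        yield pending  # last line had no terminator


data = io.BytesIO(b"1 2\r\n3 4\r5 6\n7 8")
print(list(iter_lines(data, chunk_size=4)))  # ['1 2', '3 4', '5 6', '7 8']
```

With `chunk_size=4` the CRLF after "1 2" is split across two reads, which is exactly the case the held-back CR is there for.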

DARN -- I think I said my last note was the last on this topic!

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
