[Python-Dev] Universal newlines, and the gzip module.

Christopher Barker Chris.Barker at noaa.gov
Thu Jan 29 21:39:31 CET 2009


Hi all,

Over on the matplotlib mailing list, we ran into a problem when trying 
to use Universal newlines with gzip. In virtually all of my code that 
reads text files, I use the 'U' flag to open files; it really helps not 
having to deal with newline issues. Yes, there are fewer of those now 
that the Macintosh uses \n, but they can still be a pain.

Anyway, we added such support to some matplotlib methods, and found that 
gzip file reading broke. We were passing the flags through into either 
file() or gzip.open(), and passing 'U' into gzip.open() turns out to be 
fatal.

1) It would be nice if the gzip module (and the zip lib module) 
supported Universal newlines -- you could read a compressed text file 
with "wrong" newlines, and have them handled properly. However, that may 
be hard to do, so at least:

2) Passing a 'U' flag in to gzip.open shouldn't break it.
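
For what it's worth, here is a rough sketch of what (1) could look like 
from the caller's side today: decompress in binary mode, then normalize 
the newlines by hand. This is only a workaround, not something the gzip 
module does itself. The helper name read_gzip_universal and its encoding 
parameter are made up for illustration, and the code is written for a 
current Python 3:

```python
import gzip


def read_gzip_universal(path, encoding="utf-8"):
    """Hypothetical helper: decompress, then normalize newlines by hand."""
    # Always read the compressed stream in binary mode.
    with gzip.open(path, "rb") as f:
        data = f.read()
    # Order matters: collapse \r\n first so it does not become two newlines.
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return data.decode(encoding)
```

This reads the whole file into memory, so it is only reasonable for 
small-ish text files.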

I took a look at the Python SVN (2.5.4 and 2.6.1) for the gzip lib. I 
see this:


         # guarantee the file is opened in binary mode on platforms
         # that care about that sort of thing
         if mode and 'b' not in mode:
             mode += 'b'
         if fileobj is None:
             fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')

This is going to break for 'U' -- you'll get 'rUb'. I tested 
file(filename, 'rUb'), and it looks like it still does universal newline 
translation, even with the 'b' in there.
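
To make the mode handling concrete, the logic quoted above boils down to 
roughly the following. The function name gzip_mode_as_built is made up; 
it only mirrors the two mode expressions from the snippet and does not 
open anything:

```python
def gzip_mode_as_built(mode):
    """Mirror of the mode handling quoted above (hypothetical name)."""
    # 'b' is appended whenever it is missing; nothing else is touched,
    # so a 'U' in the caller's mode string survives.
    if mode and 'b' not in mode:
        mode += 'b'
    # This is the string that ends up being handed to open().
    return mode or 'rb'


print(gzip_mode_as_built('rU'))   # rUb
print(gzip_mode_as_built('r'))    # rb
print(gzip_mode_as_built(None))   # rb
```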

So:

* Either gzip should be a bit smarter, and remove the 'U' flag (that's 
what we did in the MPL code), or force 'rb' or 'wb'.

* Or: file opening should be a bit smarter -- what does 'rUb' mean? a 
file can't be both Binary and Universal Text. Should it raise an 
exception? Somehow I think it would be better to ignore the 'U', but 
maybe that's only because of the issue I happen to be looking at now.


The latter seems a better idea -- this issue could certainly come up in 
places other than the gzip module, but maybe it would break a bunch of 
code -- who knows?
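
In the meantime, the first option (dropping the 'U' before the mode 
reaches gzip) can be sketched roughly as below. This is not the actual 
MPL code, just an illustration of the idea; strip_universal and 
open_gzip are hypothetical names, and the code is written for a current 
Python 3:

```python
import gzip


def strip_universal(mode):
    """Hypothetical helper: drop 'U' and force a binary mode string."""
    mode = (mode or 'r').replace('U', '')
    if not mode.startswith(('r', 'w', 'a')):
        mode = 'r' + mode      # 'U' on its own implied reading
    if 'b' not in mode:
        mode += 'b'
    return mode


def open_gzip(filename, mode='r'):
    """Open a gzip file, ignoring any universal-newline request."""
    return gzip.open(filename, strip_universal(mode))
```

With this, a caller that blindly passes 'rU' gets the same file object 
as one that passes 'rb'; the newlines are simply not translated.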

I haven't touched py3 yet, so I have no idea if this issue is different 
there.

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

