How to read gzipped utf8 file in Python?

"Martin v. Löwis" martin at
Thu Nov 22 21:25:20 CET 2007

>   I have a large (gigabytes) file which is encoded in UTF-8 and then
> compressed with gzip.  I'd like to read it with the "gzip" module
> and "utf8" decoding.

You didn't specify the processing you want to perform. For example,
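this should work just fine:

fd = gzip.open(fname, 'rb')   # fname: placeholder for the path to the .gz file
for line in fd:
    pass  # process each line (still undecoded bytes) here
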
For that processing, it is not even necessary to know what the encoding
of the file is, except that it is an ASCII superset (which UTF-8 is).
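
As an illustration of such encoding-agnostic processing, here is a minimal sketch written for Python 3 (the thread itself dates from Python 2). The helper names `count_matching_lines` and `write_sample`, the default needle, and the sample text are made up for the example. It relies only on the fact that in UTF-8 the bytes of ASCII characters, including the newline, never occur inside a multi-byte sequence, so splitting lines and searching for an ASCII pattern can be done on the raw bytes.

```python
import gzip
import os
import tempfile


def count_matching_lines(path, needle=b"error"):
    """Count all lines, and the lines containing an ASCII needle, without decoding."""
    total = 0
    matching = 0
    with gzip.open(path, "rb") as fd:
        for line in fd:            # each line is a bytes object
            total += 1
            if needle in line:     # byte-level search, no decoding needed
                matching += 1
    return total, matching


def write_sample(path):
    """Write a small gzip-compressed UTF-8 file for the demonstration."""
    text = "naïve error\nπ ≈ 3.14\nerror again\n"
    with gzip.open(path, "wb") as out:
        out.write(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample file in a fresh temporary directory.
    sample = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
    write_sample(sample)
    print(count_matching_lines(sample))
```

Because the file is consumed line by line, memory use stays small even for a multi-gigabyte input.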

> The obvious approach is
>     fd = gzip.open(fname, 'rb',encoding='utf8')
> But "gzip.open" doesn't support an "encoding" parameter.  (It
> probably should, for consistency.)

I think I disagree. The builtin open function does not support an
encoding argument, either (in Python 2.x). Conceptually, gzip operates
on byte streams, not character streams.

> Is it possible to express "unzip, then decode utf8" via
> ""?

If that's the processing you want to do, sure:

fd0 = gzip.open(fname, 'rb')   # fname: placeholder for the path to the .gz file
fd = codecs.getreader("utf-8")(fd0)
data = fd.readline()

You can combine that to

fd = codecs.getreader("utf-8")(gzip.open(fname, 'rb'))
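
For reference, a self-contained sketch of the same layering, written for Python 3 (the thread itself dates from Python 2): the gzip file object supplies a byte stream, and the reader returned by `codecs.getreader("utf-8")` turns it into a character stream. The helper names `iter_utf8_lines` and `write_sample`, and the sample text, are hypothetical and exist only for the example.

```python
import codecs
import gzip
import os
import tempfile


def iter_utf8_lines(path):
    """Yield decoded lines from a gzip-compressed UTF-8 file, one at a time."""
    # gzip.open gives the byte stream; the codecs reader decodes on top of it.
    # Leaving the with block closes the underlying gzip file as well.
    with codecs.getreader("utf-8")(gzip.open(path, "rb")) as reader:
        for line in reader:        # each line is a str, newline included
            yield line


def write_sample(path):
    """Write a small gzip-compressed UTF-8 file for the demonstration."""
    text = "Löwis\nπ ≈ 3.14\n"
    with gzip.open(path, "wb") as out:
        out.write(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample file in a fresh temporary directory.
    sample = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
    write_sample(sample)
    for line in iter_utf8_lines(sample):
        print(repr(line))
```

The generator form keeps only a small buffer in memory at a time, which matters for the gigabyte-sized file described in the question.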

