[New-bugs-announce] [issue10335] tokenize.open_python(): open a Python file with the right encoding

Sat Nov 6 11:49:39 CET 2010

New submission from STINNER Victor <victor.stinner at haypocalc.com>:

In Python 3, the following pattern is becoming common:

        with open(fullname, 'rb') as fp:
            coding, line = tokenize.detect_encoding(fp.readline)
        with open(fullname, 'r', encoding=coding) as fp:
            ...

The file is opened twice, which is unnecessary: it's possible to reuse the raw buffer to create a text file. I also don't like the detect_encoding() API: passing the readline function is not intuitive.

I propose to create a tokenize.open_python() function with a very simple API: just one argument, the filename. This function calls detect_encoding() and only opens the file once.
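
To make the idea concrete, here is a minimal sketch of how such a function could reuse the raw buffer. It is only an illustration of the approach, not the attached patch: the body, the seek(0) rewind and the error handling are assumptions.

```python
import io
import tokenize


def open_python(filename):
    """Open a Python source file for reading, decoded with its declared encoding.

    Illustrative sketch only: the real patch may differ in details.
    """
    # Open the file once, in binary mode, so detect_encoding() can read raw lines.
    buffer = open(filename, 'rb')
    try:
        # detect_encoding() looks for a BOM or a coding cookie in the first two lines.
        encoding, _ = tokenize.detect_encoding(buffer.readline)
        # Rewind so the text layer starts again from the beginning of the file.
        buffer.seek(0)
        # Wrap the same binary buffer instead of opening the file a second time.
        return io.TextIOWrapper(buffer, encoding, line_buffering=True)
    except BaseException:
        # Don't leak the file object if detection or wrapping fails.
        buffer.close()
        raise
```

With this, the two-step pattern above would reduce to a single `with open_python(fullname) as fp:` block.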

The attached patch adds the function with a unit test and a documentation update. It also patches the functions that currently use detect_encoding().

open_python() only supports read mode. I suppose that is enough.

----------
components: Library (Lib), Unicode
files: open_python.patch
keywords: patch
messages: 120600
nosy: haypo
priority: normal
severity: normal
status: open
title: tokenize.open_python(): open a Python file with the right encoding
versions: Python 3.2
Added file: http://bugs.python.org/file19518/open_python.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10335>
_______________________________________