[New-bugs-announce] [issue39206] Modulefinder does not consider source file encoding

Fri Jan 3 16:33:55 EST 2020

New submission from Nicholas Feix <snip3rnick at googlemail.com>:

The modulefinder._find_module(...) function returns file objects text mode for source modules using the system encoding.
ModuleFinder.load_module(...) can run into decoding issues when the source file encoding does not match the system default.

The prior implementation imp.find_module(...) detected the encoding correctly using the tokenize.detect_encoding(...) function.
With the following code segment the detection would work again with UTF-8 BOM and PEP 263 type cookies.

    encoding = None
    if 'b' not in mode:
        with open(file_path, 'rb') as file:
            encoding = tokenize.detect_encoding(file.readline)[0]

    file = open(file_path, mode, encoding=encoding)

----------
components: Library (Lib)
messages: 359259
nosy: Nicholas Feix
priority: normal
severity: normal
status: open
title: Modulefinder does not consider source file encoding
type: behavior
versions: Python 3.8, Python 3.9

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue39206>
_______________________________________