[issue9771] add an optional "default" argument to tokenize.detect_encoding
report at bugs.python.org
Sat Sep 4 13:32:10 CEST 2010
New submission from Florent Xicluna <florent.xicluna at gmail.com>:
The function tokenize.detect_encoding() detects the encoding from either the coding cookie or the BOM. If no encoding is found, it returns 'utf-8'.
When the result is 'utf-8', there's no (easy) way to know whether the encoding was really detected in the file or whether the function fell back to the default value.
Cases (with utf-8):
- UTF-8 BOM found, returns ('utf-8-sig', )
- cookie on 1st line, returns ('utf-8', [line1])
- cookie on 2nd line, returns ('utf-8', [line1, line2])
- no cookie found, returns ('utf-8', [line1, line2])
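The ambiguity in the cookie/no-cookie cases can be reproduced with a short script. This is only a sketch: the `detect` helper and the two sample byte strings are made up for illustration, and it assumes a Python 3 interpreter.

```python
import io
import tokenize


def detect(source: bytes):
    """Run tokenize.detect_encoding() on an in-memory byte string."""
    return tokenize.detect_encoding(io.BytesIO(source).readline)


# Hypothetical sample sources: one declares utf-8 explicitly, one declares nothing.
with_cookie = b"# -*- coding: utf-8 -*-\nx = 1\n"
without_cookie = b"# just a comment\nx = 1\n"

enc_a, lines_a = detect(with_cookie)
enc_b, lines_b = detect(without_cookie)

# Both calls report the same encoding, so the encoding alone
# cannot tell a declared utf-8 from the fallback.
print(enc_a, enc_b)
```

The only difference between the two results is in the list of lines that were read, which the caller would have to re-parse to find out whether a cookie was present.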
The proposal is to allow calling the function with a different default value (None or ''), so that the caller can tell whether the encoding was really detected.
For example, this function could be used by the Tools/scripts/findnocoding.py script.
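Until something like this exists in the library, the behaviour can be approximated from the outside. The sketch below is not the attached patch: `detect_encoding_or_default` is a hypothetical wrapper, and `COOKIE_RE` is my own approximation of the PEP 263 cookie pattern, so it may not agree with tokenize in every corner case.

```python
import io
import re
import tokenize

# Assumption: approximates the PEP 263 coding cookie; not taken from the patch.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def detect_encoding_or_default(readline, default=None):
    """Return the detected encoding, or `default` if nothing was declared.

    Hypothetical helper: wraps tokenize.detect_encoding() and re-checks
    the consumed lines when the result is the ambiguous 'utf-8'.
    """
    encoding, lines = tokenize.detect_encoding(readline)
    if encoding != "utf-8":
        # 'utf-8-sig' means a BOM was seen; anything else came from a cookie.
        return encoding
    if any(COOKIE_RE.match(line) for line in lines):
        # An explicit utf-8 cookie was present in the lines read.
        return encoding
    return default


def from_bytes(source: bytes, default=None):
    """Convenience wrapper for in-memory sources (illustration only)."""
    return detect_encoding_or_default(io.BytesIO(source).readline, default)
```

A script in the spirit of findnocoding.py could then treat a None (or '') result as "no encoding declared" instead of guessing from 'utf-8'.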
components: Library (Lib)
stage: patch review
title: add an optional "default" argument to tokenize.detect_encoding
type: feature request
versions: Python 3.2
Added file: http://bugs.python.org/file18745/detect_encoding_default.diff
Python tracker <report at bugs.python.org>