[New-bugs-announce] [issue4377] tokenize.detect_encoding() and Mac newline
STINNER Victor
report at bugs.python.org
Fri Nov 21 13:32:19 CET 2008
New submission from STINNER Victor <victor.stinner at haypocalc.com>:
I'm trying to fix IDLE to support Unicode (#4008 and #4323). Instead
of IDLE's built-in charset detection, I tried to use
tokenize.detect_encoding(), but this function doesn't work with scripts
that use Mac newlines (b"\r").
Code to detect the encoding of a Python script:
----
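from tokenize import detect_encoding

def pythonEncoding(filename):
    # Binary mode: detect_encoding() expects a readline that returns bytes
    with open(filename, 'rb') as fp:
        encoding, lines = detect_encoding(fp.readline)
    return encoding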
----
Example to reproduce the problem with a Mac script:
----
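from io import BytesIO
from tokenize import detect_encoding

# Script using Mac newlines (b"\r") with one non-ASCII byte (b"\xe8")
fp = BytesIO(b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r')
encoding, lines = detect_encoding(fp.readline)
print(encoding, lines)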
----
=> Result: utf-8 [b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r']
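The problem occurs at "line_string = line.decode('ascii')".
fp.readline() only splits on b"\n", so the whole script is returned as
a single line. Since that "line" contains a non-ASCII character
(b"\xe8"), the conversion fails, the coding cookie is not found, and
detect_encoding() falls back to utf-8.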
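A minimal sketch of one possible workaround, not a proposed patch: it first
shows that BytesIO.readline() returns the whole Mac-style script as one line,
then normalizes the newlines before calling detect_encoding(). The helper name
detect_encoding_any_newline is hypothetical, and reading the whole script into
memory is an assumption made for brevity.

```python
from io import BytesIO
from tokenize import detect_encoding

source = b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re amie")\r'

# BytesIO.readline() only treats b"\n" as a line terminator, so the
# first "line" is the entire script, non-ASCII byte included.
first_line = BytesIO(source).readline()
print(first_line == source)

def detect_encoding_any_newline(data):
    """Hypothetical helper: normalize newlines, then detect the encoding."""
    # Convert DOS (b"\r\n") first, then bare Mac (b"\r"), to b"\n"
    normalized = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    encoding, lines = detect_encoding(BytesIO(normalized).readline)
    return encoding

encoding = detect_encoding_any_newline(source)
print(encoding)
```

With the newlines normalized, the coding cookie sits alone on the first line,
so the non-ASCII byte on the second line no longer gets in the way of the
cookie lookup.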
----------
components: Library (Lib)
messages: 76176
nosy: haypo
severity: normal
status: open
title: tokenize.detect_encoding() and Mac newline
versions: Python 3.0
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4377>
_______________________________________