Chardet oddity
Albert-Jan Roskam
sjeik_appie at hotmail.com
Wed Oct 23 13:07:14 EDT 2024
Today I used chardet.detect in the REPL and it returned Windows-1252
(incorrect, because decoding with it later raised a UnicodeDecodeError).
When I ran chardet as a script (which uses UniversalDetector), it returned
MacRoman. Isn't chardet.detect the correct way? I've used this method many
times.
# Interpreter
>>> import chardet
>>> contents = open(FILENAME, "rb").read()
>>> chardet.detect(contents)
{'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
# Terminal
$ python -m chardet FILENAME
FILENAME: MacRoman with confidence 0.7167379080370483
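
Since the two guesses disagree, each one can be checked with a strict decode. The following is only a minimal sketch: the helper name decodable_with and the sample bytes are made up for illustration and are not taken from the file above. It relies on Python's cp1252 codec leaving a few byte values (such as 0x8d) unassigned, while mac_roman assigns a character to every byte value.

```python
def decodable_with(data: bytes, encodings):
    """Return the candidate encodings that decode `data` without errors."""
    ok = []
    for name in encodings:
        try:
            data.decode(name)  # strict error handling is the default
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the data
        ok.append(name)
    return ok


# Hypothetical sample bytes (not from the real file): 0x8d is unassigned
# in cp1252, so only mac_roman should survive the strict decode.
sample = b"caf\x8d"
print(decodable_with(sample, ["cp1252", "mac_roman"]))
```

A clean decode does not prove that an encoding is the right one, only that it is not ruled out; a failed decode does rule the candidate out.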
Thanks!
Albert-Jan