Chardet oddity

Albert-Jan Roskam sjeik_appie at hotmail.com
Wed Oct 23 13:07:14 EDT 2024


   Today I used chardet.detect in the REPL and it returned Windows-1252
   (incorrect, because decoding with it later raised a UnicodeDecodeError).
   When I ran chardet as a script (which uses UniversalDetector), it
   returned MacRoman. Isn't chardet.detect the correct way? I've used this
   method many times.
   # Interpreter
   >>> import chardet
   >>> content = open(FILENAME, "rb").read()
   >>> chardet.detect(content)
   {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
   # Terminal
   $ python -m chardet FILENAME
   FILENAME: MacRoman with confidence 0.7167379080370483
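
   One way to sanity-check two competing guesses like these is to see
   which candidate encodings can decode the bytes at all. The following is
   a minimal stdlib-only sketch, not part of chardet: the helper name
   decodable_encodings and the sample bytes are hypothetical. It relies on
   the fact that Python's cp1252 codec leaves a few bytes (such as 0x81)
   undefined, while mac_roman maps every byte value, so a
   UnicodeDecodeError under cp1252 does not by itself prove that MacRoman
   is the right answer.

```python
def decodable_encodings(data: bytes, candidates=("cp1252", "mac_roman", "utf-8")):
    """Return the candidate encodings that decode `data` without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            # This codec cannot represent at least one byte in the data.
            continue
        ok.append(name)
    return ok


# Hypothetical sample bytes: 0x81 is undefined in Python's cp1252 codec,
# and 0x8e is not a valid start byte in UTF-8.
sample = b"caf\x8e \x81"
print(decodable_encodings(sample))
```

   In practice the same function could be given the bytes read from the
   file in binary mode; a successful decode only shows that the encoding
   is possible, not that the resulting text is correct.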
   Thanks!
   Albert-Jan

