Chardet oddity
Mark Bourne
nntp.mbourne at spamgourmet.com
Wed Oct 23 15:42:00 EDT 2024
Albert-Jan Roskam wrote:
> Today I used chardet.detect in the repl and it returned windows-1252
> (incorrect, because it later resulted in a UnicodeDecodeError). When I ran
> chardet as a script (which uses UniversalDetector) this returned
> MacRoman. Isn't chardet.detect the correct way? I've used this method many
> times.
> # Interpreter
> >>> contents = open(FILENAME, "rb").read()
> >>> chardet.detect(content)
Is that copied and pasted from the terminal, or retyped with possible
transcription errors? As written, you've assigned the bytes read from
the file to `contents`, but passed `content` (with no "s") to
`chardet.detect`, so the result would depend on whatever was previously
assigned to `content` (or you'd get a NameError if nothing was).
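
To illustrate the mix-up, here is a minimal sketch using only the
standard library. The temporary file stands in for FILENAME, and
`read_bytes` and the stale `content` value are hypothetical, made up
for the example; I don't know what your session actually contained.

```python
import os
import tempfile


def read_bytes(path):
    """Return the raw bytes of a file, closing it afterwards."""
    with open(path, "rb") as fh:
        return fh.read()


# Hypothetical stand-in for FILENAME: a temp file with some non-ASCII bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xe9 na\xefve\n")
    sample_path = tmp.name

contents = read_bytes(sample_path)
os.unlink(sample_path)

# Assumed leftover from earlier in the same REPL session.
content = b"left over from an earlier experiment"

# The two names refer to different data, so a detector given `content`
# would not be looking at the file that was just read.
print(contents == content)
print(type(contents).__name__, type(content).__name__)
```

Passing `contents` (the name that was actually assigned) to
`chardet.detect` would be the like-for-like comparison with the
command-line run.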
> {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language':
> ''}
> # Terminal
> $ python -m chardet FILENAME
> FILENAME: MacRoman with confidence 0.7167379080370483
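
Whichever guess you get, one rough sanity check is to try decoding the
bytes with each candidate. The sketch below uses only the standard
library; the sample bytes are invented and `decodes_cleanly` is a
hypothetical helper name, not part of chardet.

```python
def decodes_cleanly(data, encoding):
    """Return True if `data` decodes under `encoding` without an error."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Made-up sample: byte 0x8F has no mapping in Python's cp1252 codec,
# while the mac_roman codec maps every byte value.
sample = b"price: \x8f 10\n"

results = {enc: decodes_cleanly(sample, enc) for enc in ("cp1252", "mac_roman")}
print(results)
```

A clean decode only shows that the encoding is possible, not that it is
the right one, so it is worth looking at the decoded text as well.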
> Thanks!
> Albert-Jan
--
Mark.