tail
Dennis Lee Bieber
wlfraed at ix.netcom.com
Fri May 6 17:10:12 EDT 2022
On Fri, 6 May 2022 21:19:48 +0100, MRAB <python at mrabarnett.plus.com>
declaimed the following:
>Is the file UTF-8? That's a variable-width encoding, so are any of the
>characters > U+007F?
>
>Which OS? On Windows, it's common/normal for UTF-8 files to start with a
>BOM/signature, which is 3 bytes/1 codepoint.
Windows also uses <cr><lf> for the EOL marker, but Python's I/O system
condenses that to just <lf> internally (for TEXT mode) -- so using the
length of a string read that way to compute a file position may be off by
one byte for each EOL in the string.
https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
"""
In text mode, the default when reading is to convert platform-specific line
endings (\n on Unix, \r\n on Windows) to just \n. When writing in text
mode, the default is to convert occurrences of \n back to platform-specific
line endings. This behind-the-scenes modification to file data is fine for
text files, but will corrupt binary data like that in JPEG or EXE files. Be
very careful to use binary mode when reading and writing such files.
"""
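
A quick way to see the discrepancy is something like the sketch below. It
is only an illustration: compare_lengths() is a made-up helper name, the
sample data is plain ASCII so that the encoding does not enter into it, and
it relies on the default newline=None behaviour of open(), which translates
<cr><lf> to <lf> on reading regardless of platform.

```python
import os
import tempfile


def compare_lengths(raw: bytes) -> tuple[int, int]:
    """Return (characters seen in text mode, bytes actually on disk)."""
    fd, path = tempfile.mkstemp()
    try:
        # Write the bytes untouched, so the <cr><lf> pairs really are on disk.
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Default text mode (newline=None) turns each <cr><lf> into <lf>.
        with open(path, "r", encoding="ascii") as f:
            text = f.read()
        return len(text), os.path.getsize(path)
    finally:
        os.remove(path)


chars, size = compare_lengths(b"one\r\ntwo\r\nthree\r\n")
print(chars, size)  # 14 17 -- one byte short for each of the three EOLs
```

If the goal is to compute positions for seek(), opening the file in binary
mode ("rb") and counting bytes avoids the mismatch, at the cost of having
to decode the data yourself.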
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed at ix.netcom.com http://wlfraed.microdiversity.freeddns.org/