[Tutor] How to read the first so many Unicode characters from a file?
Eryk Sun
eryksun at gmail.com
Sat Jun 26 00:25:26 EDT 2021
On 6/25/21, boB Stepp <robertvstepp at gmail.com> wrote:
> Say I have a text file with a mixture of ASCII and non-ASCII
> characters (assumed to be UTF-8) and I wanted to read the first N
> characters from the file. The first thought that comes to mind is:
>
> with open(filename) as f:
>     N_characters = f.read(N)
Assuming Python 3, you're opening the file in text mode, which reads
characters, not bytes. That said, you're using the default encoding
that's based on the platform and locale. In Windows this will be the
process ANSI code page, unless UTF-8 mode is enabled (e.g.
`python -X utf8`). You can explicitly decode the file as UTF-8 via
`open(filename, encoding='utf-8')`.
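
As a rough sketch of the difference, assuming Python 3: the helper name
read_first_chars and the sample text below are made up for illustration.
The text-mode read with an explicit encoding counts characters, while a
binary-mode read of the same file counts bytes and can stop partway
through the text.

```python
import os
import tempfile


def read_first_chars(path, n):
    """Return the first n characters of a file decoded as UTF-8."""
    # Text mode with an explicit encoding, so the result does not
    # depend on the platform's default (locale) encoding.
    with open(path, encoding='utf-8') as f:
        return f.read(n)


# Hypothetical sample text with two non-ASCII characters, written with
# escapes so the example does not depend on the source file's encoding.
sample = 'na\u00efve caf\u00e9'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(sample)

    # Text mode: read(5) returns 5 characters.
    first_chars = read_first_chars(path, 5)

    # Binary mode: read(5) returns 5 bytes, which here cover only the
    # first 4 characters because U+00EF takes 2 bytes in UTF-8.
    with open(path, 'rb') as f:
        first_bytes = f.read(5)

print(ascii(first_chars))
print(first_bytes)
```

If the file might not actually be valid UTF-8, open() also accepts an
errors argument (for example errors='replace'); whether that is
appropriate depends on what you want to happen with bad input.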