[Tutor] How to read the first so many Unicode characters from a file?

boB Stepp robertvstepp at gmail.com
Sat Jun 26 01:03:39 EDT 2021


On Fri, Jun 25, 2021 at 11:39 PM boB Stepp <robertvstepp at gmail.com> wrote:
>
> On Fri, Jun 25, 2021 at 11:25 PM Eryk Sun <eryksun at gmail.com> wrote:
> >
> > On 6/25/21, boB Stepp <robertvstepp at gmail.com> wrote:
> > > Say I have a text file with a mixture of ASCII and non-ASCII
> > > characters (assumed to be UTF-8) and I wanted to read the first N
> > > characters from the file.  The first thought that comes to mind is:
> > >
> > > with open(filename) as f:
> > >     N_characters = f.read(N)
> >
> > Assuming Python 3, you're opening the file in text mode, which reads
> > characters, not bytes. That said, you're using the default encoding
> > that's based on the platform and locale. In Windows this will be the
> > process ANSI code page, unless UTF-8 mode is enabled (e.g. `python -X
> > utf8`). You can explicitly decode the file as UTF-8 via open(filename,
> > encoding='utf-8').
>
> Ah, foolish me.  I thought I was reading about text streams in the
> docs, but I was actually in a bytes section.  This combined with where
> I am at in the book I'm reading misled me.
>
> If I specify the encoding at the top of the program file, will that
> suffice for overcoming Windows code page issues -- being ASCII not
> UTF-8?

Apparently this won't help.  The encoding declaration only tells Python
how to decode the source file itself (UTF-8 is already the default in
Python 3); it has no effect on the encoding that open() uses at run
time.  So sayeth a Stack Overflow post I just found:

https://stackoverflow.com/questions/14083111/should-i-use-encoding-declaration-in-python-3
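
For the record, here is a minimal sketch of what Eryk suggested, as I
understand it: pass the encoding to open() explicitly so the read no
longer depends on the platform default.  The helper name
read_first_chars and the sample text are my own inventions for
illustration, and the temporary file only exists to keep the snippet
self-contained.

```python
import locale
import os
import tempfile


def read_first_chars(path, n, encoding="utf-8"):
    """Return the first n characters (code points) of a text file.

    Hypothetical helper: in text mode, read(n) counts characters,
    not bytes, and the explicit encoding overrides the locale default.
    """
    with open(path, encoding=encoding) as f:
        return f.read(n)


# Write a small UTF-8 sample file (made-up text, for illustration only).
sample = "naïve café and more text"
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)

first_five = read_first_chars(tmp.name, 5)
os.remove(tmp.name)

# The encoding open() falls back to when none is given; on Windows this
# is typically the ANSI code page unless UTF-8 mode is enabled.
default_encoding = locale.getpreferredencoding(False)

print(repr(first_five), "is", len(first_five), "characters,",
      len(first_five.encode("utf-8")), "bytes in UTF-8")
print("default text encoding here:", default_encoding)
```

The five characters come back intact even though the "ï" takes two
bytes in UTF-8, and the last line shows which default I would have
been relying on had I left the encoding argument out.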

boB Stepp

