UTF-8 and latin1

Chris Angelico rosuav at gmail.com
Thu Aug 18 18:20:16 EDT 2022


On Fri, 19 Aug 2022 at 08:15, Tobiah <toby at tobiah.org> wrote:
>
> > You configure the web server to send:
> >
> >      Content-Type: text/html; charset=...
> >
> > in the HTTP header when it serves HTML files.
>
> So how does this break down?  When a person enters
> Montréal, Québec into a form field, what are they
> doing on the keyboard to make that happen?  As the
> string sits there in the text box, is it latin1, or utf-8
> or something else?  How does the browser know what
> sort of data it has in that text box?
>

As it sits there in the text box, it is *a text string*.

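When it gets sent to the server, the browser has to turn that text
into bytes. The encoding is chosen by the browser (with reference to
the server's specifications, typically the charset the page was served
with) and may be identified in a request header.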
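
To make that concrete, here is a minimal Python 3 sketch (the variable
names are only for illustration) showing that one text string becomes
different bytes depending on which encoding is chosen:

```python
# The text itself has no encoding; it is a sequence of Unicode code points.
# "\u00e9" is the single code point for e-acute.
text = "Montr\u00e9al"

# An encoding only comes into play when the text must become bytes,
# for example just before it is sent over the network.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")

print(len(text))   # 8 characters
print(as_utf8)     # b'Montr\xc3\xa9al'  (e-acute takes two bytes)
print(as_latin1)   # b'Montr\xe9al'      (e-acute takes one byte)
```

Neither byte string "is" the text; each is one possible representation
of it on the wire.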

The server should then receive that and interpret it as a text string.
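
As a rough sketch of that receiving step, assuming the request carries
a charset parameter in its Content-Type header (not every form
submission does, and a real web framework normally handles this for
you), the bytes can be decoded like this. The helper name decode_body
and the UTF-8 fallback are my own assumptions, not part of any
framework:

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode request bytes using the charset named in a Content-Type value.

    Hypothetical helper for illustration only; falls back to `default`
    when the header names no charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter, or None if absent.
    charset = msg.get_content_charset() or default
    return body.decode(charset)


# The same text arrives as different bytes, but decodes to the same string.
from_latin1 = decode_body(b"Montr\xe9al", "text/plain; charset=ISO-8859-1")
from_utf8 = decode_body(b"Montr\xc3\xa9al", "text/plain; charset=utf-8")
print(from_latin1 == from_utf8)
```

Once decoded, the rest of the program only ever sees a str and never
needs to know which encoding was used in transit.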

Encodings should ONLY be relevant when data is stored in files or
transmitted across a network, etc. The rest of the time, just think
in Unicode.
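
A small Python 3 sketch of that "encode only at the boundary" habit,
using a temporary file as the boundary. The file name and the choice
of UTF-8 are assumptions for the example:

```python
import os
import tempfile

# Inside the program: plain text, no encoding involved.
text = "Montr\u00e9al, Qu\u00e9bec"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "city.txt")  # hypothetical file name

    # Boundary going out: name the encoding explicitly when writing.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Boundary coming in: name the same encoding when reading.
    with open(path, encoding="utf-8") as f:
        round_tripped = f.read()

    # What is actually on disk is bytes, not text.
    with open(path, "rb") as f:
        raw = f.read()

print(round_tripped == text)
print(raw)
```

Passing encoding= explicitly avoids depending on the platform default,
which can differ between machines.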

Also - migrate to Python 3, your life will become a lot easier.

ChrisA

