UTF-8 and latin1
Chris Angelico
rosuav at gmail.com
Thu Aug 18 18:20:16 EDT 2022
On Fri, 19 Aug 2022 at 08:15, Tobiah <toby at tobiah.org> wrote:
>
> > You configure the web server to send:
> >
> > Content-Type: text/html; charset=...
> >
> > in the HTTP header when it serves HTML files.
>
> So how does this break down? When a person enters
> Montréal, Québec into a form field, what are they
> doing on the keyboard to make that happen? As the
> string sits there in the text box, is it latin1, or utf-8
> or something else? How does the browser know what
> sort of data it has in that text box?
>
As it sits there in the text box, it is *a text string*.
When it gets sent to the server, the encoding is chosen by the
browser (with reference to the server's specifications, normally the
charset of the page the form came from) and may be identified in a
request header.
The server should then receive that and interpret it as a text string.
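
A minimal sketch of that round trip in Python 3, with the standard
library's urllib.parse standing in for both the browser and the server.
The field name "city" and the two encodings tried are assumptions made
purely for illustration; a real browser and web framework do this for you.

```python
from urllib.parse import urlencode, parse_qs

# The text the user typed. At this point it is just a str (Unicode).
typed = "Montréal"

# "Browser" side: the same text becomes different bytes on the wire
# depending on which encoding is used for the submission.
as_utf8 = urlencode({"city": typed}, encoding="utf-8")
as_latin1 = urlencode({"city": typed}, encoding="latin-1")
print(as_utf8)    # city=Montr%C3%A9al
print(as_latin1)  # city=Montr%E9al

# "Server" side: decoding with the matching encoding recovers the text.
from_utf8 = parse_qs(as_utf8, encoding="utf-8")["city"][0]
from_latin1 = parse_qs(as_latin1, encoding="latin-1")["city"][0]
print(from_utf8 == typed and from_latin1 == typed)  # True
```

Either encoding works as long as both ends agree on it; the trouble
starts only when the server decodes with a different one.
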
Encodings should ONLY be relevant when data is stored in files or
transmitted across a network etc, and the rest of the time, just think
in Unicode.
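
As a rough sketch of what "just think in Unicode" looks like in
Python 3: decode bytes once on the way in, work with str in the middle,
and encode once on the way out. The byte strings below are made-up
sample input, not taken from any real request.

```python
# Bytes as they might arrive from a file or a socket, in two encodings.
wire_utf8 = b"Montr\xc3\xa9al"
wire_latin1 = b"Montr\xe9al"

# Decode once, at the boundary, using the encoding the sender declared.
city_a = wire_utf8.decode("utf-8")
city_b = wire_latin1.decode("latin-1")

# Inside the program there is only text; the original encoding is gone.
print(city_a == city_b)  # True
print(len(city_a))       # 8 characters, however many bytes it took
print(city_a.upper())    # MONTRÉAL

# Encode once, at the boundary, when writing to a file or the network.
out = city_a.encode("utf-8")
print(out == wire_utf8)  # True
```
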
Also - migrate to Python 3, your life will become a lot easier.
ChrisA