Cult-like behaviour [was Re: Kindness]
Marko Rauhamaa
marko at pacujo.net
Sun Jul 15 04:39:40 EDT 2018
Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
> Of course we have no idea what Marko's software is, or what it is doing,
Correct, you don't, but the link Paul Rubin posted gives you an idea:
Python 3 says: everything is Unicode (by default, except in certain
situations, and except if we send you crazy reencoded data, and even
then it's sometimes still unicode, albeit wrong unicode). Filenames
are Unicode, Terminals are Unicode, stdin and out are Unicode, there
is so much Unicode! And because UNIX is not Unicode, Python 3 now has
the stance that it's right and UNIX is wrong
<URL: http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/>
> [Marko]
>>> No, as a large number of Python3 facilities require str objects as
>>> arguments. Consider urllib.request.urlopen(), for example, which
>>> requires a URL to be an str object.
>
> That's because URLs are fundamentally text strings.
<URL: https://tools.ietf.org/html/rfc1738>:
In most URL schemes, the sequences of characters in different parts
of a URL are used to represent sequences of octets used in Internet
protocols. For example, in the ftp scheme, the host name, directory
name and file names are such sequences of octets, represented by
parts of the URL.
(RFC 3986 says the same thing in a more roundabout way.)
A URL consists of ASCII-only characters that represent an octet string.
Of course, ASCII characters *are* Unicode characters.
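To make the octet-string reading concrete, here is a minimal sketch using the standard urllib.parse helpers. The path "/ä" and the choice of Latin-1 as the second encoding are only illustrative assumptions; the point is that one str path can stand for different octet strings, and the percent-encoded ASCII form is what pins the octets down.

```python
from urllib.parse import quote, unquote_to_bytes

path = "/ä"  # illustrative path, not taken from any real server

# The same text maps to different ASCII-only URL forms depending on
# which octets it is meant to represent.
quoted_utf8 = quote(path)                         # UTF-8 is the default
quoted_latin1 = quote(path, encoding="latin-1")

# Going back from the ASCII form recovers the octets, not the text.
octets_utf8 = unquote_to_bytes(quoted_utf8)
octets_latin1 = unquote_to_bytes(quoted_latin1)

print(quoted_utf8, octets_utf8)      # /%C3%A4 b'/\xc3\xa4'
print(quoted_latin1, octets_latin1)  # /%E4 b'/\xe4'
```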
> Quick quiz: which of the following are real URLs?
> (a) http://правительство.рф
On the face of it, that is not a valid URL. However, hostnames can be
dealt with somewhat bijectively using punycode.
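As a rough illustration of that mapping, the sketch below uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules on top of punycode. Treat it as an assumption that this codec matches what any given resolver or browser does with the name.

```python
hostname = "правительство.рф"

# Each dot-separated label is converted to an ASCII "xn--" form.
ascii_form = hostname.encode("idna")

# Decoding maps the ASCII form back to the original labels.
round_trip = ascii_form.decode("idna")

print(ascii_form)
print(round_trip == hostname)
```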
But try this:
>>> import http.client
>>> conn = http.client.HTTPConnection("example.com")
>>> conn.request("GET", "/ä")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/http/client.py", line 1107, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python3.5/http/client.py", line 1142, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib64/python3.5/http/client.py", line 984, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 5: ordinal not in range(128)
>>> conn = http.client.HTTPConnection("example.com")
>>> conn.request("GÄT", "/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/http/client.py", line 1107, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python3.5/http/client.py", line 1142, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib64/python3.5/http/client.py", line 984, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xc4' in position 1: ordinal not in range(128)
IOW, the method and URL path given to conn.request are str objects but
they are really just thinly veiled containers for ASCII bytes objects.
That approach is very similar to mine.
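The same effect can be reproduced without a network connection. In the sketch below, to_request_line is a hypothetical helper, not part of http.client; it only imitates the request.encode('ascii') step visible in the tracebacks above. Percent-encoding the path first, on the assumption that the server expects UTF-8 octets, yields a str that survives that step.

```python
from urllib.parse import quote


def to_request_line(method, path):
    # Hypothetical stand-in for the encode('ascii') step in the traceback;
    # the real http.client code does more than this.
    return ("%s %s HTTP/1.1" % (method, path)).encode("ascii")


# A raw non-ASCII path fails exactly like conn.request("GET", "/ä").
try:
    to_request_line("GET", "/ä")
    raw_failed = False
except UnicodeEncodeError:
    raw_failed = True

# Quoting first turns the path into ASCII characters that stand for octets.
request_line = to_request_line("GET", quote("/ä"))

print(raw_failed)    # True
print(request_line)  # b'GET /%C3%A4 HTTP/1.1'
```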
Marko