Python 2.x or 3.x, which is faster?
Terry Reedy
tjreedy at udel.edu
Wed Mar 9 10:42:53 EST 2016
On 3/9/2016 9:03 AM, BartC wrote:
> I've just tried a UTF-8 file and getting some odd results. With a file
> containing [three euro symbols]:
>
> €€€
>
> (including a 3-byte utf-8 marker at the start), and opened in text mode,
> Python 3 gives me this series of bytes (ie. the ord() of each character):
>
> 239
> 187
> 191
> 226
> 8218
> 172
> 226
> 8218
> 172
> 226
> 8218
> 172
>
> And prints the resulting string as: €€€. Although this latter
> might depend on my console's code page setting.
It definitely does.
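The ord() values you list are consistent with the file's UTF-8 bytes having been decoded as cp1252. That cp1252 was the default encoding for open() on your system is an assumption, not something shown in your post. A minimal sketch of that assumption, with utf-8-sig as the encoding that would strip the marker and decode the euro signs:

```python
# Assumption: the file holds a UTF-8 BOM followed by three euro signs,
# and text mode defaulted to cp1252 (not confirmed in this thread).
raw = '\u20ac\u20ac\u20ac'.encode('utf-8-sig')  # BOM + 3 x (E2 82 AC)

wrong = raw.decode('cp1252')      # one character per byte
codes = [ord(c) for c in wrong]
print(codes)

right = raw.decode('utf-8-sig')   # drops the BOM, decodes the rest as UTF-8
print(ascii(right))               # ascii() sidesteps console encoding issues
```

With a real file, the equivalent would be passing encoding='utf-8-sig' to open().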
> Changing it to UTF-8 however (CHCP 65001 in Windows)
CP65001 is MS's ugly pretense of Unicode compatibility. It has been
known to be buggy for over a decade, though some people claim to have
gotten some use out of it.
> gives me this error when I run the program again:
>
> ----------
> Fatal Python error: Py_Initialize: can't initialize sys standard streams
> LookupError: unknown encoding: cp65001
>
> This application has requested the Runtime to terminate it in an unusual
> way.
> Please contact the application's support team for more information.
> ----------
> So I think I'll skip Unicode handling to start off with! (I've already
> had plenty of fun and games with it in the past.)
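The LookupError above means the interpreter had no codec registered under the name the console reported. One way to check a name before relying on it is sketched below. The helper codec_known is a hypothetical name for illustration; it uses only codecs.lookup from the standard library.

```python
import codecs

def codec_known(name):
    """Return True if this interpreter can look up the named codec."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

print(codec_known('utf-8'))
# The result for cp65001 depends on the Python version and platform.
print(codec_known('cp65001'))
```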
At least on Windows, use IDLE for the BMP subset of Unicode. Tk, and
hence tkinter and IDLE, can handle any character in the BMP. I believe
that which characters are actually displayed and which are shown as boxes
depends on the font. On my US Win10 system:
IDLE with Lucida Console:
>>> s = '€€€'
>>> s
'€€€'
In the console interpreter: '???' is printed.
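The '???' is what a lossy encode produces when the target code page has no euro sign. A sketch, assuming the console uses cp437 (common on US Windows; an assumption, not something stated in this thread):

```python
s = '\u20ac\u20ac\u20ac'

# Assumption: the console code page is cp437, which has no euro sign,
# so each character is replaced by '?' when errors='replace' is used.
shown = s.encode('cp437', errors='replace')
print(shown)

# cp1252, by contrast, does include the euro sign (as the single byte 0x80).
as_cp1252 = s.encode('cp1252')
print(as_cp1252)
```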
--
Terry Jan Reedy