IDLE "Codepage" Switching?

Thomas Passin list1 at tompassin.net
Wed Jan 18 11:05:24 EST 2023


On 1/18/2023 5:43 AM, Stephen Tucker wrote:
> Thanks for these responses.
> 
> I was encouraged to read that I'm not the only one to find this all
> confusing.
> 
> I have investigated a little further.
> 
> 1. I produced the following IDLE log:
> 
>>>> mylongstr = ""
>>>> for thisCP in range (1, 256):
>     mylongstr += chr (thisCP) + " " + str (ord (chr (thisCP))) + ", "
> 
> 
>>>> print mylongstr
> 1, 2, 3, 4, 5, 6, 7, 8, 9,
>   10, 11, 12,
>   13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
> 31,   32, ! 33, " 34, # 35, $ 36, % 37, & 38, ' 39, ( 40, ) 41, * 42, + 43,
> , 44, - 45, . 46, / 47, 0 48, 1 49, 2 50, 3 51, 4 52, 5 53, 6 54, 7 55, 8
> 56, 9 57, : 58, ; 59, < 60, = 61, > 62, ? 63, @ 64, A 65, B 66, C 67, D 68,
> E 69, F 70, G 71, H 72, I 73, J 74, K 75, L 76, M 77, N 78, O 79, P 80, Q
> 81, R 82, S 83, T 84, U 85, V 86, W 87, X 88, Y 89, Z 90, [ 91, \ 92, ] 93,
> ^ 94, _ 95, ` 96, a 97, b 98, c 99, d 100, e 101, f 102, g 103, h 104, i
> 105, j 106, k 107, l 108, m 109, n 110, o 111, p 112, q 113, r 114, s 115,
> t 116, u 117, v 118, w 119, x 120, y 121, z 122, { 123, | 124, } 125, ~
> 126, 127, ﾀ 128, ﾁ 129, ﾂ 130, ﾃ 131, ﾄ 132, ﾅ 133, ﾆ 134, ﾇ 135, ﾈ 136, ﾉ
> 137, ﾊ 138, ﾋ 139, ﾌ 140, ﾍ 141, ﾎ 142, ﾏ 143, ﾐ 144, ﾑ 145, ﾒ 146, ﾓ 147,
> ﾔ 148, ﾕ 149, ﾖ 150, ﾗ 151, ﾘ 152, ﾙ 153, ﾚ 154, ﾛ 155, ﾜ 156, ﾝ 157, ﾞ
> 158, ﾟ 159, ﾠ 160, ﾡ 161, ﾢ 162, ﾣ 163, ﾤ 164, ﾥ 165, ﾦ 166, ﾧ 167, ﾨ 168,
> ﾩ 169, ﾪ 170, ﾫ 171, ﾬ 172, ﾭ 173, ﾮ 174, ﾯ 175, ﾰ 176, ﾱ 177, ﾲ 178, ﾳ
> 179, ﾴ 180, ﾵ 181, ﾶ 182, ﾷ 183, ﾸ 184, ﾹ 185, ﾺ 186, ﾻ 187, ﾼ 188, ﾽ 189,
> ﾾ 190, ﾿ 191, À 192, Á 193, Â 194, Ã 195, Ä 196, Å 197, Æ 198, Ç 199, È
> 200, É 201, Ê 202, Ë 203, Ì 204, Í 205, Î 206, Ï 207, Ð 208, Ñ 209, Ò 210,
> Ó 211, Ô 212, Õ 213, Ö 214, × 215, Ø 216, Ù 217, Ú 218, Û 219, Ü 220, Ý
> 221, Þ 222, ß 223, à 224, á 225, â 226, ã 227, ä 228, å 229, æ 230, ç 231,
> è 232, é 233, ê 234, ë 235, ì 236, í 237, î 238, ï 239, ð 240, ñ 241, ò
> 242, ó 243, ô 244, õ 245, ö 246, ÷ 247, ø 248, ù 249, ú 250, û 251, ü 252,
> ý 253, þ 254, ÿ 255,
>>>>
> 
> 2. I copied and pasted the IDLE log into a text file and ran a program on
> it that told me about every byte in the log.
> 
> 3. I discovered the following:
> 
> Bytes 001 to 127 (01 to 7F hex) inclusive were printed as-is;
> 
> Bytes 128 to 191 (80 to BF) inclusive were output as UTF-8-encoded
> characters whose codepoints were FF00 hex more than the byte values (hence
> the strange glyphs);
> 
> Bytes 192 to 255 (C0 to FF) inclusive were output as UTF-8-encoded
> characters - without any offset being added to their codepoints in the
> meantime!
> 
> I thought you might just be interested in this - there does seem to be some
> method in IDLE's mind, at least.

This has nothing to do with IDLE.  In UTF-8, every code point from U+0080
to U+07FF, which includes all of 128 to 255, is encoded as two bytes
instead of one.  See

https://stackoverflow.com/questions/8732025/why-degree-symbol-differs-from-utf-8-from-unicode
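To make the two-byte point concrete, here is a small Python 3 sketch. It is not Stephen's byte-inspection program; `utf8_dump` is a hypothetical helper, and the sketch assumes the pasted log was saved as UTF-8. It prints the UTF-8 bytes behind a few characters and checks how many bytes each range needs.

```python
# Sketch: report the UTF-8 bytes behind each character, the way a
# byte-inspection program would see them in a UTF-8 text file.

def utf8_dump(text):
    """Return one 'U+XXXX -> hex bytes' line per character in text."""
    lines = []
    for ch in text:
        encoded = ch.encode("utf-8")
        hex_bytes = " ".join(f"{b:02X}" for b in encoded)
        lines.append(f"U+{ord(ch):04X} -> {hex_bytes}")
    return lines

# Code points 1-127 need one byte each; 128-255 need two.
assert all(len(chr(cp).encode("utf-8")) == 1 for cp in range(1, 128))
assert all(len(chr(cp).encode("utf-8")) == 2 for cp in range(128, 256))

# A character at U+FF00 + 128 (U+FF80) needs three bytes.
assert len(chr(0xFF00 + 128).encode("utf-8")) == 3

for line in utf8_dump("A\u00c0\uff80"):
    print(line)
```

Running it prints `U+0041 -> 41`, `U+00C0 -> C3 80` and `U+FF80 -> EF BE 80`, so a byte dump of the log sees one byte for ASCII, two for 192 to 255, and three for the FF00-offset characters.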



> 
> Stephen Tucker.
> 
> On Wed, Jan 18, 2023 at 9:41 AM Peter J. Holzer <hjp-python at hjp.at> wrote:
> 
>> On 2023-01-17 22:58:53 -0500, Thomas Passin wrote:
>>> On 1/17/2023 8:46 PM, rbowman wrote:
>>>> On Tue, 17 Jan 2023 12:47:29 +0000, Stephen Tucker wrote:
>>>>> 2. Does the IDLE in Python 3.x behave the same way?
>>>>
>>>> fwiw
>>>>
>>>> Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
>>>> Type "help", "copyright", "credits" or "license()" for more information.
>>>> str = ""
>>>> for c in range(140, 169):
>>>>       str += chr(c) + " "
>>>>
>>>> print(str)
>>>> Œ   Ž     ‘ ’ “ ” • – — ˜ ™ š › œ   ž Ÿ   ¡ ¢ £ ¤ ¥
>>>> ¦ § ¨
>>>>
>>>>
>>>> I don't know how this will appear since Pan is showing the icon for a
>>>> character not in its set.  However, even with more undefined characters
>>>> the printable ones do not change. I get the same output running Python3
>>>> from the terminal so it's not an IDLE thing.
>>>
>>> I'm not sure what explanation is being asked for here.  Let's take Python3,
>>> so we can be sure that the strings are in unicode.  The font being used by
>>> the console isn't mentioned, but there's no reason it should have glyphs for
>>> any random unicode character.
>>
>> Also note that the characters between 128 (U+0080) and 159 (U+009F)
>> inclusive aren't printable characters. They are control characters.
>>
>>          hp
>>
>> --
>>     _  | Peter J. Holzer    | Story must make more sense than reality.
>> |_|_) |                    |
>> | |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
>> __/   | http://www.hjp.at/ |       challenge!"
>>
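Peter's point about U+0080 to U+009F can be checked from Python 3 with the standard `unicodedata` module. This is a minimal sketch; `classify` is a hypothetical helper name, and "Cc" is the Unicode general category for control characters.

```python
import unicodedata

def classify(start, stop):
    """Map each code point in range(start, stop) to its Unicode general category."""
    return {cp: unicodedata.category(chr(cp)) for cp in range(start, stop)}

cats = classify(128, 256)

# "Cc" marks control characters; collect them from 128-255.
control = [cp for cp, cat in cats.items() if cat == "Cc"]
print(control[0], control[-1], len(control))

# None of the control characters counts as printable.
print(all(not chr(cp).isprintable() for cp in control))

# U+00A1, by contrast, is an ordinary visible symbol.
print(unicodedata.name(chr(0xA1)))
```

It should print `128 159 32`, then `True`, then `INVERTED EXCLAMATION MARK`: exactly the 32 code points 128 to 159 are control characters, which is consistent with rbowman's newsreader having nothing sensible to show for most of the range 140 to 159.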


