[Tutor] How does len() compute length of a string in UTF-8, 16, and 32?

boB Stepp robertvstepp at gmail.com
Thu Aug 10 21:48:42 EDT 2017


On Thu, Aug 10, 2017 at 8:40 PM, boB Stepp <robertvstepp at gmail.com> wrote:
> On Thu, Aug 10, 2017 at 8:01 AM, Steven D'Aprano <steve at pearwood.info> wrote:

>> Python 3 makes Unicode about as easy as it can get. To include a Unicode
>> string in your source code, you just need to ensure your editor saves
>> the file as UTF-8, and then insert (by whatever input technology you
>> have) the character you want. You want a Greek pi?
>>
>> pi = "π"
>>
>> How about an Israeli sheqel?
>>
>> money = "₪1000"
>>
>> So long as your editor knows to save the file in UTF-8, it will Just
>> Work.
>
> So Python 3's default behavior for strings is to store them as UTF-8
> encodings in both RAM and files?  No funny business anywhere?  Except
> perhaps in my Windows 7 cmd.exe and PowerShell, ...

A while back I adopted a suggestion by Eryk Sun and installed ConEmu
on my Windows 7 machine, and I now use it in place of cmd.exe.  It
apparently provides UTF-8 support where cmd.exe and PowerShell do
not.  I just tested your two examples in cmd.exe and PowerShell, both
directly and with each shell running inside ConEmu.  Interesting!
Thanks, Eryk, for making me aware of this program!
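
For anyone who wants to try the same comparison, here is a minimal
sketch using the two strings from the examples above.  The helper name
encoded_lengths is made up for illustration.  As far as I can tell,
len() counts code points no matter which encoding is used, while the
number of bytes depends on the encoding chosen.  The "-le" codec names
are used so that no byte order mark is added to the byte counts, and
ascii() is used for printing so the script also runs in a console that
cannot display the characters.

```python
# The two strings from the examples above (the file must be saved as UTF-8).
pi = "π"
money = "₪1000"


def encoded_lengths(s):
    """Return the code point count and the byte count in three encodings."""
    return {
        "len": len(s),                            # number of code points
        "utf-8": len(s.encode("utf-8")),          # 1 to 4 bytes per code point
        "utf-16": len(s.encode("utf-16-le")),     # 2 or 4 bytes per code point
        "utf-32": len(s.encode("utf-32-le")),     # always 4 bytes per code point
    }


for s in (pi, money):
    # ascii() escapes non-ASCII characters, so this prints in any console.
    print(ascii(s), encoded_lengths(s))
```

For "π" this should report a len() of 1 with 2, 2 and 4 bytes, and for
"₪1000" a len() of 5 with 7, 10 and 20 bytes.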


-- 
boB

