[Tutor] How does len() compute length of a string in UTF-8, 16, and 32?
boB Stepp
robertvstepp at gmail.com
Thu Aug 10 21:40:05 EDT 2017
On Thu, Aug 10, 2017 at 8:01 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>
> Another **Must Read** resource for unicode is:
>
> The Absolute Minimum Every Software Developer Absolutely Positively Must
> Know About Unicode (No Excuses!)
>
> https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
This was an enjoyable read, but did not have as much technical detail
as the two videos Zach had referenced. But then the author did say
"the absolute minimum ...". I will strive to avoid peeling onions on
a sub!
> (By the way, it is nearly 14 years later, and PHP still believes that
> the world is ASCII.)
I thought you must surely be engaging in hyperbole, but at
http://php.net/manual/en/xml.encoding.php I found:
"The default source encoding used by PHP is ISO-8859-1."
>
> Python 3 makes Unicode about as easy as it can get. To include a unicode
> string in your source code, you just need to ensure your editor saves
> the file as UTF-8, and then insert (by whatever input technology you
> have) the character you want. You want a Greek pi?
>
> pi = "π"
>
> How about an Israeli sheqel?
>
> money = "₪1000"
>
> So long as your editor knows to save the file in UTF-8, it will Just
> Work.
So Python 3's default behavior for strings is to store them as UTF-8
encodings in both RAM and files? No funny business anywhere? Except
perhaps in my Windows 7 cmd.exe and PowerShell, but that's not
Python's fault. Which makes me wonder, what is my editor's default
encoding/decoding? I will have to investigate!
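Here is a rough sketch of that investigation from the Python side. The editor's own setting still has to be checked in the editor. It uses Steven's two strings, and `encoded_lengths` is a hypothetical helper name I made up. `len()` on a `str` counts code points, and the byte count only appears once you pick an encoding. The defaults printed at the end depend on platform and locale, so your output for those lines may differ:

```python
import locale
import sys

ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_lengths(s):
    """Hypothetical helper: len(s) plus the byte length of s in each encoding."""
    # len() on a str counts code points, not bytes.
    result = {"code points": len(s)}
    for enc in ENCODINGS:
        # The "-le" codecs add no byte-order mark; plain "utf-16" and
        # "utf-32" would prepend a 2- or 4-byte BOM to the count.
        result[enc] = len(s.encode(enc))
    return result


for s in ["π", "₪1000"]:  # Steven's two examples
    # ascii() keeps the output printable even on a non-UTF-8 console.
    print(ascii(s), encoded_lengths(s))

# Defaults on the Python side; the editor's setting is separate.
print("str.encode() default:", sys.getdefaultencoding())  # "utf-8" in Python 3
# open() in text mode without encoding= falls back to this (locale dependent).
print("open() text-mode default:", locale.getpreferredencoding(False))
# Encoding of the console stream, e.g. cmd.exe or PowerShell on Windows.
print("stdout:", getattr(sys.stdout, "encoding", None))
```

For "₪1000" this should report 5 code points but 7, 10 and 20 bytes. The sheqel sign takes 3 bytes in UTF-8 and each ASCII digit takes 1. UTF-16 and UTF-32 use 2 and 4 bytes per code point for these particular characters.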
Cheers!
--
boB