Python was designed (was Re: Multi-threading in Python vs Java)
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sat Oct 26 00:46:12 EDT 2013
On Fri, 25 Oct 2013 19:05:09 +0100, Mark Lawrence wrote:
> On 25/10/2013 07:14, wxjmfauth at gmail.com wrote:
>
>> Use one of the coding schemes endorsed by Unicode.
>
> As I personally know nothing about unicode for the unenlightened such as
> myself please explain this statement with respect to the fsr.
Please don't encourage JMF. You know he'll just continue with his
ridiculous vendetta against Python 3.3's Unicode handling.
>> If a dev is not able to see a non ascii char may use 10 bytes more than
>> an ascii char
>
> Are you saying that an ascii char takes a byte but a non ascii char
> takes up to 11?
He's talking about the fact that strings in Python are objects, and hence
carry a fixed amount of per-object overhead (reference count, type pointer,
length and so on). Just to prove it's not specific to Python 3.3, or to
Unicode, here's an empty byte-string in Python 2.6:
py> sys.getsizeof('')
24
On the other hand, this overhead becomes trivial as the string gets
bigger:
py> sys.getsizeof('x'*10**6)
1000024
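To check the "fixed overhead" point on a current interpreter, here is a minimal sketch. It assumes CPython 3, where `bytes` plays the role of the 2.x byte-string. The helper name `overheads` is hypothetical and exists only for illustration. For several lengths it subtracts the length from `sys.getsizeof`. If the overhead is a fixed per-object header, every entry comes out the same, and the header's share of the total shrinks as the string grows. Exact byte counts vary with version and platform, so none are assumed here.

```python
import sys

def overheads(make, sizes=(0, 10, 1000, 10**6)):
    """Hypothetical helper: getsizeof(obj) - len(obj) for objects of several lengths.

    `make(n)` must build an object of length n. A constant per-object
    header shows up as the same number for every size.
    """
    result = []
    for n in sizes:
        obj = make(n)
        result.append(sys.getsizeof(obj) - len(obj))
    return result

# Python 3 bytes is the closest analogue of the 2.x byte-string.
print(overheads(lambda n: b'x' * n))  # one number, repeated for every size

# The fixed header becomes a vanishing fraction of the total as n grows.
for n in (0, 100, 10**6):
    total = sys.getsizeof(b'x' * n)
    print(f"{n:>8} {total:>8} {(total - n) / total:.4%} overhead")
```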
Unicode is no different. Here is the hated 3.3 again:
py> sys.getsizeof('') # Unicode, not byte-string
25
py> sys.getsizeof('ó'*10**6)
1000037
Again, a totally trivial amount of overhead. If you aren't willing to pay
that overhead for the convenience of an OOP language like Python, you
shouldn't be using an OOP language like Python.
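The same picture of a fixed header plus a payload holds under the flexible string representation of PEP 393 (the "FSR" in the quoted question). Here is a minimal sketch, assuming CPython 3.3 or later. It measures two lengths and takes the difference, which cancels the header and leaves only the per-character cost. The helper `bytes_per_char` is hypothetical. Characters up to U+00FF, such as 'ó', should cost one byte each. Other BMP characters should cost two, and characters outside the BMP four. Each string uses the widest width that any of its characters needs.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Hypothetical helper: storage bytes per code point for strings of repeated `ch`.

    The difference between the sizes of a 2n-char and an n-char string
    cancels the fixed object header (CPython 3.3+, PEP 393 layout).
    """
    small = sys.getsizeof(ch * n)
    large = sys.getsizeof(ch * (2 * n))
    return (large - small) // n

# 'x' is ASCII, U+00F3 is 'ó' (Latin-1 range), U+0E23 is a Thai letter
# (elsewhere in the BMP), U+1F600 is an emoji (outside the BMP).
for ch in ('x', '\u00f3', '\u0e23', '\U0001F600'):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```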
--
Steven