Py 3.3, unicode / upper()

wxjmfauth at gmail.com
Wed Dec 19 22:18:05 CET 2012


On Wednesday, 19 December 2012 19:27:38 UTC+1, Ian wrote:
> On Wed, Dec 19, 2012 at 8:40 AM, Chris Angelico <rosuav at gmail.com> wrote:
> > You may not be familiar with jmf. He's one of our resident trolls, and
> > he has a bee in his bonnet about PEP 393 strings, on the basis that
> > they take up more space in memory than a narrow build of Python 3.2
> > would, for a string with lots of BMP characters and one non-BMP. In
> > 3.2 narrow builds, strings were stored in UTF-16, with *surrogate
> > pairs* for non-BMP characters. This means that len() counts them
> > twice, as does string indexing/slicing. That's a major bug, especially
> > as your Python code will do different things on different platforms -
> > most Linux builds of 3.2 are "wide" builds, storing characters in four
> > bytes each.
>
> From what I've been able to discern, his actual complaint about PEP
> 393 stems from misguided moral concerns.  With PEP-393, strings that
> can be fully represented in Latin-1 can be stored in half the space
> (ignoring fixed overhead) compared to strings containing at least one
> non-Latin-1 character.  jmf thinks this optimization is unfair to
> non-English users and immoral; he wants Latin-1 strings to be treated
> exactly like non-Latin-1 strings (I don't think he actually cares
> about non-BMP strings at all; if narrow-build Unicode is good enough
> for him, then it must be good enough for everybody).  Unfortunately
> for him, the Latin-1 optimization is rather trivial in the wider
> context of PEP-393, and simply removing that part alone clearly
> wouldn't be doing anybody any favors.  So for him to get what he
> wants, the entire PEP has to go.
>
> It's rather like trying to solve the problem of wealth disparity by
> forcing everyone to dump their excess wealth into the ocean.
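
The narrow-build behaviour described in the quoted text can be approximated on a current Python by counting UTF-16 code units directly. This is only a minimal sketch, not output from an actual 3.2 narrow build, and `utf16_units` is a hypothetical helper name chosen for illustration.

```python
def utf16_units(s):
    """Count the UTF-16 code units needed to store s.

    Hypothetical helper: a narrow build stored strings as UTF-16,
    so its len() reported this number rather than the code point count.
    """
    # utf-16-le writes no byte order mark; each code unit is 2 bytes.
    return len(s.encode('utf-16-le')) // 2

bmp_only = 'abc'
with_astral = 'ab\U0001d11e'   # U+1D11E lies outside the BMP

# Code point count next to UTF-16 code unit count.
print(len(bmp_only), utf16_units(bmp_only))
print(len(with_astral), utf16_units(with_astral))
```

On Python 3.3 or later the two counts agree for the BMP-only string and differ by one for the string holding a single non-BMP character, because that character needs a surrogate pair in UTF-16.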

----

Latin-1 (ISO-8859-1)? Are you sure?

>>> import sys
>>> sys.getsizeof('a')
26
>>> sys.getsizeof('ab')
27
>>> sys.getsizeof('aé')
39
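
These totals include a fixed per-object header, so on their own they do not show the per-character cost. A rough way to isolate it is to compare two long strings built from the same character and divide the size difference by the length difference. This is a sketch that assumes CPython 3.3 or later (the numbers are an implementation detail, not a language guarantee), and `bytes_per_char` is a hypothetical helper, not part of `sys`.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate the marginal storage cost of one character (hypothetical helper).

    Subtracting the sizes of two strings that differ only in length
    cancels the fixed header, leaving the per-character cost.
    """
    short = ch * n
    longer = ch * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // n

# ASCII, Latin-1, other BMP, and non-BMP sample characters.
for ch in ('a', '\xe9', '\u20ac', '\U0001d11e'):
    print('U+%04X' % ord(ch), bytes_per_char(ch))
```

Under PEP 393 one would expect 1, 1, 2 and 4 bytes respectively, which suggests that the jump from 27 to 39 above comes from a larger header for non-ASCII strings rather than from 'é' being stored more widely.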

Time to go to bed. More complete answer tomorrow.

jmf



