[Python-3000] Making more effective use of slice objects in Py3k

Fredrik Lundh fredrik at pythonware.com
Thu Aug 31 10:21:00 CEST 2006


Jack Diederich wrote:

> That said can you guys expand on what polymorphic[1] means here in particular?
> Python wise I can only think of the str/unicode/buffer split.  If the 
> fraternity of strings doesn't include views (which I haven't needed either)
> what are you considering for the other kinds?

the idea is to allow a given string object to use different kinds of 
storage depending on what data it contains, and how it's being used.

off the top of my head, I'd imagine using at least:

     wide unicode (32-bit)
     8-bit ascii/iso-8859-1
     utf-8

and possibly also one or more of

     narrow unicode (16-bit)
     8-bit encoded (arbitrary 8-bit encodings)
     utf-16
     selected asian encodings
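
a minimal sketch of the "depending on what data it contains" part, assuming only the three fixed-width kinds from the lists above (8-bit iso-8859-1, 16-bit narrow, 32-bit wide) and ignoring the variable-width ones. pick_storage, storage_bytes and the short kind names are hypothetical, not an existing API. the second function compares the chosen kind against always storing 32-bit characters, which is where the memory savings come from.

```python
# Bytes per character for the fixed-width kinds listed above.
# The short kind names are hypothetical labels, not an existing API.
WIDTHS = {"latin-1": 1, "narrow": 2, "wide": 4}


def pick_storage(text: str) -> str:
    """Return the narrowest fixed-width kind that can hold every character."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return "latin-1"  # 8-bit ascii/iso-8859-1
    if highest <= 0xFFFF:
        return "narrow"   # 16-bit, no surrogate pairs needed
    return "wide"         # 32-bit, any code point


def storage_bytes(text: str) -> tuple[str, int, int]:
    """Return (kind, bytes in the chosen kind, bytes if always stored wide)."""
    kind = pick_storage(text)
    return kind, WIDTHS[kind] * len(text), WIDTHS["wide"] * len(text)
```

pure ascii or latin-1 text ends up at a quarter of the wide size; a single character above U+FFFF forces the whole string to 32-bit in this sketch.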

all these look and behave the same at the Python level, as well as when
using "high-level" C APIs.  ob_type may differ (even during an object's
lifetime), but type(s) is always the same.
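
a pure-Python analogy of "ob_type may differ, type(s) is always the same", assuming a hypothetical PolyStr wrapper whose storage object plays the role of ob_type. in C the object's type pointer itself would be swapped; here only the internal storage changes. the class names are made up for illustration, lone surrogates are not handled, and every access decodes the whole buffer for brevity.

```python
class Storage:
    """Base for hypothetical storage kinds; subclasses set a codec name."""
    codec = ""

    def __init__(self, text: str) -> None:
        self.data = text.encode(self.codec)

    def text(self) -> str:
        return self.data.decode(self.codec)


class Latin1Storage(Storage):
    codec = "latin-1"    # 8-bit ascii/iso-8859-1


class Utf8Storage(Storage):
    codec = "utf-8"


class WideStorage(Storage):
    codec = "utf-32-le"  # 4 bytes per character, like a 32-bit buffer


class PolyStr:
    """One Python-level type; the storage behind it may change over time."""

    def __init__(self, text: str) -> None:
        try:
            self._storage: Storage = Latin1Storage(text)
        except UnicodeEncodeError:
            # text doesn't fit in 8 bits, fall back to utf-8
            self._storage = Utf8Storage(text)

    def switch_to(self, storage_cls: type) -> None:
        # re-encode the same characters into a different representation,
        # e.g. wide storage when a program starts doing lots of indexing
        self._storage = storage_cls(self._storage.text())

    def storage_kind(self) -> str:
        return type(self._storage).__name__

    def __str__(self) -> str:
        return self._storage.text()

    def __len__(self) -> int:
        return len(self._storage.text())

    def __eq__(self, other: object) -> bool:
        if isinstance(other, PolyStr):
            return str(self) == str(other)
        return NotImplemented

    def __hash__(self) -> int:
        # equal strings hash equally, whatever their storage
        return hash(str(self))
```

two PolyStr objects holding the same text compare and hash equal even when one has been switched to WideStorage, and type() reports PolyStr for both.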

this approach gives you lots of advantages:

- lots of operations can be carried out without having to convert the
  data (all the formats listed above support forward iteration, and
  most text-level operations).

- you'll save tons of memory in applications that use text mostly in a
  few character sets (and less memory means more speed).

- adding (or removing) specific string implementations becomes trivial,
  both for the core developers and extension writers.

etc.
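
a minimal sketch of the first point, assuming well-formed UTF-8 input: a forward iterator over each storage format is enough to run a text-level operation directly on the stored bytes, without converting the whole buffer first. iter_latin1, iter_utf8 and count_char are hypothetical helpers, not part of any existing API.

```python
from typing import Iterable, Iterator


def iter_latin1(data: bytes) -> Iterator[str]:
    """Yield characters from 8-bit iso-8859-1 storage: one byte, one code point."""
    for byte in data:
        yield chr(byte)


def iter_utf8(data: bytes) -> Iterator[str]:
    """Yield characters from well-formed UTF-8 storage, one sequence at a time.

    The lead byte of each sequence gives its length, so the buffer is
    never decoded as a whole.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            size = 1  # 0xxxxxxx: ascii
        elif lead >> 5 == 0b110:
            size = 2  # 110xxxxx
        elif lead >> 4 == 0b1110:
            size = 3  # 1110xxxx
        else:
            size = 4  # 11110xxx (assumes well-formed input)
        yield data[i:i + size].decode("utf-8")
        i += size


def count_char(chars: Iterable[str], target: str) -> int:
    """A text-level operation that needs nothing but forward iteration."""
    return sum(1 for ch in chars if ch == target)
```

the same count_char works unchanged on either iterator, which is the point: a new storage format only has to supply its own forward iterator.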

the main disadvantage is that it becomes a bit more difficult to deal 
with strings at the C level (but properly dealing with both 8-bit and 
Unicode strings is already a pain in the ass, and I'm not sure this has 
to be any harder.  just slightly different).

for some details on apple's implementation (thanks bob!), see:

https://developer.apple.com/documentation/CoreFoundation/Conceptual/CFStrings/Concepts/StringStorage.html

</F>
