[Python-3000] string C API
jcarlson at uci.edu
Wed Oct 4 00:31:15 CEST 2006
"Jim Jewett" <jimjjewett at gmail.com> wrote:
> On 10/3/06, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Jim Jewett schrieb:
> > > By knowing that there is only one possible representation for a given
> > > string, he skips the equivalency cache. On the other hand, he also
> > > loses the equivalency cache.
> > What is an equivalency cache, and why would one like to have one?
> Same string, different encoding.
> The Py 2.x unicode implementation saves a cached copy of the string
> encoded in the default encoding, but
> (1) it always creates the UCS4 (or UCS2) encoding, even though it
> isn't always needed.
> (2) any 3rd encoding -- no matter how frequent -- requires either
> a fresh copy every time, or manual caching.
> An equivalency cache would save all input/output encodings that the
> string was recoded to/from. (Possibly only with weak references --
> the mapping itself might benefit from tuning based on various
If users don't want to recode every time, they should save the encoded
result in a local or global variable.
I'm personally not terribly concerned about needing to recode text every
time one needs to access Tcl, win32, GTK, or Qt APIs. For a large
portion of the cases, Python does that now, and so far I've not heard
any substantial complaints of "Python is slow when accessing API X".
Whether we choose internal encoding based on content (Latin-1, UCS-2,
UCS-4), or choose a single internal encoding based on a tradeoff of
representation size and access time, I don't think it matters. Why? The
odds are poor that any encoding we choose will really be the right
internal encoding for more than a handful of cases, so users are going
to need to recode, or write code to handle the one (or more) internal
encoding(s) available, which they already do.
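
For concreteness, choosing an internal encoding "based on content" could look
something like the sketch below. The function name narrowest_encoding() is
made up for illustration, the codec names stand in for Latin-1, UCS-2 and
UCS-4, and lone surrogates are ignored; this is not how any Python
implementation is known to do it.

```python
def narrowest_encoding(text):
    """Pick the smallest fixed-width codec that can hold every code point."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return "latin-1"      # 1 byte per character
    if highest < 0x10000:
        return "utf-16-le"    # 2 bytes per character; stands in for UCS-2
    return "utf-32-le"        # 4 bytes per character (UCS-4)

# An external API that wants, say, UTF-8 still needs a recode in every case.
for sample in ("abc", "L\u00f6wis", "\u65e5\u672c", "\U0001F600"):
    internal = sample.encode(narrowest_encoding(sample))
    external = sample.encode("utf-8")
```

Whichever branch is taken, the caller that needs a different encoding ends up
recoding anyway, which is the point made above.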