[Python-ideas] Exploring the 'strview' concept further

Thu Dec 8 21:34:55 CET 2011

> But with the concrete code, I will take a stab now...
> 
> I want the ability to use a more efficient string representation when
> I know one exists -- such as when I could be using a single-byte
> charset other than Latin-1, or when the underlying data is bytes, but
> I want to treat it as text temporarily without copying the whole
> buffer.

How long is your buffer? Have you timed how long it takes to "copy" (or
decode) it?
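One way to answer that question is to measure the decode directly with the standard `timeit` module. This is only a rough sketch: the helper name `time_decode`, the sizes and the repeat counts are illustrative choices, not anything from the thread.

```python
import timeit


def time_decode(nbytes, encoding="latin-1", repeat=5, number=10):
    """Return the best per-call time (in seconds) to decode an nbytes buffer."""
    # Every byte value 0-255 is valid Latin-1, so this buffer always decodes.
    buf = (bytes(range(256)) * (nbytes // 256 + 1))[:nbytes]
    timer = timeit.Timer(lambda: buf.decode(encoding))
    # Take the minimum over several runs to reduce noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for size in (1_000, 1_000_000, 10_000_000):
        print(f"{size:>11,} bytes: {time_decode(size) * 1e3:.3f} ms per decode")
```

If the per-decode time is small compared with the rest of the work done on the text, a zero-copy "view" type buys little.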

> PyUnicode_Kind already supports the special case of
> PyUnicode_WCHAR_KIND (also known as "legacy string, not ready" --
> http://hg.python.org/cpython/file/174fbbed8747/Include/unicodeobject.h
> around line 247).  I would like to see another option for "custom
> subtype", and to accept that strings might stay in this state longer.

The unicode implementation is already complicated enough. I think adding
yet another option will be a tough sell unless it shows major benefits.

(Note that PyUnicode_WCHAR_KIND is deprecated and is supposed to be removed
some day, perhaps in Python 4 :-))
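The existing kinds (the PEP 393 flexible string representation) can be observed from Python: CPython stores each string with 1, 2 or 4 bytes per character depending on its widest code point. This sketch is CPython-specific and relies on `sys.getsizeof` reporting the exact allocation of a `str`; the helper `bytes_per_char` is hypothetical.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate per-character storage by growing a string by one character."""
    # CPython allocates str objects at their exact size, so the size
    # difference between n+1 and n copies is the per-character width.
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


for label, ch in [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),
    ("BMP", "\u20ac"),
    ("astral", "\U0001F600"),
]:
    print(f"{label:8} {bytes_per_char(ch)} byte(s) per character")
```

Every new kind would have to be handled by each code path that currently switches over these three widths.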

> I would expect bytes in particular to grow an
> as_string(encoding="Latin-1") method, which could be used to deprecate
> the various string-related methods.

Why deprecate useful functionality?
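For comparison, a short sketch of what exists today: `bytes.decode("latin-1")` already does what the proposed `as_string(encoding="Latin-1")` would do (at the cost of a copy), while the string-like `bytes` methods let code work on the buffer without decoding at all. The helpers `as_text` and `request_line_parts` are illustrative names only.

```python
def as_text(data: bytes, encoding: str = "latin-1") -> str:
    """Today's spelling of the proposed as_string(); decode() builds a new str."""
    return data.decode(encoding)


def request_line_parts(data: bytes) -> list:
    """Split an HTTP request line using bytes methods, without decoding."""
    line, _, _ = data.partition(b"\r\n")
    return line.split(b" ")


raw = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
print(as_text(raw).splitlines()[0])   # GET /index.html HTTP/1.1
print(request_line_parts(raw))        # [b'GET', b'/index.html', b'HTTP/1.1']

# Latin-1 maps every byte 0-255 to the code point with the same value,
# so decoding and re-encoding arbitrary bytes is lossless.
all_bytes = bytes(range(256))
assert as_text(all_bytes).encode("latin-1") == all_bytes
```

Code like `request_line_parts` is exactly what would break if those methods were deprecated.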

Regards

Antoine.