[Tutor] string codes

Tue Nov 26 18:48:35 CET 2013

On Tue, Nov 26, 2013 at 6:34 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>
> I think that views would be useful for *very large strings*, but very
> large probably means a lot larger than you might think. For small
> strings, say under a few hundred or perhaps even thousand characters,
> making a copy of the substring will probably be faster.
>
> I say "probably", but I'm only guessing, because strings in Python don't
> have views. (Perhaps they should?)

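In 2.7 and 3.x, you can use a memoryview for bytes, bytearray, and other
objects that export the new buffer interface. Unicode strings don't
support it. 2.x also has the older buffer type, which does accept a
unicode string, but slicing it creates a raw byte string in the
interpreter's internal encoding (UTF-16 on a narrow build, UTF-32 on a
wide build):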

    >>> b = buffer(u'abcd')
    >>> b[:8]
    'a\x00\x00\x00b\x00\x00\x00'
    >>> b[:8].decode('utf-32')
    u'ab'
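
For comparison, here is a rough 3.x sketch of the same idea using a
memoryview over encoded bytes. The variable names and the choice of
"utf-32-le" are my own assumptions for illustration. It only shows that
a slice of the view shares memory with the original buffer instead of
copying it:

```python
# Sketch for Python 3.3+; all names here are illustrative only.
buf = bytearray("abcd".encode("utf-32-le"))  # 4 bytes per character, no BOM
view = memoryview(buf)

part = view[:8]    # a view of the first two characters; nothing is copied
first = part.tobytes().decode("utf-32-le")

buf[0] = ord("z")  # change the underlying buffer in place
second = part.tobytes().decode("utf-32-le")

print(first, second)  # ab zb
```

Note that tobytes() does make a copy, so the view only saves work up to
the point where you actually need a bytes or str object.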

As of 3.3, memoryview comparison also works for strided views:

    >>> b = b'a**b**c**d**'
    >>> v = memoryview(b)
    >>> v[::3].tobytes()
    b'abcd'
    >>> v[::3] == b'abcd'
    True

http://docs.python.org/3.3/library/stdtypes.html#memory-views
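
A small sketch, assuming 3.3 or later, of what the strided view looks
like from the inside. The names are only illustrative. The stride and
contiguity attributes show why the older raw-bytes comparison could not
handle this case:

```python
# Sketch for Python 3.3+; inspects a strided view of a bytes object.
data = b"a**b**c**d**"
view = memoryview(data)
strided = view[::3]  # every third byte, still without copying

# Length, byte step between items, and whether the view is C-contiguous.
info = (len(strided), strided.strides, strided.c_contiguous)

print(info)                # (4, (3,), False)
print(strided == b"abcd")  # True
print(strided.tobytes())   # b'abcd' (this step makes a copy)
```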

In previous versions, memoryview compares the raw bytes, and only for
contiguous views. For example, in 2.7, with v created as above:

    >>> try: v[::3] == b'abcd'
    ... except NotImplementedError: print ':-('
    ...
    :-(

http://docs.python.org/3.2/library/stdtypes.html#memoryview-type
http://docs.python.org/2.7/library/stdtypes.html#memoryview-type