[Python-Dev] OOps (was: No 1.6! (was Re: A REALLY COOL PYTHON FEATURE:))

Christian Tismer tismer@tismer.com
Mon, 22 May 2000 14:40:51 +0200

Hi, I'm back from White Russia (yup, a survivor) :-)

Tim Peters wrote:
> [Christian Tismer]
> > ...
> > Then a string should better not be a sequence.
> >
> > The number of places where I really used the string sequence
> > protocol to take advantage of it is outnumbered by a factor
> > of ten by cases where I forgot to tupleize and got a bad
> > result. A traceback is better than a sequence here.
> Alas, I think
>     for ch in string:
>         muck w/ the character ch
> is a common idiom.

And now for my proposal:

Strings should be strings, but not sequences.
Slicing is ok, and it will always yield strings.
Indexing would either
a - raise an exception, or
b - yield integers instead of 1-char strings

The above idiom would read like this:

Version a: Access string elements via a coercion like tuple() or list():

    for ch in tuple(string):
        muck w/ the character ch

Version b: Access string elements as integer codes:

    for c in string:
        # either:
        ch = chr(c)
        muck w/ the character ch
        # or:
        muck w/ the character code c
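For comparison (a later development, not part of this proposal): Python 3's `bytes` type ended up with essentially the Version b semantics, where indexing yields an integer and slicing yields `bytes`, while `str` kept the 1-char-string behaviour, so Version a's explicit coercion is just `tuple(s)`. A minimal sketch using only built-ins (the sample value `b"spam"` is arbitrary):

```python
# Python 3 bytes behave like "Version b": index -> int, slice -> bytes.
data = b"spam"

first = data[0]        # an int (the code of "s"), not a 1-char string
middle = data[1:3]     # slicing still yields bytes: b"pa"

codes = [c for c in data]        # iteration yields integer codes
chars = [chr(c) for c in data]   # convert back to characters when needed

# "Version a" with today's str: coerce explicitly to get the characters.
letters = tuple("spam")
```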

> > oh-what-did-I-say-here--duck--but-isn't-it-so--cover-ly y'rs - chris
> The "sequenceness" of strings does get in the way often enough.  Strings
> have the amazing property that, since characters are also strings,
>     while 1:
>         string = string[0]
> never terminates with an error.  This often manifests as unbounded recursion
> in generic functions that crawl over nested sequences (the first time you
> code one of these, you try to stop the recursion on a "is it a sequence?"
> test, and then someone passes in something containing a string and it
> descends forever).  And we also have that
>     format % values
> requires "values" to be specifically a tuple rather than any old sequence,
> else the current
>     "%s" % some_string
> could be interpreted the wrong way.
> There may be some hope in that the "for/in" protocol is now conflated with
> the __getitem__ protocol, so if Python grows a more general iteration
> protocol, perhaps we could back away from the sequenceness of strings
> without harming "for" iteration over the characters ...
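A minimal sketch of the generic-crawler problem Tim describes, written in modern Python 3 (`collections.abc.Sequence` and `yield from` postdate this thread). The name `flatten` is illustrative only; the essential step is treating `str`/`bytes` as atoms before the "is it a sequence?" test, since otherwise `"a"[0] == "a"` and the recursion never bottoms out:

```python
from collections.abc import Sequence


def flatten(obj):
    """Yield the leaves of arbitrarily nested sequences.

    Strings are checked first and treated as atoms; without that check,
    every 1-char string is a sequence containing itself, and the
    recursion would run until RecursionError.
    """
    if isinstance(obj, (str, bytes)) or not isinstance(obj, Sequence):
        yield obj
        return
    for item in obj:
        yield from flatten(item)


# Tim's point: indexing a 1-char string returns an equal string.
assert "x"[0] == "x"

# "%" unpacks only real tuples, so a string operand is one argument,
# not a sequence of one-character arguments.
assert "%s" % "abc" == "abc"
```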

We seem to have reached a similar conclusion: it would be better if
strings were not sequences after all. How to achieve this is, of
course, the hard part.

Oh, there is another idiom possible!
How about this, after we have the new string methods :-)

    for ch in string.split():
        muck w/ the character ch

OK, in the long term we need to rethink iteration, of course.
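Python did later grow a separate iteration protocol (`__iter__`, added in Python 2.2), which decouples `for` loops from `__getitem__` as Tim anticipated. To illustrate the idea of this thread, here is a hypothetical `Text` class (not any real Python type) that supports `for ch in text` and slicing, but refuses single-element indexing:

```python
class Text:
    """Hypothetical string-like type: iterable and sliceable, not indexable."""

    def __init__(self, value):
        self._value = str(value)

    def __iter__(self):
        # "for" uses __iter__ when present, so character iteration
        # still works even though indexing is refused below.
        return iter(self._value)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing always yields another Text, never a bare character.
            return Text(self._value[key])
        raise TypeError("Text does not support indexing; iterate or slice")

    def __str__(self):
        return self._value


t = Text("spam")
for ch in t:
    pass  # muck w/ the character ch
```

Because `Text` does not register as a `collections.abc.Sequence`, a generic crawler that tests for sequences would treat it as an atom without any special-casing.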

ciao - chris

Christian Tismer             :^)   <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com