[Python-3000] Making strings non-iterable
Nick Coghlan
ncoghlan at gmail.com
Fri Apr 14 07:22:43 CEST 2006
Ian Bicking wrote:
> I propose that strings (unicode/text) shouldn't be iterable. Seeing this:
>
> <ul>
> <li> i
> <li> t
> <li> e
> <li> m
> <li>
> <li> 1
> </ul>
>
> a few too many times... it's annoying. Instead, I propose that strings
> get a list-like view on their characters. Oh synergy!
Another +1 here.
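For anyone who hasn't run into it, the output Ian quotes can be reproduced with a
minimal sketch like the following. The render_list function is hypothetical and
written for illustration only; it is not taken from any real templating library.

```python
def render_list(items):
    """Render each element of `items` as an <li> entry (hypothetical helper)."""
    lines = ["<ul>"]
    for item in items:
        lines.append("<li> " + item)
    lines.append("</ul>")
    return "\n".join(lines)

# Intended call: a list containing one string.
intended = render_list(["item 1"])

# Accidental call: a bare string. No exception is raised, because the
# string is iterable and each element is itself a string.
accidental = render_list("item 1")
```

The accidental call succeeds silently and emits one <li> per character, which is
why this kind of mistake tends to surface as bad data rather than as an exception.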
Some other details:
__contains__ would still be there, so "substr in string" would still work
__getitem__ would still be there, so slicing would work
To remove the iterable behaviour either iter() would have to change so that
the default "all sequences are iterable" behaviour goes away (which Guido has
suggested previously) or else the __iter__ method of strings would need to
explicitly raise TypeError.
My preference is the latter (as it's less likely to break other classes), but
the former could also work if python3warn flagged classes which defined
__getitem__ without also defining __iter__.
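A rough sketch of both points, written in current Python 3 syntax. GetItemOnly and
NonIterableStr are hypothetical classes invented for illustration: the first shows
the default "all sequences are iterable" fallback in iter(), the second approximates
the "__iter__ raises TypeError" option using a str subclass.

```python
class GetItemOnly:
    """Defines __getitem__ but not __iter__ (hypothetical example class)."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, index):
        # iter() falls back to calling this with 0, 1, 2, ... until IndexError.
        return self._data[index]


class NonIterableStr(str):
    """Hypothetical str subclass whose __iter__ explicitly raises TypeError."""

    def __iter__(self):
        raise TypeError("string is not iterable")


# The sequence-protocol fallback makes GetItemOnly iterable anyway.
legacy = list(GetItemOnly("abc"))

s = NonIterableStr("item 1")
contains_ok = "tem" in s   # __contains__ is untouched
slice_ok = s[:4]           # __getitem__ is untouched

try:
    iter(s)
    iteration_blocked = False
except TypeError:
    iteration_blocked = True
```

Only the explicit iteration path changes in this sketch; substring tests and
slicing keep working, which matches the two details listed above.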
> Iterating over strings causes frequent hard bugs (bad data, as opposed
> to exceptions which make for easy bugs), as the bug can manifest itself
> far from its origination. Also strings aren't containers. Because
> Python has no characters, only strings, as a result strings look like
> they contain strings, and those strings in turn contain themselves. It
> just doesn't make sense. And it is because a string and the characters
> it contains are interchangeable (they are both strings) that the
> resulting bugs can persist without exceptions.
And far too many uses of itertools end up needing either a check against an
"atomic_types" set (usually consisting of str and unicode) or a self-recursion
check (iter(x).next() is x) to avoid breaking in horrible ways.
Away with them, I say!
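The two workarounds might look roughly like this in current Python 3. All names
here (ATOMIC_TYPES, flatten_atomic, is_self_recursive, flatten_checked) are
assumptions made for the sketch. It also compares with == instead of the "is" used
above, since the identity of one-character strings is an implementation detail.

```python
# Assumption: (str, bytes) is the Python 3 analogue of the str/unicode pair.
ATOMIC_TYPES = (str, bytes)


def flatten_atomic(items, atomic_types=ATOMIC_TYPES):
    """Flatten nested iterables, treating atomic_types as leaves."""
    for item in items:
        if isinstance(item, atomic_types):
            yield item
            continue
        try:
            children = iter(item)
        except TypeError:
            yield item  # not iterable at all, so it is a leaf
        else:
            yield from flatten_atomic(children, atomic_types)


def is_self_recursive(x):
    """True when the first element of x equals x, as with a length-1 string."""
    try:
        it = iter(x)
    except TypeError:
        return False
    if it is x:
        return False  # x is an iterator; peeking would consume an element
    for first in it:
        return first == x
    return False  # empty iterable


def flatten_checked(items):
    """Flatten nested iterables, stopping at self-recursive elements."""
    for item in items:
        if is_self_recursive(item):
            yield item
            continue
        try:
            children = iter(item)
        except TypeError:
            yield item
        else:
            yield from flatten_checked(children)


nested = ["ab", ["cd", [1, 2]], 3]
by_type = list(flatten_atomic(nested))
by_check = list(flatten_checked(nested))
```

Note the difference in results: the type check keeps "ab" whole, while the
self-recursion check only stops the infinite descent and still splits strings
into characters.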
> Should bytes be iterable as well? Because bytes (the container) and
> integers are not interchangeable, the problems that occur with strings
> seem much less likely, and the container-like nature of bytes is
> clearer. So I don't propose this affect bytes in any way.
Agreed - there's no self-recursive iteration here, so no real problems.
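A quick illustration of the difference, using the bytes type as it eventually
shipped in Python 3 (an assumption relative to this thread, where bytes is still a
proposal): elements of a bytes object are integers, and integers are not iterable,
so the descent stops after one level.

```python
data = b"item 1"
text = "item 1"

# Elements of bytes are integers, a different type from the container.
byte_elements = list(data)

# Elements of a string are strings, the same type as the container.
first_char = text[0]
char_of_char = first_char[0]

try:
    iter(data[0])
    int_is_iterable = True
except TypeError:
    int_is_iterable = False
```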
> Questions:
>
> * .chars() doesn't return characters; should it be named something else?
Why do you say it doesn't return characters? Python's chars are just strings
of length 1, and that's what this view will contain.
> * Should it be a method that is called? dict.keys() has a legacy, but
> this does not. There is presumably very little overhead to getting this
> view. However, symmetry with the only other views we are considering
> (dictionary views) would indicate it should be a method. Also, there
> are no attributes on strings currently.
Using methods for view creation is fine by me. The various ideas for turning
them into attributes instead are cute, but not particularly compelling.
> * Are there other views on strings? Can string->byte encoding be
> usefully seen as a view in some cases?
Given that all Py3k strings will be Unicode, I think providing a view that
exposes the code points of the characters would be good. Being able to write
mystr.codes() instead of [ord(c) for c in mystr.chars()] would be handy.
Interestingly, ord(c) would then be little more than an abbreviation of
c.codes()[0].
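As a rough sketch of what the two views could look like, here is a stand-alone
approximation in current Python 3. CharsView and CodesView are hypothetical
classes standing in for the proposed mystr.chars() and mystr.codes() methods,
which do not exist on str; the real design might differ considerably.

```python
from collections.abc import Sequence


class CharsView(Sequence):
    """Hypothetical stand-in for the proposed str.chars() view."""

    def __init__(self, text):
        self._text = text  # the view wraps the string; nothing is copied

    def __len__(self):
        return len(self._text)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return type(self)(self._text[index])
        return self._text[index]  # a string of length 1


class CodesView(CharsView):
    """Hypothetical stand-in for the proposed str.codes() view."""

    def __getitem__(self, index):
        if isinstance(index, slice):
            return type(self)(self._text[index])
        return ord(self._text[index])  # the code point as an integer


mystr = "item 1"
via_view = list(CodesView(mystr))
via_ord = [ord(c) for c in CharsView(mystr)]
first_code = CodesView("i")[0]
```

Sequence supplies iteration and containment on top of __len__ and __getitem__, so
the views themselves are iterable even though, under the proposal, the string
would not be.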
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org