> A library implemented in a confusing way is not an example of nothing wrong on Python strings. (I myself has made this stupid mistake many times and I cannot blame neither Python nor sqlite for being careless.)
> In my humble opinion, your example does not prove that iterable strings are faulty. They can be tricky in some occasions, I admit it... but there are many tricks in all programming languages especially for newbies (currently I am trying to learn Lisp... again).
In a sense we agree. Python strings are not wrong or faulty. I think both sides of this thread are making good points, but it's ultimately a very academic discussion. Strings blur the line between scalars and iterables. Their being iterable is sometimes a bit weird and can make some code messier, but it's easy enough to deal with when you know what you're doing. That kind of thing is not a good enough reason to make any drastic changes.
But as you say, they can be tricky, and that's a real problem worth paying serious attention to. I don't understand why you dismiss this on the grounds that there are many tricks in all languages. Sure, that's inevitable to a degree, but shouldn't we try to make things less tricky where we can? Python strives to be easy to use and easy to learn for beginners. Accidentally iterating over strings has probably caused many hours of frustration and confusion. It probably doesn't have that effect on anyone on this mailing list because we understand Python deeply, but we need to consider the beginner's perspective.
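To make the kind of accident I mean concrete, here is a minimal sketch. The function `shout_all` and the names in it are made up for illustration; the point is only that passing a single string where a collection of strings was expected raises no error and quietly produces per-character results.

```python
def shout_all(names):
    """Upper-case every name in a collection of names (hypothetical helper)."""
    return [name.upper() for name in names]


# Intended use: a list of names.
intended = shout_all(["alice", "bob"])   # ['ALICE', 'BOB']

# Accidental use: a single string. No exception, just one entry per character.
accident = shout_all("alice")            # ['A', 'L', 'I', 'C', 'E']

# A common way to get here: parentheses without a trailing comma are not a tuple.
not_a_tuple = ("alice")                  # this is just the str 'alice'
one_tuple = ("alice",)                   # this is a 1-tuple
```

The bug only shows up later, as output that looks nothing like what was expected, which is what makes it expensive for a beginner to track down.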
> Actually, `in` means the same in strings, in sequences, in lists, etc.
For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression `x in y` is equivalent to `any(x is e or x == e for e in y)`.
For the string and bytes types, `x in y` is `True` if and only if x is a substring of y.
For user-defined classes which do not define `__contains__()` but do define `__iter__()`, `x in y` is `True` if some value z, for which the expression `x is z or x == z` is true, is produced while iterating over y.
Lastly, the old-style iteration protocol is tried: if a class defines `__getitem__()`, `x in y` is `True` if and only if there is a non-negative integer index i such that `x is y[i] or x == y[i]`, ...
Strings and bytes clearly stick out as behaving differently from every built-in container type, and they deviate from the default implementation in terms of both `__iter__` and `__getitem__`.
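Here is a small sketch of that difference. `generic_contains` is my own name for the element-wise rule that the documentation gives for ordinary containers; it is not something from the standard library.

```python
def generic_contains(x, y):
    """Element-wise membership, as documented for list, tuple, set, etc."""
    return any(x is e or x == e for e in y)


letters = ["a", "b", "c"]
text = "abc"

# For a list, the built-in operator and the element-wise rule agree.
list_builtin = "ab" in letters                   # False
list_generic = generic_contains("ab", letters)   # False

# For a string, `in` is a substring test, so the two disagree.
str_builtin = "ab" in text                       # True
str_generic = generic_contains("ab", text)       # False

# The empty string is a substring of every string but is never an element.
empty_builtin = "" in text                       # True
empty_generic = generic_contains("", text)       # False
```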
And that's fine! The behaviour is very useful. It would be sad if `c in string` was only true if `c` was a single character. My point is that the protocols and magic methods in Python aren't always in perfectly consistent harmony. Remember that I was responding to this:
> Conceptually, we should be able to reason that every object that
> supports indexing should be iterable, without adding a special case
> exception "...except for str".
We already have a special case exactly like that and it's a good thing, so it wouldn't be outrageous to add another.
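For reference, this is roughly what "indexable implies iterable" looks like in practice, and how a class can opt out of it today. The class names are hypothetical. As far as I know, setting `__iter__ = None` is the documented way to say "not iterable" without falling back to `__getitem__`; treat this as a sketch of the idea rather than a proposal for how `str` would do it.

```python
class Indexable:
    """Defines only __getitem__, so iteration uses the old-style protocol."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]


class IndexableOnly(Indexable):
    """Still indexable, but explicitly opts out of iteration."""

    __iter__ = None


seq = Indexable("abc")
via_getitem = list(seq)        # ['a', 'b', 'c'], stops at the first IndexError
found = "b" in seq             # True, membership also falls back to __getitem__

restricted = IndexableOnly("abc")
first = restricted[0]          # 'a', indexing is unaffected
try:
    iter(restricted)
    iteration_blocked = False
except TypeError:
    iteration_blocked = True   # iter() refuses instead of trying __getitem__
```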
> Are you implying that developers are wrong when they iterate over strings?
Roughly, though I think you might be hearing me wrong. There is lots of existing code that correctly and intentionally iterates over strings. And code that unintentionally does it probably doesn't live for long. But if you took a random sample of all the times that someone has written code that creates new behaviour which iterates over a string, most of them would be mistakes. And essentially the developer was 'wrong' in those instances. In my case, since I can't think of when I've needed to iterate over a string, I've probably been wrong at least 90% of the time.
> Does it matter in any case?
Yes, because it wastes people's time and energy debugging.
> Strings must be defined in Python in some way.
We can choose to define them differently.
> The implementation, the syntax, and the semantics of strings are coherent in Python.
They are not entirely coherent, as I have explained, and they do not have to meet any particular standard of coherence.
> Ultimately, it does [not] matter how many people iterate on strings. That is not the question.
It matters a lot. I don't know why you assert that it doesn't.
> > And in the face of ambiguity, refuse the temptation to guess.
> > I do think it would be a pity if strings broke the tradition of indexable implies
> > iterable, but "A Foolish Consistency is the Hobgoblin of Little Minds". The benefits in
> > helping users when debugging would outweigh the inconsistency and the minor inconvenience
> > of adding a few characters. Users who are expecting iteration to work because indexing
> > works will quickly get a helpful error message and fix their problem. At the risk of
> > overusing classic Python sayings, Explicit is better than implicit.
> > However, we could get the benefit of making debugging easier without having to actually
> > break any existing code if we just raised a warning whenever someone iterates
> > over a string. It doesn't have to be a deprecation warning and we don't need to ever
> > actually make strings non-iterable.
> I do not agree at all.
What do you not agree with? Do you think it's more than a minor inconvenience to add ".chars()" here and there? Do you think that the benefits to debugging would be minor? Do you think that the inconsistency would significantly hurt users? I haven't seen an argument for any of these and I don't know if anything else I said was debatable.
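To show how small the change would feel from the user's side, here is a rough sketch using a `str` subclass. `WarnOnIterStr` is a made-up name, the warning text is mine, and a real implementation would live in `str` itself rather than in a subclass; this only imitates the proposed behaviour: implicit iteration still works but warns, and `.chars()` is the explicit, silent spelling.

```python
import warnings


class WarnOnIterStr(str):
    """Hypothetical str that warns on implicit iteration."""

    def __iter__(self):
        # stacklevel=2 points the warning at the code doing the iterating.
        warnings.warn(
            "iterating over a string; use .chars() if this is intended",
            stacklevel=2,
        )
        return super().__iter__()

    def chars(self):
        """Explicitly iterate over the characters, without a warning."""
        return super().__iter__()


s = WarnOnIterStr("abc")
explicit = list(s.chars())   # ['a', 'b', 'c'], no warning
implicit = list(s)           # ['a', 'b', 'c'], but a UserWarning is emitted
```

Nothing breaks: existing code that iterates over strings keeps its behaviour, and anyone who did it by accident gets a message pointing at the offending line.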
> It is not a question of right or wrong, better or worse. It is a question of being consistent.
Why would that be the question? Why is consistency more important than "better or worse"? How can you make such a bold claim?