[Python-Dev] Iterable String Redux (aka String ABC)

Mike Klaas mike.klaas at gmail.com
Thu May 29 04:59:02 CEST 2008


On 28-May-08, at 5:44 PM, Greg Ewing wrote:

> Mike Klaas wrote:
>
>> In my perfect world, strings would be indexable and sliceable, but
>> not iterable.
>
> An object that was indexable but not iterable would
> be a very strange thing. If it has __len__ and __getitem__,
> there's nothing to stop you iterating over it by hand
> anyway, so disallowing __iter__ would just seem perverse.

Python has a beautiful abstraction in iteration: iter() is a generic
function that allows you to lazily consume a sequence of objects, whether
it be lists, tuples, custom iterators, generators, or what have you.
It is trivial to write your code to be agnostic to the type of
iterable passed in.  Almost anything else a consumer of your code
passes in will result in an immediate exception.

Unfortunately, Python has two extremely common data types which do not
fail when this generic function is applied to them, and which instead
almost always return a result that is not desired: iteration over the
characters of the string, a behaviour which is rarely needed in
practice due to the wealth of string methods available.
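
A minimal sketch of the problem, written for Python 3. The helper
shout() is hypothetical and exists only for illustration: it accepts
any iterable of strings, fails loudly on a non-iterable, and silently
does the wrong thing when handed a single string.

```python
def shout(items):
    """Upper-case every item of an iterable (hypothetical helper)."""
    return [item.upper() for item in items]

# Lists and generators behave as intended.
from_list = shout(["spam", "eggs"])                   # ['SPAM', 'EGGS']
from_gen = shout(word for word in ("spam", "eggs"))   # ['SPAM', 'EGGS']

# A non-iterable argument fails immediately with TypeError.
try:
    shout(42)
    int_failed = False
except TypeError:
    int_failed = True

# A bare string does not fail: it is iterated character by character.
from_str = shout("spam")                              # ['S', 'P', 'A', 'M']
```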

I agree that it would be perverse to disallow iterating over a
string.  I just wish that the way to do that wasn't glommed on to the
object-iteration abstraction.

As it stands, any consumer of iterables has to keep strings in mind.   
It is particularly irksome when the target input is an iterable of  
strings.  I recall a function that accepts a list/iterable of item  
keys, hashes them, and then retrieves values based on the item hashes  
(usually over the network, so it is necessary to batch requests).   
This function is often used in the interactive interpreter, and it is
thus very prone to being passed a string rather than a list.  There
was no good way to prevent the (frequent) mysterious "not found"
errors, save adding an explicit type check for basestring.
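
A rough sketch of that situation, assuming Python 3 (where str takes
the place of basestring).  The names key_hash, fetch_unguarded and
fetch_guarded are made up for illustration, the choice of SHA-1 is an
assumption, and a plain dict stands in for the remote service.

```python
import hashlib

def key_hash(key):
    """Hash an item key (SHA-1 chosen arbitrarily for this sketch)."""
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

def fetch_unguarded(keys, store):
    """Look up each key by its hash; `store` stands in for the network."""
    return [store.get(key_hash(key), "not found") for key in keys]

def fetch_guarded(keys, store):
    """Same lookup, but reject a single string passed by mistake."""
    if isinstance(keys, str):  # basestring in Python 2
        raise TypeError("expected an iterable of keys, not a single string")
    return fetch_unguarded(keys, store)

store = {key_hash("item1"): "value1"}

# Intended use: an iterable of keys.
good = fetch_guarded(["item1"], store)      # ['value1']

# Mistaken use: each character is hashed and looked up separately.
bad = fetch_unguarded("item1", store)       # five "not found" results
```

The guard turns the silent failure into an immediate TypeError, at the
cost of special-casing one type inside otherwise generic code.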

Strings already behave slightly differently from the way other
sequences act: a string is the only sequence for which 'seq in seq' is
true, and the only sequence for which 'x in seq' can be true but  
'any(x==item for item in seq)' is false.  Abstractions are sometimes  
imperfect: this is why there is an explicit typecheck for strings in  
the sum() builtin.
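
These differences can be checked directly.  The snippet below is a
small sketch for Python 3; the variable names are arbitrary.

```python
text = "abc"
numbers = [1, 2, 3]

# A string is contained in itself; an ordinary list is not.
str_in_itself = text in text            # True
list_in_itself = numbers in numbers     # False

# Substring membership has no element-wise equivalent.
substring_found = "ab" in text                       # True
element_found = any("ab" == ch for ch in text)       # False

# sum() explicitly refuses to concatenate strings.
try:
    sum(["a", "b"], "")
    sum_rejected = False
except TypeError:
    sum_rejected = True
```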

I'll stop here as I realize that the likelihood that this will be  
accepted is terribly small, especially considering the late stage of  
the process.  But I would be willing to develop a patch that  
implements this behaviour on the off chance it is.

-Mike

