[Python-ideas] str.startswith taking any iterator instead of just tuple

Mon Jan 6 06:08:15 CET 2014

On 1/5/2014 12:48 PM, Andrew Barnert wrote:
> On Jan 5, 2014, at 3:09, David Townshend
> <aquavitae69 at gmail.com> wrote:
>
>> Reading this thread made me start to think about why a string is a
>> sequence,

Because a string is defined in math/language theory as a sequence of 
symbols from an alphabet. If you want to invent or define something 
else, such as an atomic symbol type, please use a different term. For 
example:

class Symbol:
    def __init__(self, name):
        self._name = name  # optionally check that name is a string
    def __eq__(self, other):
        # Compare only with other Symbols; defer to Python for anything else.
        if not isinstance(other, Symbol):
            return NotImplemented
        return self._name == other._name
    def __hash__(self):
        return hash(self._name)
    def __repr__(self):
        return 'Symbol({!r})'.format(self._name)
    __str__ = __repr__  # or define to taste

Now Symbols are hashable, equality-comparable, but not iterable.
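
As a quick check of those three properties, here is a minimal sketch. It repeats the class so that it runs on its own; the is_iterable helper and the variable names are only illustrative.

```python
class Symbol:
    """Atomic name: hashable and equality-comparable, but not iterable."""
    def __init__(self, name):
        self._name = name
    def __eq__(self, other):
        if not isinstance(other, Symbol):
            return NotImplemented
        return self._name == other._name
    def __hash__(self):
        return hash(self._name)
    def __repr__(self):
        return 'Symbol({!r})'.format(self._name)

def is_iterable(obj):
    """Hypothetical helper: True if iter(obj) succeeds."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

a, b = Symbol('spam'), Symbol('spam')
table = {a: 1}            # works because Symbol defines __hash__
looked_up = table[b]      # equal Symbols find the same dict entry
symbol_iterable = is_iterable(a)       # no __iter__ or __getitem__ defined
string_iterable = is_iterable('spam')  # a real string is iterable
```

Since Symbol defines neither __iter__ nor __getitem__, iter() rejects it with TypeError, which is the whole point of using a separate type.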

In other words, I believe the desire for a non-iterable 'string' is a 
desire for something that is not really a string, but is perhaps being 
represented as a string merely for convenience. Using 2-tuples as 
linked-list nodes (which I have done) because one does not bother to 
define a node class is similar. Tuple iteration is as meaningless 
in that context as string iteration is in a symbol context.

> You've seriously never indexed or sliced a string? Those are the two
> core operations in sequences, and they're obviously useful on
> strings.

And as already explained, indexable means iterable.
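
To illustrate: a class that defines only __getitem__ is accepted by iter(), which falls back to calling it with 0, 1, 2, ... until IndexError is raised. The Letters class below is a made-up example, not anything from the stdlib.

```python
class Letters:
    """Hypothetical wrapper defining only __getitem__ (no __iter__, no __len__)."""
    def __init__(self, text):
        self._text = text
    def __getitem__(self, index):
        # Raises IndexError past the end, which is what ends the iteration.
        return self._text[index]

# iter() uses the old sequence protocol: __getitem__(0), __getitem__(1), ...
collected = list(Letters('abc'))

# The same fallback makes a for loop work.
upper = []
for ch in Letters('xyz'):
    upper.append(ch.upper())
```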

>> Every use case I can think of for iterating over a string either
>> involves first splitting the string, or would be better done with a
>> regex

Splitting involves forward iteration. Regex matching adds backtracking 
on top of forward iteration. Please tell me a *string* algorithm that 
does *not* involve character iteration somewhere.
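
For instance, here is a rough sketch of splitting on a single-character separator, written as a plain forward loop. The name split_on is mine; this is not how str.split is actually implemented, only an illustration that the iteration has to happen somewhere.

```python
def split_on(s, sep):
    """Split s on a single-character separator using forward iteration."""
    parts = []
    current = []
    for ch in s:
        if ch == sep:
            # Separator found: close off the current piece.
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    # Whatever follows the last separator (possibly empty) is the final piece.
    parts.append(''.join(current))
    return parts

fields = split_on('a,b,,c', ',')
```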

> People have mentioned use cases for iterating strings in this thread.
> And it's easy to think of more. There are all kinds of algorithms
> that treat strings as sequences of characters. Sure, many of these
> functions are already methods on str or otherwise built into the
> stdlib, but that just means they're implemented by iterating the
> string storage in C with a loop around "*++s".

I was going to make the same point. Strings have the following methods: 
'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 
'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 
'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 
'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 
'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 
'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 
'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 
'upper', 'zfill'. Written in Python (as in classes and PyPy!), nearly 
all would start with 'for c in s:' (or 'in reversed(s)').  The ones that 
do not generally use len(s). The value of len(s) is calculated in 
str.__new__ with an internal iteration: 'for char added to string, 
increment len counter'.
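
A few of these, sketched in pure Python to show the pattern. The py_* names are hypothetical, and the real methods are written in C and handle more cases (tuples of prefixes, start/end arguments, arbitrary whitespace).

```python
def py_startswith(s, prefix):
    """Simplified str.startswith for a single prefix string."""
    if len(prefix) > len(s):
        return False
    for a, b in zip(s, prefix):   # walk both strings in step
        if a != b:
            return False
    return True

def py_count_char(s, ch):
    """Simplified str.count for a single character."""
    total = 0
    for c in s:
        if c == ch:
            total += 1
    return total

def py_rstrip_spaces(s):
    """Simplified str.rstrip(' '), iterating from the right."""
    keep = len(s)
    for c in reversed(s):
        if c != ' ':
            break
        keep -= 1
    return s[:keep]
```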

Comparing strings also involves iteration, and hence so does sorting 
lists of strings by comparison.
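
A sketch of what that comparison amounts to, assuming ordinary code-point ordering. The less_than and compare names are illustrative; functools.cmp_to_key is the real stdlib adapter for sorting with a comparison function.

```python
from functools import cmp_to_key

def less_than(a, b):
    """Lexicographic a < b, done by walking both strings together."""
    for x, y in zip(a, b):
        if x != y:
            return x < y          # first differing character decides
    return len(a) < len(b)        # one is a prefix of the other

def compare(a, b):
    """Three-way comparison built on less_than, for use with cmp_to_key."""
    if less_than(a, b):
        return -1
    if less_than(b, a):
        return 1
    return 0

words = ['pear', 'apple', 'peach', 'app', 'Pear']
words_sorted = sorted(words, key=cmp_to_key(compare))
```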

> And if you want to
> extend that set of builtins with similar functions, how else would
> you do it but with a "for ch in s" loop? (Well, you could "for ch in
> list(s)", but that's still treating strings as iterables.) For
> example, many people are asked to write a rot13 function in one of
> their first classes. How would you write that if strings weren't
> iterables? There's no way a regex is going to help you here, unless
> you wanted to do something like using re.sub('.') as a convoluted
> and slow way of writing map.

AFAIK, all the codecs iterate character by character.
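
The rot13 exercise mentioned above is a small example of the same shape. This is only a beginner-style sketch for ASCII letters, not a claim about how the stdlib codec is written.

```python
def rot13(s):
    """Rotate ASCII letters by 13 places; leave everything else unchanged."""
    out = []
    for ch in s:
        if 'a' <= ch <= 'z':
            out.append(chr((ord(ch) - ord('a') + 13) % 26 + ord('a')))
        elif 'A' <= ch <= 'Z':
            out.append(chr((ord(ch) - ord('A') + 13) % 26 + ord('A')))
        else:
            out.append(ch)
    return ''.join(out)

encoded = rot13('Hello, World!')
decoded = rot13(encoded)   # applying rot13 twice restores the original
```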

-- 
Terry Jan Reedy