Re: [Python-ideas] str.startswith taking any iterator instead of just tuple

Jan. 5, 2014

      On 1/5/2014 12:48 PM, Andrew Barnert wrote:
...
On Jan 5, 2014, at 3:09, David Townshend
<aquavitae69@gmail.com> wrote:
...
Reading this thread made me start to think about why a string is a
sequence,
Because a string is defined in math/language theory as a sequence of 
symbols from an alphabet. If you want to invent or define something 
else, such as an atomic symbol type, please use a different term. For 
example:

class Symbol:
    def __init__(self, name):
        self._name = name  # optionally check that name is a string
    def __eq__(self, other):
        if not isinstance(other, Symbol):
            return NotImplemented  # let Python try the other operand
        return self._name == other._name
    def __hash__(self):
        return hash(self._name)
    def __repr__(self):
        return 'Symbol({!r})'.format(self._name)
    __str__ = __repr__  # or define to taste

Now Symbols are hashable, equality-comparable, but not iterable.
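
A minimal, self-contained check of that claim, repeating the class so the snippet runs on its own (the names a, b, and table are only for illustration):

```python
class Symbol:
    """Atomic name: hashable and comparable, deliberately not iterable."""
    def __init__(self, name):
        self._name = name
    def __eq__(self, other):
        if not isinstance(other, Symbol):
            return NotImplemented
        return self._name == other._name
    def __hash__(self):
        return hash(self._name)
    def __repr__(self):
        return 'Symbol({!r})'.format(self._name)

a = Symbol('spam')
b = Symbol('spam')
table = {a: 1}
found = table[b]  # equal Symbols hash alike, so the lookup succeeds

# Symbol defines neither __iter__ nor __getitem__, so iter() refuses it.
try:
    iter(a)
    iterable = True
except TypeError:
    iterable = False
```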

In other words, I believe the desire for a non-iterable 'string' is a 
desire for something that is not really a string, but is perhaps being 
represented as a string merely for convenience. Using 2-tuples as 
linked-list nodes (which I have done) because one does not bother to 
define a node class is similar. Tuple iteration is as meaningless in 
that context as string iteration is in a symbol context.
...
You've seriously never indexed or sliced a string? Those are the two
core operations in sequences, and they're obviously useful on
strings.
And as already explained, indexable means iterable.
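
A small sketch of why indexable means iterable: iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised. The class name Letters is hypothetical, made up for this illustration.

```python
class Letters:
    """Hypothetical class that defines only __getitem__, no __iter__."""
    def __init__(self, text):
        self._text = text
    def __getitem__(self, index):
        # Indexing past the end raises IndexError, which ends iteration.
        return self._text[index]

# No __iter__ is defined, yet the object can be iterated.
result = list(Letters('abc'))
```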
...
...
Every use case I can think of for iterating over a string either
involves first splitting the string, or would be better done with a
regex
Splitting involves forward iteration. Regex matching adds backtracking 
on top of forward iteration. Please tell me a *string* algorithm that 
does *not* involve character iteration somewhere.
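
To illustrate the splitting case, here is a rough pure-Python sketch of splitting on a single-character separator. The function name split_on is hypothetical, and this is not how str.split is actually implemented in C; it only shows that the job reduces to one forward pass over the characters.

```python
def split_on(s, sep):
    """Split s on a single-character separator in one forward pass."""
    parts = []
    current = []
    for ch in s:
        if ch == sep:
            # Separator found: close off the piece collected so far.
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))  # the final piece, possibly empty
    return parts
```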
...
People have mentioned use cases for iterating strings in this thread.
And it's easy to think of more. There are all kinds of algorithms
that treat strings as sequences of characters. Sure, many of these
functions are already methods on str or otherwise built into the
stdlib, but that just means they're implemented by iterating the
string storage in C with a loop around "*++s".
I was going to make the same point. Strings have the following methods: 
'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 
'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 
'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 
'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 
'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 
'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 
'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 
'upper', 'zfill'. Written in Python (as in classes and PyPy!), nearly 
all start with 'for c in s:' (or 'in reversed(s)').  The ones that do 
not generally use len(s). Len(s) is calculated in str.__new__ with an 
internal iteration: 'for char added to string, increment len counter'.
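
As one example of the 'for c in s:' pattern, a simplified sketch of swapcase. It assumes plain cased letters and ignores the Unicode special cases that the real method has to handle.

```python
def swapcase(s):
    """Simplified sketch of str.swapcase built on character iteration."""
    out = []
    for c in s:
        if c.isupper():
            out.append(c.lower())
        elif c.islower():
            out.append(c.upper())
        else:
            out.append(c)  # digits, spaces, punctuation pass through
    return ''.join(out)
```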

Comparing strings also involves iteration, hence so does sorting lists 
of strings by comparison.
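
A sketch of what a less-than comparison amounts to, assuming the usual lexicographic ordering by code point (the function name less_than is hypothetical):

```python
def less_than(a, b):
    """Lexicographic a < b, written as an explicit character loop."""
    for x, y in zip(a, b):
        if x != y:
            # The first differing character decides the result.
            return x < y
    # One string is a prefix of the other: the shorter one sorts first.
    return len(a) < len(b)
```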
...
And if you want to
extend that set of builtins with similar functions, how else would
you do it but with a "for ch in s" loop? (Well, you could "for ch in
list(s)", but that's still treating strings as iterables.) For
example, many people are asked to write a rot13 function in one of
their first classes. How would you write that if strings weren't
iterables? There's no way a regex is going to help you here, unless
you wanted to do something like using re.sub('.') as a convoluted
and slow way of writing map.
AFAIK, all the codecs iterate character by character.
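
For reference, the rot13 exercise mentioned above might be sketched like this. It assumes only ASCII letters are rotated and everything else is passed through unchanged.

```python
def rot13(s):
    """Rotate ASCII letters by 13 places; leave other characters alone."""
    out = []
    for ch in s:
        if 'a' <= ch <= 'z':
            out.append(chr((ord(ch) - ord('a') + 13) % 26 + ord('a')))
        elif 'A' <= ch <= 'Z':
            out.append(chr((ord(ch) - ord('A') + 13) % 26 + ord('A')))
        else:
            out.append(ch)
    return ''.join(out)
```

Applying rot13 twice gives back the original string, since 13 + 13 = 26.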

-- 
Terry Jan Reedy
