
On 1/5/2014 12:48 PM, Andrew Barnert wrote:
On Jan 5, 2014, at 3:09, David Townshend <aquavitae69@gmail.com> wrote:
Reading this thread made me start to think about why a string is a sequence,
Because a string is defined in math/language theory as a sequence of symbols from an alphabet. If you want to invent or define something else, such as an atomic symbol type, please use a different term. For example:

    class Symbol:
        def __init__(self, name):
            self._name = name  # optionally check that name is a string
        def __eq__(self, other):
            if not isinstance(other, Symbol):
                return NotImplemented
            return self._name == other._name
        def __hash__(self):
            return hash(self._name)
        def __repr__(self):
            return 'Symbol({!r})'.format(self._name)
        __str__ = __repr__  # or define to taste

Now Symbols are hashable and equality-comparable, but not iterable.

In other words, I believe the desire for a non-iterable 'string' is a desire for something that is not really a string, but is perhaps being represented as a string merely for convenience. Using duples as linked-list nodes (which I have done) because one does not bother to define a node class is similar: tuple iteration is as meaningless in that context as string iteration is in the symbol context.
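To make the linked-list comparison concrete, here is a small sketch. The helper names from_iterable and to_list are hypothetical, chosen for illustration; each node is a 2-tuple (head, tail), with None ending the chain.

```python
# Hypothetical helpers: a singly linked list built from nested 2-tuples
# of the form (head, tail), with None marking the end of the list.
def from_iterable(items):
    node = None
    for item in reversed(list(items)):
        node = (item, node)  # prepend, so the first item ends up at the front
    return node


def to_list(node):
    # Walk the chain by following the tail slot of each pair.
    out = []
    while node is not None:
        head, node = node
        out.append(head)
    return out


lst = from_iterable([1, 2, 3])
print(lst)           # (1, (2, (3, None)))
print(to_list(lst))  # [1, 2, 3]
# Iterating a node is legal but meaningless here: it yields the head
# and then the entire rest of the list as a single item.
print(list(lst))     # [1, (2, (3, None))]
```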
You've seriously never indexed or sliced a string? Those are the two core operations in sequences, and they're obviously useful on strings.
And as already explained, indexable means iterable.
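A minimal sketch of why indexable implies iterable in Python: when a class defines __getitem__ but no __iter__, iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised. The class name Indexable is made up for this example.

```python
import collections.abc


class Indexable:
    """Defines only __getitem__; there is no __iter__ method."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, index):
        # Indexing past the end raises IndexError, which ends iteration.
        return self._data[index]


seq = Indexable("abc")
print(seq[1])     # b
print(list(seq))  # ['a', 'b', 'c']
for ch in seq:    # works through the __getitem__ fallback
    print(ch)
# The Iterable ABC only checks for __iter__, so it misses this protocol.
print(isinstance(seq, collections.abc.Iterable))  # False
```

The Python docs note that calling iter() is the reliable way to test whether something is iterable, precisely because the ABC check does not see this older sequence protocol.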
Every use case I can think of for iterating over a string either involves first splitting the string or would be better done with a regex.
Splitting involves forward iteration. Regex matching adds backtracking on top of forward iteration. Please tell me a *string* algorithm that does *not* involve character iteration somewhere.
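As a rough illustration of the point that splitting is forward iteration, here is a pure-Python sketch of str.split with a single-character separator. split_on is a hypothetical name, and this is not how CPython implements split; it only shows that the work reduces to one forward pass over the characters.

```python
def split_on(s, sep):
    """Sketch of s.split(sep) for a one-character sep, in one forward pass."""
    parts = []
    current = []
    for ch in s:  # forward iteration over the characters
        if ch == sep:
            parts.append("".join(current))  # close off the current field
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))  # the last field, possibly empty
    return parts


print(split_on("a,b,,c", ","))  # ['a', 'b', '', 'c']
```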
People have mentioned use cases for iterating strings in this thread. And it's easy to think of more. There are all kinds of algorithms that treat strings as sequences of characters. Sure, many of these functions are already methods on str or otherwise built into the stdlib, but that just means they're implemented by iterating the string storage in C with a loop around "*++s".
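One example of an algorithm that treats a string as a sequence of characters is run-length encoding. This is an illustrative sketch; run_length_encode is a hypothetical helper, not a str method.

```python
def run_length_encode(s):
    """Collapse runs of repeated characters into (char, count) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([ch, 1])  # start a new run
    return [(ch, n) for ch, n in runs]


print(run_length_encode("aaabccdd"))
# [('a', 3), ('b', 1), ('c', 2), ('d', 2)]
```

itertools.groupby(s) yields the same grouping, and it too gets there by iterating the string.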
I was going to make the same point. Strings have the following methods: 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill'. Written in Python (as in classes and PyPy!), nearly all start with 'for c in s:' (or 'in reversed(s)'). The ones that do not generally use len(s). And len(s) is calculated in str.__new__ with an internal iteration: 'for char added to string, increment len counter'. Comparing strings also involves iteration, and hence so does sorting lists of strings by comparison.
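To illustrate the claim about Python-level implementations, here are sketches of a single-character str.count and of string '<' comparison, each written as a plain loop over characters. count_char, less_than, and compare are hypothetical names; the comparison follows Python's rule of ordering strings by code point, with a proper prefix sorting first.

```python
from functools import cmp_to_key


def count_char(s, target):
    """Pure-Python analogue of s.count(target) for a single character."""
    n = 0
    for c in s:
        if c == target:
            n += 1
    return n


def less_than(a, b):
    """Pure-Python analogue of a < b for str: compare code points in order."""
    for x, y in zip(a, b):
        if x != y:
            return ord(x) < ord(y)
    # Every shared position matched, so the shorter string sorts first.
    return len(a) < len(b)


def compare(a, b):
    # Three-way comparison built on less_than, usable with cmp_to_key.
    if less_than(a, b):
        return -1
    if less_than(b, a):
        return 1
    return 0


words = ["pear", "Apple", "apple", "app"]
print(count_char("banana", "a"))               # 3
print(sorted(words, key=cmp_to_key(compare)))  # ['Apple', 'app', 'apple', 'pear']
```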
And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.) For example, many people are asked to write a rot13 function in one of their first classes. How would you write that if strings weren't iterables? There's no way a regex is going to help you here, unless you wanted to do something like using re.sub('.') as a convoluted and slow way of writing map.
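For the rot13 exercise, here is a loop-based sketch plus the re.sub('.') variant for comparison. rot13 and rot13_via_regex are hypothetical helper names; the stdlib's own rot_13 text transform, reachable through codecs.encode, is used only as a cross-check.

```python
import codecs
import re


def rot13(s):
    """Rotate ASCII letters by 13 places; leave every other character alone."""
    out = []
    for ch in s:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def rot13_via_regex(s):
    # The regex only applies a per-character function, i.e. a slower map.
    # DOTALL makes '.' match newlines too, so every character is visited.
    return re.sub(".", lambda m: rot13(m.group()), s, flags=re.DOTALL)


print(rot13("Hello, World!"))  # Uryyb, Jbeyq!
print(rot13("Hello, World!") == codecs.encode("Hello, World!", "rot_13"))  # True
```

The regex version does nothing the loop doesn't; it just hides the per-character map behind the regex engine.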
AFAIK, all the codecs iterate character by character.

--
Terry Jan Reedy