[Python-ideas] str.startswith taking any iterator instead of just tuple
Terry Reedy
tjreedy at udel.edu
Mon Jan 6 06:08:15 CET 2014
On 1/5/2014 12:48 PM, Andrew Barnert wrote:
> On Jan 5, 2014, at 3:09, David Townshend
> <aquavitae69 at gmail.com> wrote:
>
>> Reading this thread made me start to think about why a string is a
>> sequence,
Because a string is defined in math/language theory as a sequence of
symbols from an alphabet. If you want to invent or define something
else, such as an atomic symbol type, please use a different term. For
example:
    class Symbol:
        def __init__(self, name):
            self._name = name  # optionally check that name is a string
        def __eq__(self, other):
            if not isinstance(other, Symbol):
                return NotImplemented  # let Python try the other operand
            return self._name == other._name
        def __hash__(self):
            return hash(self._name)
        def __repr__(self):
            return 'Symbol({!r})'.format(self._name)
        __str__ = __repr__  # or define to taste
Now Symbols are hashable, equality-comparable, but not iterable.
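As a quick check of that claim, here is a minimal, self-contained sketch. The class is restated so the snippet runs on its own; the isinstance guard in __eq__ and the names a, b, table, and iterable are illustrative additions, not part of the original message.

```python
class Symbol:
    """Atomic name: hashable and comparable, but deliberately not a sequence."""

    def __init__(self, name):
        self._name = name

    def __eq__(self, other):
        if not isinstance(other, Symbol):
            return NotImplemented
        return self._name == other._name

    def __hash__(self):
        return hash(self._name)

    def __repr__(self):
        return 'Symbol({!r})'.format(self._name)


a, b = Symbol('spam'), Symbol('spam')
same = (a == b)            # symbols with equal names compare equal
table = {a: 1}
looked_up = table[b]       # equal symbols hash alike, so b finds a's entry

try:
    iter(a)
    iterable = True
except TypeError:
    iterable = False       # neither __iter__ nor __getitem__ is defined
```

Because Symbol defines neither __iter__ nor __getitem__, iter() raises TypeError, which is the "not iterable" property described above.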
In other words, I believe the desire for a non-iterable 'string' is a
desire for something that is not really a string, but is perhaps being
represented as a string merely for convenience. Using duples as
linked-list nodes (which I have done) because one does not bother to
define a node class is similar: tuple iteration is as meaningless
in that context as string iteration is in a symbol context.
> You've seriously never indexed or sliced a string? Those are the two
> core operations in sequences, and they're obviously useful on
> strings.
And as already explained, indexable means iterable.
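A small sketch of why indexing implies iteration in Python: iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised. The class name Squares is hypothetical, chosen only for illustration, and the fallback applies to objects indexed by integers starting at 0.

```python
class Squares:
    """Hypothetical class that supports indexing only (no __iter__)."""

    def __init__(self, n):
        self._n = n

    def __getitem__(self, i):
        if not 0 <= i < self._n:
            raise IndexError(i)   # signals the end to the iteration fallback
        return i * i


# No __iter__ is defined, yet the object can be iterated:
squares = list(Squares(4))
first_three = [x for x in Squares(3)]
```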
>> Every use case I can think of for iterating over a string either
>> involves first splitting the string, or would be better done with a
>> regex
Splitting involves forward iteration. Regex matching adds backtracking
on top of forward iteration. Please tell me a *string* algorithm that
does *not* involve character iteration somewhere.
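To make the point about splitting concrete, here is a rough sketch of splitting on a single-character separator with one forward pass. The function name split_on is an assumption for illustration; it is not how CPython implements str.split, only a demonstration that the operation reduces to character iteration.

```python
def split_on(s, sep):
    """Split s on a single-character separator using one forward pass."""
    parts = []
    current = []
    for ch in s:                      # the forward iteration
        if ch == sep:
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))    # whatever follows the last separator
    return parts


fields = split_on('a,b,,c', ',')
```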
> People have mentioned use cases for iterating strings in this thread.
> And it's easy to think of more. There are all kinds of algorithms
> that treat strings as sequences of characters. Sure, many of these
> functions are already methods on str or otherwise built into the
> stdlib, but that just means they're implemented by iterating the
> string storage in C with a loop around "*++s".
I was going to make the same point. Strings have the following methods:
'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith',
'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum',
'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower',
'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join',
'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace',
'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',
'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate',
'upper', 'zfill'. Written in Python (as in classes and PyPy!), nearly
all start with 'for c in s:' (or 'in reversed(s)'). The ones that do
not generally use len(s). The length is calculated in str.__new__ with an
internal iteration: 'for char added to string, increment len counter'.
Comparing strings also involves iteration, and hence so does sorting
lists of strings by comparison.
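As an illustration only, here are pure-Python sketches of one method that iterates forward, one that iterates in reverse, and a less-than comparison. The names my_isdigit, my_rstrip, and less_than are hypothetical; my_isdigit handles ASCII digits only, and none of this is the actual CPython or PyPy source.

```python
def my_isdigit(s):
    """Rough stand-in for str.isdigit, restricted to ASCII digits."""
    if len(s) == 0:
        return False
    for c in s:
        if c not in '0123456789':
            return False
    return True


def my_rstrip(s):
    """Strip trailing whitespace by walking the string backwards."""
    end = len(s)
    for c in reversed(s):
        if not c.isspace():
            break
        end -= 1
    return s[:end]


def less_than(a, b):
    """Lexicographic a < b, decided at the first differing character."""
    for x, y in zip(a, b):
        if x != y:
            return x < y
    return len(a) < len(b)   # one string is a prefix of the other
```

Each function is a loop over characters, plus len(s) for the empty-string and prefix cases.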
> And if you want to
> extend that set of builtins with similar functions, how else would
> you do it but with a "for ch in s" loop? (Well, you could "for ch in
> list(s)", but that's still treating strings as iterables.) For
> example, many people are asked to write a rot13 function in one of
> their first classes. How would you write that if strings weren't
> iterables? There's no way a regex is going to help you here, unless
> you wanted to do something like using re.sub('.') as a convoluted
> and slow way of writing map.
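For reference, the classroom rot13 exercise mentioned above might look like the following sketch, which relies on nothing but iterating the string. It rotates ASCII letters only and passes everything else through unchanged; the function name and that restriction are assumptions made for this example.

```python
def rot13(s):
    """Rotate ASCII letters by 13 places; leave other characters alone."""
    out = []
    for ch in s:
        if 'a' <= ch <= 'z':
            out.append(chr((ord(ch) - ord('a') + 13) % 26 + ord('a')))
        elif 'A' <= ch <= 'Z':
            out.append(chr((ord(ch) - ord('A') + 13) % 26 + ord('A')))
        else:
            out.append(ch)
    return ''.join(out)


encoded = rot13('Hello, World!')
decoded = rot13(encoded)   # rot13 is its own inverse
```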
AFAIK, all the codecs iterate character by character.
--
Terry Jan Reedy