str.startswith taking any iterator instead of just tuple
Some functions and methods accept a tuple of arguments that is looped over internally, e.g.:

    'spam'.startswith(('s', 'z'))  # 'spam' starts with 's' or with 'z'
    isinstance(42, (float, int))

In these cases, CPython uses PyTuple_Check and PyTuple_GET_ITEM to perform this internal iteration.

As a result, the following are considered invalid:

    'spam'.startswith(['s', 'z'])
    'spam'.startswith({'s', 'z'})
    'spam'.startswith(x for x in 'sz')

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: startswith first arg must be str, unicode, or tuple

There are two common workarounds:

    'spam'.startswith(tuple({'s', 'z'}))
    any('spam'.startswith(c) for c in {'s', 'z'})

Of course, the following construction already has a clear, separate meaning:

    'spam'.startswith('sz')  # 'spam' starts with 'sz'

In these cases, could we supplant the PyTuple_Check with one that would allow any iterator? Alternatively, could we add this as an additional branch?

The code would look something like:

    it = PyObject_GetIter(subobj);
    if (it == NULL)
        return NULL;

    iternext = *Py_TYPE(it)->tp_iternext;
    for (;;) {
        substring = iternext(it);
        if (substring == NULL) {
            /* exhausted; a real patch must also check PyErr_Occurred() */
            Py_DECREF(it);
            Py_RETURN_FALSE;
        }
        result = tailmatch(self, substring, start, end, -1);
        Py_DECREF(substring);
        if (result) {
            Py_DECREF(it);
            Py_RETURN_TRUE;
        }
    }

Of course, in the case of methods like .startswith, this would need to ensure the following behaviour remains unchanged. The following should always check whether 'spam' starts with 'sz', not whether it starts with 's' or with 'z':

    'spam'.startswith('sz')

I searched bugs.python.org and python-ideas for any previous discussion of this topic. If this seems reasonable, I can submit an enhancement to bugs.python.org with a patch for unicodeobject.c:unicode_startswith

Cheers,
James Powell

follow: @dontusethiscode + @nycpython
attend: nycpython.org + flask-nyc.org
read: seriously.dontusethiscode.com
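The behaviour proposed in this message can be sketched at the Python level. This is only an illustration of the requested semantics, not the C patch: `startswith_any` is a hypothetical helper name, and the special case for `str` is the part the message says must be preserved.

```python
def startswith_any(s, prefixes, start=0, end=None):
    """Hypothetical helper: str.startswith that accepts any iterable of prefixes."""
    if end is None:
        end = len(s)
    # A single string keeps its existing meaning: it is one prefix,
    # not an iterable of one-character prefixes.
    if isinstance(prefixes, str):
        return s.startswith(prefixes, start, end)
    # Any other iterable (tuple, list, set, generator) is looped over.
    return any(s.startswith(p, start, end) for p in prefixes)


print(startswith_any('spam', ['s', 'z']))         # True
print(startswith_any('spam', {'s', 'z'}))         # True
print(startswith_any('spam', (x for x in 'sz')))  # True
print(startswith_any('spam', 'sz'))               # False
```

The `isinstance(prefixes, str)` branch is exactly the string/iterable special case that the rest of the thread argues about.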
The current behavior is intentional, and the ambiguity of strings themselves being iterables is the main reason. Since startswith() is almost always called with a literal or tuple of literals anyway, I see little need to extend the semantics. (I notice that you don't actually give any examples where the iterator would be useful -- have you encountered any, or are you just arguing for consistency's sake?) On Thu, Jan 2, 2014 at 10:29 AM, James Powell <james@dontusethiscode.com> wrote:
[snip]
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido van Rossum (python.org/~guido)
I could see expanding to allow lists/sets as well as tuples being useful, e.g. for using dynamically generated prefix lists without creating additional tuple objects, but I don't see arbitrary iteration being necessary. On Thu Jan 02 2014 at 3:25:20 PM, Guido van Rossum <guido@python.org> wrote:
[snip]
On 01/02/2014 06:24 PM, Guido van Rossum wrote:
The current behavior is intentional, and the ambiguity of strings themselves being iterables is the main reason. Since startswith() is almost always called with a literal or tuple of literals anyway, I see little need to extend the semantics. (I notice that you don't actually give any examples where the iterator would be useful -- have you encountered any, or are you just arguing for consistency's sake?)
This is driven by a real-world example wherein a large number of prefixes are stored in a set, necessitating:

    any('spam'.startswith(c) for c in prefixes)
    # or
    'spam'.startswith(tuple(prefixes))

However, .startswith doesn't seem to be the only example of this, and the other examples are free of the string/iterable ambiguity:

    isinstance(x, {int, float})

I do agree that it's definitely important to retain the behaviour of:

    'spam'.startswith('sz')

At the same time, I think the non-string iterable problem is already fairly well-known and not a source of great confusion. How often has one typed:

    isinstance(x, Iterable) and not isinstance(x, str)

Cheers,
James Powell

follow: @dontusethiscode + @nycpython
attend: nycpython.org + flask-nyc.org
read: seriously.dontusethiscode.com
On Thu, Jan 2, 2014 at 1:37 PM, James Powell <james@dontusethiscode.com> wrote:
On 01/02/2014 06:24 PM, Guido van Rossum wrote:
The current behavior is intentional, and the ambiguity of strings themselves being iterables is the main reason. Since startswith() is almost always called with a literal or tuple of literals anyway, I see little need to extend the semantics. (I notice that you don't actually give any examples where the iterator would be useful -- have you encountered any, or are you just arguing for consistency's sake?)
This is driven by a real-world example wherein a large number of prefixes are stored in a set, necessitating:
    any('spam'.startswith(c) for c in prefixes)
    # or
    'spam'.startswith(tuple(prefixes))
Neither of these strikes me as bad. Also, depending on whether the set of prefixes itself changes dynamically, it may be best to lift the tuple() call out of the startswith() call. Note that for performance, I suspect that the any() version will be slower if you can avoid calling tuple() every time -- I recall once finding that x.startswith('ab') benchmarked slower than x[:2] == 'ab' because the name lookup for 'startswith' dominated the overall time.
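The performance remark above can be checked with a small timing sketch. The prefix set and word list below are made-up illustrative data, not taken from the thread, and the timings will vary by machine and Python version, so no numbers are claimed here.

```python
import timeit

prefixes = {'ab', 'cd', 'ef', 'gh'}   # illustrative data (an assumption)
prefix_tuple = tuple(prefixes)         # hoisted: built once, reused for every call
words = ['spam', 'eggs', 'abacus', 'ghost'] * 250


def with_any():
    return [any(w.startswith(p) for p in prefixes) for w in words]


def with_tuple_each_time():
    return [w.startswith(tuple(prefixes)) for w in words]


def with_hoisted_tuple():
    return [w.startswith(prefix_tuple) for w in words]


# All three spell the same test, so they must agree.
assert with_any() == with_tuple_each_time() == with_hoisted_tuple()

for fn in (with_any, with_tuple_each_time, with_hoisted_tuple):
    seconds = timeit.timeit(fn, number=20)
    print(f'{fn.__name__:22s} {seconds:.4f}s')
```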
However, .startswith doesn't seem to be the only example of this, and the other examples are free of the string/iterable ambiguity:
isinstance(x, {int, float})
But this is even less likely to have a dynamically generated argument. And there could still be another ambiguity here: a metaclass could conceivably make its instances (i.e. classes) iterable.
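To make the metaclass ambiguity concrete, here is a minimal sketch. `IterableMeta`, `Shape`, `Circle` and `Square` are hypothetical names invented for this illustration. If isinstance() accepted arbitrary iterables, it would be unclear whether `Shape` means "this one class" or "the classes it yields".

```python
class IterableMeta(type):
    """Hypothetical metaclass whose instances (i.e. classes) are iterable."""

    def __iter__(cls):
        # Iterating a class yields its direct subclasses.
        return iter(cls.__subclasses__())


class Shape(metaclass=IterableMeta):
    pass


class Circle(Shape):
    pass


class Square(Shape):
    pass


shape = Shape()

# Today: Shape is a single class, so a plain Shape instance matches.
as_single_class = isinstance(shape, Shape)

# If Shape were instead expanded as an iterable of classes, the same
# instance would be checked against (Circle, Square) and would not match.
as_iterable = isinstance(shape, tuple(Shape))

print(as_single_class, as_iterable)  # True False
```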
I do agree that it's definitely important to retain the behaviour of:
'spam'.startswith('sz')
Duh. :-)
At the same time, I think the non-string iterable problem is already fairly well-known and not a source of great confusion. How often has one typed:
isinstance(x, Iterable) and not isinstance(x, str)
If you find yourself typing that a lot I think you have a bigger problem though.
All in all I hope you will give up your push for this feature. It just doesn't seem all that important, and you really just move the inconsistency to a different place (special-casing strings instead of tuples).
--
--Guido van Rossum (python.org/~guido)
By designing an API that doesn't require such overloading. On Thursday, January 2, 2014, Alexander Heger wrote:
isinstance(x, Iterable) and not isinstance(x, str)
If you find yourself typing that a lot I think you have a bigger problem though.
How do you replace this?
-- --Guido van Rossum (on iPad)
Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones. Every use case I can think of for iterating over a string either involves first splitting the string, or would be better done with a regex. Also, the only times I can recall using a string as a sequence is in doctests (because it reads better than a list of characters) or in the interpreter when I'm trying something out. I'm not suggesting changing it - there's too much history for that, but I am interested to know if there is some fundamental reason that strings are sequences. If a new string object was being implemented now, would it be a sequence? On 3 Jan 2014 02:49, "Guido van Rossum" <guido@python.org> wrote:
[snip]
On Jan 5, 2014, at 3:09, David Townshend <aquavitae69@gmail.com> wrote:
Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones.
You've seriously never indexed or sliced a string? Those are the two core operations in sequences, and they're obviously useful on strings.
Every use case I can think of for iterating over a string either involves first splitting the string, or would be better done with a regex
People have mentioned use cases for iterating strings in this thread. And it's easy to think of more. There are all kinds of algorithms that treat strings as sequences of characters. Sure, many of these functions are already methods on str or otherwise built into the stdlib, but that just means they're implemented by iterating the string storage in C with a loop around "*++s". And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.) For example, many people are asked to write a rot13 function in one of their first classes. How would you write that if strings weren't iterables? There's no way a regex is going to help you here, unless you wanted to do something like using re.sub('.') as a convoluted and slow way of writing map.
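The rot13 exercise mentioned above might look like the following. This is one possible classroom-style version, not anyone's reference implementation. `rot13_char`, `rot13` and `rot13_regex` are names chosen for this sketch, and the regex variant is included only to show the "convoluted way of writing map" the message describes.

```python
import codecs
import re
import string


def rot13_char(ch):
    """Rotate one ASCII letter by 13 places; leave anything else alone."""
    for alphabet in (string.ascii_lowercase, string.ascii_uppercase):
        i = alphabet.find(ch)
        if i != -1:
            return alphabet[(i + 13) % 26]
    return ch


def rot13(s):
    # Relies on str being iterable: one character per step.
    return ''.join(rot13_char(ch) for ch in s)


def rot13_regex(s):
    # The same mapping, routed through re.sub with a callable replacement.
    return re.sub('.', lambda m: rot13_char(m.group()), s, flags=re.DOTALL)


text = 'Hello, World!'
print(rot13(text))  # Uryyb, Jbeyq!
assert rot13(text) == rot13_regex(text) == codecs.encode(text, 'rot_13')
assert rot13(rot13(text)) == text
```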
People have mentioned use cases for iterating strings in this thread. And it's easy to think of more. There are all kinds of algorithms that treat strings as sequences of characters. Sure, many of these functions are already methods on str or otherwise built into the stdlib, but that just means they're implemented by iterating the string storage in C with a loop around "*++s". And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.) For example, many people are asked to write a rot13 function in one of their first classes. How would you write that if strings weren't iterables? There's no way a regex is going to help you here, unless you wanted to do something like using re.sub('.') as a convoluted and slow way of writing map.
whereas the issue seems now settled, you could use explicit functions like str.iter(), str.codepoints(), str.substr(), ...
On Jan 5, 2014, at 11:02, Alexander Heger <python@2sn.net> wrote:
[snip]
whereas the issue seems now settled, you could use explicit functions like str.iter(), str.codepoints(), str.substr(), ...
Sure, and we could add list.iter(), list.slice(), etc. and get rid of iterables, indexing and slicing, entirely. If we add separate map and similar methods to every iterable type, we can even get rid of iterators. If it's good enough for ObjC, why should Python try to be more readable or concise?
On Mon, Jan 6, 2014 at 4:48 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.)
You could simply "for ch in s.split('')". A number of languages define that to mean fracturing a string into one-character strings. Python currently raises ValueError, so it won't break existing code.
But yes, it's easier to be able to iterate over a string.
ChrisA
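A quick check of the current behaviour described here, alongside the usual ways to get one-character strings today:

```python
s = 'spam'

try:
    s.split('')
except ValueError as exc:
    # str.split rejects an empty separator rather than splitting per character.
    print('split failed:', exc)

# Both of these rely on str being iterable.
print(list(s))            # ['s', 'p', 'a', 'm']
print([ch for ch in s])   # ['s', 'p', 'a', 'm']
```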
On 1/5/2014 12:48 PM, Andrew Barnert wrote:
On Jan 5, 2014, at 3:09, David Townshend <aquavitae69@gmail.com> wrote:
Reading this thread made me start to think about why a string is a sequence,
Because a string is defined in math/language theory as a sequence of symbols from an alphabet. If you want to invent or define something else, such as an atomic symbol type, please use a different term. For example:

    class Symbol:
        def __init__(self, name):
            self._name = name  # optionally check that name is a string
        def __eq__(self, other):
            return self._name == other._name
        def __hash__(self):
            return hash(self._name)
        def __repr__(self):
            return 'Symbol({!r})'.format(self._name)
        __str__ = __repr__  # or define to taste

Now Symbols are hashable, equality-comparable, but not iterable. In other words, I believe the desire for a non-iterable 'string' is a desire for something that is not really a string, but is perhaps being represented as a string merely for convenience. Using duples as linked-list nodes (which I have done) because one does not bother to define a node class is similar. Tuple iteration is equally meaningless in this context as string iteration is in symbol context.
You've seriously never indexed or sliced a string? Those are the two core operations in sequences, and they're obviously useful on strings.
And as already explained, indexable means iterable.
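The claim that indexable means iterable can be illustrated with Python's legacy sequence protocol: when a class defines `__getitem__` but no `__iter__`, iter() falls back to calling `__getitem__` with 0, 1, 2, ... until IndexError is raised. `Letters` below is a hypothetical class written for this sketch. Note that the fallback only works this way for integer indexing from zero; a mapping-style `__getitem__` would not behave like this.

```python
class Letters:
    """Hypothetical class with indexing but no __iter__ method."""

    def __init__(self, text):
        self._text = text

    def __getitem__(self, index):
        # Raises IndexError past the end, which ends the fallback iteration.
        return self._text[index]


letters = Letters('spam')
print(letters[0])      # s
print(list(letters))   # ['s', 'p', 'a', 'm']
```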
Every use case I can think of for iterating over a string either involves first splitting the string, or would be better done with a regex
Splitting involves forward iteration. Regex matching adds backtracking on top of forward iteration. Please tell me a *string* algorithm that does *not* involve character iteration somewhere.
People have mentioned use cases for iterating strings in this thread. And it's easy to think of more. There are all kinds of algorithms that treat strings as sequences of characters. Sure, many of these functions are already methods on str or otherwise built into the stdlib, but that just means they're implemented by iterating the string storage in C with a loop around "*++s".
I was going to make the same point. Strings have the following methods: 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill'. Written in Python (as in classes and PyPy!), nearly all start with 'for c in s:' (or 'in reversed(s)'). The ones that do not generally use len(s). Len(s) is calculated in str.__new__ with an internal iteration: 'for char added to string, increment len counter'. Comparing strings also involves iteration, and hence so does sorting lists of strings by comparison.
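As a rough illustration of the point about 'for c in s:', here are simplified pure-Python stand-ins for two of the listed methods. `py_startswith` and `py_upper_ascii` are hypothetical names, and both deliberately cover only a subset of what the real methods do (a single str prefix with no start/end arguments; ASCII letters only).

```python
def py_startswith(s, prefix):
    """Simplified stand-in for str.startswith with one str prefix."""
    if len(prefix) > len(s):
        return False
    # Walk both strings character by character.
    for a, b in zip(s, prefix):
        if a != b:
            return False
    return True


def py_upper_ascii(s):
    """Simplified stand-in for str.upper, ASCII letters only."""
    out = []
    for c in s:
        if 'a' <= c <= 'z':
            c = chr(ord(c) - 32)
        out.append(c)
    return ''.join(out)


print(py_startswith('spam', 'sp'))   # True
print(py_startswith('spam', 'sz'))   # False
print(py_upper_ascii('spam!'))       # SPAM!
```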
And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.) For example, many people are asked to write a rot13 function in one of their first classes. How would you write that if strings weren't iterables? There's no way a regex is going to help you here, unless you wanted to do something like using re.sub('.') as a convoluted and slow way of writing map.
AFAIK, all the codecs iterate character by character. -- Terry Jan Reedy
On Sun, Jan 5, 2014 at 9:08 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 1/5/2014 12:48 PM, Andrew Barnert wrote:
On Jan 5, 2014, at 3:09, David Townshend <aquavitae69@gmail.com> wrote:
Reading this thread made me start to think about why a string is a sequence,
Because a string is defined in math/language theory as a sequence of symbols from an alphabet. If you want to invent or define something else, such as an atomic symbol type, please use a different term. For example:
And sequences in math / CS are functions from the natural numbers to elements of the sequence. Since isinstance(str, types.FunctionType) isn't True, it must mean that Python strings aren't strings.
But seriously, Python functions aren't functions, the set of Python complex numbers is not the set of complex numbers, Python types aren't types, and Python addition is not addition; mathematical terminology in programming is evocative and not actually literally true. Arguments based on trying to literally copy math to the letter are flawed, probably irretrievably so.
The important feature of strings in math is not that they are literally a sequence of characters, but that they correspond to a sequence of characters isomorphically. You can represent them any way you like, as long as you maintain that isomorphism, and the operations with the right names do the right thing, etc. As evidence, observe that not every programming language has its string type obey the equivalent of Python's sequence interface or math's notion of "sequence" per se (mapping naturals to elements). For example, Haskell strings are linked lists; Rust strings are arrays behind the scenes but don't expose it within the str type; etc.
It's not just strings, either. There are a multitude of ways of defining the natural numbers -- maybe a natural number is a set of a given structure (and which structure?), maybe it is a pair of integers where the second integer is 1, maybe it is an infinite sequence of rationals whose limit is a rational with denominator 1, maybe it is a bitstring of arbitrary finite length. The usual construction in math is the first, but Python uses the last one. To say Python doesn't actually have natural numbers but does have strings, is absurd, but it is what your logic points towards. If two things are equivalent, everything said about one can be said about the other, and math is about saying things about stuff, not about precise definitions of structure -- those are chosen for convenience.
-- Devin
On Mon, Jan 6, 2014 at 12:09 AM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
[snip]
Apologies, I wasn't thinking much and bungled that last argument (should've talked about integers instead of naturals; and even did, for half of it...). Fixed:
[...] There are a multitude of ways of defining the integers -- maybe an integer is an equivalence class over the pairs of naturals, maybe it is a rational number with denominator 1, maybe it is an infinite sequence of rationals whose limit is a rational with denominator 1, maybe it is a two's complement bitstring of arbitrary length. The usual construction in math is the first (or the second to last), but Python uses the last one. To say that Python doesn't actually have integers, but does have strings, is absurd, but [...]
-- Devin
On 1/6/2014 3:09 AM, Devin Jeanpierre wrote:
On Sun, Jan 5, 2014 at 9:08 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On Jan 5, 2014, at 3:09, David Townshend <aquavitae69@gmail.com> wrote:
Reading this thread made me start to think about why a string is a sequence,
Because a string is defined in math/language theory as a sequence of symbols from an alphabet. If you want to invent or define something else, such as an atomic symbol type, please use a different term. For example:
And sequences in math / CS are functions from the natural numbers to elements of the sequence.
And functions (mappings) in math are defined either by a rule for calculating the output from the input or by a table (set of pairs) giving the output for each input. If the input domain is the finite sequence of counts from 0 to k, the table can be condensed to a sequence of k+1 output values.
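The "table condensed to a sequence" idea can be shown in a few lines. This is only an illustration of the argument, using a dict as the explicit table of (input, output) pairs:

```python
s = 'spam'

# The function as an explicit table: each count 0..k paired with its output.
table = dict(enumerate(s))
print(table)  # {0: 's', 1: 'p', 2: 'a', 3: 'm'}

# Because the inputs are exactly 0..k in order, the table condenses to
# just the sequence of outputs, which is the string itself.
condensed = ''.join(table[i] for i in range(len(table)))
print(condensed == s)  # True
```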
Since isinstance(str, types.FunctionType) isn't True,
Python has multiple builtin callable types, and users can define more, so you need to expand that test. Anyway, since a string is not a function defined by rule, it must be a function defined by a table. Since the input domain is a finite sequence of counts, we can and do condense the table to a sequence of output values. Which is an expansion of what I said.
[snip] -- Terry Jan Reedy
On Mon, Jan 6, 2014 at 3:39 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Since isinstance(str, types.FunctionType) isn't True,
Python has multiple builtin callable types, and users can define more, so you need to expand that test. Anyway, since a string is not a function defined by rule, it must be a function defined by a table. Since the input domain is a finite sequence of counts, we can and do condense the table to a sequence of output values. Which is an expansion of what I said.
No, I don't need to expand the test -- the limitation of the test was the entire point. I was making fun of your argument that because the mathematical terms are the same, therefore they must be the same in Python. "strings are sequences in math, therefore they are in python" is a superficial and fundamentally wrong argument. Here's another argument of that form: "the nth element of a string is not a string in math, therefore the nth element of a string is not a string in Python". That's a lie, of course. There are too many ways that type of argument falls flat. -- Devin
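The observation that the nth element of a Python string is itself a string is easy to verify, and it is the root of the startswith() ambiguity discussed at the top of the thread: every str is an iterable of str, all the way down.

```python
s = 'spam'
first = s[0]

print(type(first) is str)        # True: indexing a str yields a str
print(first[0][0][0] == first)   # True: a 1-character str contains itself
print(all(isinstance(ch, str) for ch in s))  # True
```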
On Sun, Jan 5, 2014 at 9:48 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones.
You've seriously never indexed or sliced a string? Those are the two core operations in sequences, and they're obviously useful on strings.
I am doing most coding in two languages right now: Python and Javascript. I have never wished that Python had string.charAt(i) but I have often wished that Javascript had string[i]. When I've iterated over the characters in a string in Javascript, it has never occurred to me to write it using str.split('').
By irrelevant analogy, I have never used complex numbers in Python or Javascript and I can't see any obvious reason to support them. It just confuses people who inadvertently write cmath.sqrt instead of math.sqrt. For the few people that use complex numbers, they would be better served by a tuple of real and imaginary parts. As someone who doesn't use them, my opinion is clearly more important than that of those that use them.
--- Bruce
Learn how hackers think: http://j.mp/gruyere-security
(Not serious about removing complex numbers from Python. If you didn't see the sarcasm, sorry.)
On Jan 5, 2014 4:10 AM, "David Townshend" <aquavitae69@gmail.com> wrote:
Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones.
Sometimes I think it would be clearer if strings weren't sequences but had various attributes that exposed sequence "views", e.g. codepoints, etc. Making strings non-sequences isn't realistic at this point, but adding the sequence view attributes may still be nice.
That said, at present it's not something I personally have any use case for. There was an article floating around the web recently where the deficiencies of unicode implementations were discussed, and I recall something there or in related discussions about use cases for having different views into a string. Wow that was vague. :) The different views into unicode strings certainly come up from time to time on our lists.
-eric
On 01/05/2014 06:49 PM, Eric Snow wrote:
On Jan 5, 2014 4:10 AM, "David Townshend" <aquavitae69@gmail.com> wrote:
Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones.
Sometimes I think it would be clearer if strings weren't sequences but had various attributes that exposed sequence "views", e.g. codepoints, etc. Making strings non-sequences isn't realistic at this point, but adding the sequence view attributes may still be nice.
That said, at present it's not something I personally have any use case for. There was an article floating around the web recently where the deficiencies of unicode implementations were discussed, and I recall something there or in related discussions about use cases for having different views into a string. Wow that was vague. :) The different views into unicode strings certainly come up from time to time on our lists.
This does not fit the picture as long as strings are indexable and sliceable, in my opinion. But most importantly, from the user practice & experience perspective, and while it may be debatable from a theoretical one, I consider it a great feature of Python that everyday "mundane" string processing can be done using simple and easy Python string routines (I include here indexing & slicing). Alternatives would be regexes (read: Perl) and/or matching/parsing/searching libs (e.g. pyparsing) everywhere in Python code; both are difficult, error-prone, hard to debug. The former are plain esoteric (but terribly practical ;-), and I'm happy to rarely have to decipher *others'* regexes when reading Python code (my own are far easier, indeed ;-).
Denis
On 01/02/2014 06:59 PM, Guido van Rossum wrote:
This is driven by a real-world example wherein a large number of prefixes were stored in a set, necessitating:
    any('spam'.startswith(c) for c in prefixes)
    # or
    'spam'.startswith(tuple(prefixes))
Neither of these strikes me as bad. Also, depending on whether the set of prefixes itself changes dynamically, it may be best to lift the tuple() call out of the startswith() call.
I agree. The any() formulation proves good enough in practice. Creating a tuple can be a bit tricky, since the list of prefixes could be large and could change.
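A minimal sketch of the two formulations discussed above, with the tuple() call lifted out of the loop. The prefix set, the word list, and the helper name starts_with_any are made up for illustration and are not part of any proposal in this thread.

```python
# Hypothetical data, for illustration only.
prefixes = {"sp", "eg", "ha"}
words = ["spam", "eggs", "toast"]


def starts_with_any(text, candidates):
    """Return True if text starts with any item of an arbitrary iterable."""
    return any(text.startswith(prefix) for prefix in candidates)


# Formulation 1: any() accepts the set (or a generator) directly.
via_any = [starts_with_any(word, prefixes) for word in words]

# Formulation 2: build the tuple once, outside the loop.
# This only stays correct while `prefixes` is not modified afterwards.
prefix_tuple = tuple(prefixes)
via_tuple = [word.startswith(prefix_tuple) for word in words]

print(via_any)
print(via_tuple)
```

If the set of prefixes changes between calls, the hoisted tuple goes stale, so the any() form is the safer default in that case.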
However, .startswith doesn't seem to be the only example of this, and the other examples are free of the string/iterable ambiguity:
    isinstance(x, {int, float})
And there could still be another ambiguity here: a metaclass could conceivably make its instances (i.e. classes) iterable.
It's an interesting point that there's a fundamental ambiguity between providing an iterable of arguments and providing a single argument that is itself an iterable (e.g., in the case of a type that is itself iterable, like Enum).
In fact, I've actually warmed up to the any() formulation, because it makes explicit which behaviour you want.
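The ambiguity can be seen with a short sketch. The Color enum below is a hypothetical example; the point is only that a string and an Enum class are each both a single argument and an iterable, so writing the iteration out with any() states which reading is meant.

```python
from enum import Enum


class Color(Enum):  # hypothetical enum, for illustration only
    RED = 1
    GREEN = 2


# A string is both one prefix and an iterable of characters.
as_single_prefix = "spam".startswith("sz")             # one prefix: "sz"
as_iterable = any("spam".startswith(c) for c in "sz")  # prefixes: "s", "z"

# An Enum class is both a single type and an iterable of its members.
members = list(Color)
as_single_type = isinstance(Color.RED, Color)

print(as_single_prefix, as_iterable)
print(members, as_single_type)
```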
I do agree that it's definitely important to retain the behaviour of:
    'spam'.startswith('sz')
Duh. :-)
You never know...
All in all I hope you will give up your push for this feature. It just doesn't seem all that important, and you really just move the inconsistency to a different place (special-casing strings instead of tuples).
For these functions and methods, being able to provide a tuple of arguments instead of a single argument seems mostly a convenience. It covers the most common case of wanting to internalise the iteration, with a minimum of ambiguity. The any() and tuple() formulations are available where needed.
In the end, I'm happy to drop the push for this feature. (In general, I agree that there isn't a need to stamp out all inconsistencies or to belabour the use of abstract types.)
Cheers, James Powell
participants (13)
- Alexander Heger
- Amber Yust
- Andrew Barnert
- Bruce Leban
- Chris Angelico
- David Townshend
- Devin Jeanpierre
- Eric Snow
- Guido van Rossum
- James Powell
- spir
- Steven D'Aprano
- Terry Reedy