I've just spotted an inconsistency between string and list handling:
    >>> '' in 'abc'
    True
    >>> '' in 'abc'.split()
    False
    >>> [] in ['a', 'b', 'c']
    False
Why do strings behave differently here from other sequence types? Is that by design?
Strings have a common use case which lists do not: finding subsequences/substrings. Consider the following:
    >>> string = 'abcdefghijklmnop'
    >>> 'def' in string
    True
    >>> list('def') in list(string)
    False
The contains operator ("in") for strings has a different meaning from the contains operator for lists. A list contains an object if (and only if) that object is a single element of the list. A string contains another string if (and only if) the other string is a substring of the first string.
Matthew Lefavor
NASA GSFC [Microtel, LLC]
Mail Code 699.0/Org Code 582.0
matthew.lefavor@nasa.gov
(301) 614-6818 (Desk)
(443) 758-4891 (Cell)
On 7/18/12 1:30 PM, "anatoly techtonik" <techtonik@gmail.com> wrote:
I've just spotted an inconsistency between string and list handling:
    >>> '' in 'abc'
    True
    >>> '' in 'abc'.split()
    False
    >>> [] in ['a', 'b', 'c']
    False
Why do strings behave differently here from other sequence types? Is that by design?
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
http://mail.python.org/mailman/listinfo/python-ideas
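Matthew's distinction can be sketched in Python 3: `in` on a list is an element test, while `in` on a str is a substring test. If str-like behaviour is wanted for lists, a small helper is needed. `contains_subsequence` is a hypothetical name used only for illustration (it is not a standard-library function), and this simple sliding-window version is quadratic in the worst case:

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def contains_subsequence(haystack: Sequence[T], needle: Sequence[T]) -> bool:
    """Return True if `needle` occurs as a contiguous run inside `haystack`.

    Hypothetical helper mimicking str's substring semantics for any sequence;
    like str, an empty needle is always found.
    """
    n = len(needle)
    # Slide a window of length n over haystack; compare as lists so that,
    # e.g., a tuple slice can match a list needle.
    return any(
        list(haystack[i:i + n]) == list(needle)
        for i in range(len(haystack) - n + 1)
    )


letters = list("abcdefghijklmnop")
print(list("def") in letters)                      # element test: False
print(contains_subsequence(letters, list("def")))  # run test: True
print(contains_subsequence(letters, []))           # empty needle: True
print("def" in "abcdefghijklmnop")                 # str's built-in substring test: True
```

The empty-needle case returning True is a deliberate choice to match `'' in 'abc'`; a list-oriented helper could just as reasonably reject it.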
On 2012-07-18, at 19:30 , anatoly techtonik wrote:
I've just spotted an inconsistency between string and list handling:
    >>> '' in 'abc'
    True
    >>> '' in 'abc'.split()
    False
    >>> [] in ['a', 'b', 'c']
    False
Why do strings behave differently here from other sequence types? Is that by design?
Erm… yes? `in` would not be very useful for strings if you could only use it to check for a single character, would it?
On 2012-07-18, at 19:43 , Masklinn wrote:
Erm… yes? `in` would not be very useful for strings if you could only use it to check for a single character, would it?
In fact, things used to work that way in older Python versions; this was specifically changed to the current behavior, *as noted in the documentation*:
When s is a string or Unicode string object the in and not in operations act like a substring test. In Python versions before 2.3, x had to be a string of length 1. In Python 2.3 and beyond, x may be a string of any length.
A Python string, you may want to note, is a string. Not a sequence of characters. The first item of a 1-character string is itself, all basic (step-less) slices of a string are contained in itself (including itself and the empty string), you can infinitely get the first item of a non-empty string, and I'm sure I'm missing plenty.
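Masklinn's observations can be checked in a few lines of Python 3. The string `s` is an arbitrary example chosen for illustration:

```python
s = "abc"

# Indexing a str yields another str, so a 1-character string's first item is itself.
c = s[0]
assert isinstance(c, str) and c == "a"
# ...and this can be repeated indefinitely on a non-empty string.
assert c[0] == c and c[0][0][0] == c

# Every step-less slice s[i:j] is a substring of s, so `in` finds all of them,
# including s itself and the empty string.
slices = {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}
assert all(piece in s for piece in slices)
assert "" in slices and s in slices
print(len(slices))  # 7 distinct slices: '', 'a', 'b', 'c', 'ab', 'bc', 'abc'
```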
On Wed, Jul 18, 2012 at 1:58 PM, Masklinn <masklinn@masklinn.net> wrote:
A Python string, you may want to note, is a string. Not a sequence of characters.
It's both (with the caveat that, in Python, a character is just a string of length 1). (See: http://docs.python.org/reference/datamodel.html#the-standard-type-hierarchy)
-- Devin
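Devin's "it's both" can be seen in Python 3 through the `collections.abc` machinery, where `str` is registered as a `Sequence`, and through the type of a single indexed item:

```python
from collections.abc import Sequence

# str is registered as a Sequence, alongside list and tuple.
print(issubclass(str, Sequence), issubclass(list, Sequence))  # True True

item = "abc"[1]
# There is no separate character type: an item of a str is a str of length 1.
print(type(item).__name__, len(item))  # str 1
```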
On 2012-07-18, at 20:06 , Devin Jeanpierre wrote:
It's both (with the caveat that, in Python, a character is just a string of length 1).
That's playing with words, especially comparing strings with Python 3 binaries which *do* actually have a separate "character" type (reified to an integer). So Python strings don't have reified characters; a string's item and a slice of size 1 are essentially identical, which is pretty much unique to them (as far as my knowledge of Python's sequences goes). Which is not a bad thing, mind you; it makes working with strings much more pleasant.
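The contrast Masklinn draws (read by Antoine later in the thread as Python 3 `bytes` objects) shows up when comparing an item with a length-1 slice. A minimal Python 3 sketch with arbitrary example values:

```python
text = "abc"
data = b"abc"

# For str, the item at index 0 and the length-1 slice are the same value.
print(text[0], text[0:1], text[0] == text[0:1])  # a a True

# For bytes, indexing yields an int, while slicing yields bytes.
print(data[0], data[0:1], data[0] == data[0:1])  # 97 b'a' False
```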
On Wed, Jul 18, 2012 at 2:16 PM, Masklinn <masklinn@masklinn.net> wrote:
It's both (with the caveat that, in Python, a character is just a string of length 1).
That's playing with words, especially comparing strings with Python 3 binaries which *do* actually have a separate "character" type (reified to an integer).
No, it isn't. Strings adhere to the sequence protocol. The Python datatype reference echoes what I said, nearly exactly. http://docs.python.org/reference/datamodel.html#the-standard-type-hierarchy
So Python strings don't have reified characters; a string's item and a slice of size 1 are essentially identical, which is pretty much unique to them (as far as my knowledge of Python's sequences goes).
Nothing about that feature makes them not-sequences; instead, it makes them a rather special kind of sequence.
-- Devin
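One way to see what makes `str` "a rather special kind of sequence" for the original question is that `str` defines its own `__contains__` (a substring test), whereas the generic `Sequence` behaviour compares items one at a time, as `list` does. A sketch assuming Python 3; `CharSequence` is a hypothetical wrapper written only for this comparison:

```python
from collections.abc import Sequence


class CharSequence(Sequence):
    """Hypothetical wrapper exposing a str's characters through the generic
    Sequence ABC, without overriding __contains__."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __getitem__(self, index):
        return self._text[index]

    def __len__(self) -> int:
        return len(self._text)


chars = CharSequence("abc")
# The inherited Sequence.__contains__ compares each item with ==, like list does.
print("a" in chars, "ab" in chars, "" in chars)  # True False False
# str's own __contains__ performs a substring test instead.
print("a" in "abc", "ab" in "abc", "" in "abc")  # True True True
```

The same underlying characters give different membership answers depending only on which `__contains__` is used, which matches `'' in 'abc'` versus `'' in list('abc')`.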
On 2012-07-18, at 20:31 , Devin Jeanpierre wrote:
No, it isn't. Strings adhere to the sequence protocol. The Python datatype reference echoes what I said, nearly exactly.
This has no relevance to my messages, I have not claimed anywhere that strings weren't sequences.
So Python strings don't have reified characters; a string's item and a slice of size 1 are essentially identical, which is pretty much unique to them (as far as my knowledge of Python's sequences goes).
Nothing about that feature makes them not-sequences; instead, it makes them a rather special kind of sequence.
I'm not sure why you're saying that. Again, I have never once claimed they were not sequences (quite the opposite in fact). Why the strawmanning?
Masklinn wrote:
That's playing with words, especially comparing strings with Python 3 binaries which *do* actually have a separate "character" type (reified to an integer).
Python 3 does not have a 'character' type; it has 'str', which is made up of more 'str's, and it has 'bytes', which is made up of 'int's (annoyingly).
~Ethan~
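Ethan's description of Python 3 (`str` made of `str`s, `bytes` made of `int`s) can be checked directly. The sketch also shows that `bytes` membership accepts either an int or a bytes subsequence, so `b'' in b'abc'` mirrors `'' in 'abc'`; the example values are arbitrary:

```python
text = "abc"
data = b"abc"

# Iterating a str yields length-1 strs; iterating bytes yields ints.
print(list(text))  # ['a', 'b', 'c']
print(list(data))  # [97, 98, 99]

# bytes membership accepts an int (element test) or a bytes-like object
# (subsequence test), so the empty bytes object is always "in".
print(98 in data, b"bc" in data, b"" in data)  # True True True
```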
On 07/18/2012 09:32 PM, Ethan Furman wrote:
Python 3 does not have a 'character' type; it has 'str', which is made up of more 'str's, and it has 'bytes', which is made up of 'int's (annoyingly).
That's what he said. Could we stop the annoying "but I know it better than you without reading your message" please?
Georg
Georg Brandl wrote:
That's what he said. Could we stop the annoying "but I know it better than you without reading your message" please?
I am having trouble equating what I said with what Masklinn said. Perhaps you could explain how they say the same thing instead of assuming I didn't read his message?
~Ethan~
On Fri, 20 Jul 2012 14:18:20 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
I am having trouble equating what I said with what Masklinn said. Perhaps you could explain how they say the same thing instead of assuming I didn't read his message?
"Python 3 binaries" probably means "Python 3 bytes objects" above.
Regards
Antoine.
-- 
Software development and contracting: http://pro.pitrou.net
Masklinn's explanation is comprehensive and clear to me.
On Fri, Jul 20, 2012 at 11:56 PM, Georg Brandl <g.brandl@gmx.net> wrote:
That's what he said. Could we stop the annoying "but I know it better than you without reading your message" please?
Georg
-- Thanks, Andrew Svetlov
Andrew Svetlov wrote:
Masklinn's explanation is comprehensive and clear to me.
I'm glad that it's clear to someone, because to me the straight-forward, literal meaning of Masklinn's explanation (that Python 3 has a character type, and they're integers) is wrong. Python has no built-in "Char" type, under any spelling, let alone one which is also a subset of int. The non-literal meaning is hard to understand. I *guess* that Masklinn is trying to get across that Python 3 strings are Unicode strings, and characters in Unicode are actually code points, which are implemented at the C level as integers. If not that, I have no idea. I've more or less forgotten why this was important, but I am enjoying watching people try to out-pedant each other :)
-- Steven
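Steven's reading can be illustrated in Python 3: there is no separate character type, and a Unicode code point is exposed as an int only through `ord()` and `chr()`. The example character is an arbitrary choice, written as an escape to avoid source-encoding issues:

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# A "character" is still a str of length 1; there is no separate char type.
print(type(ch) is str, len(ch))  # True 1

# ord() and chr() convert between a length-1 str and its code point (an int).
print(ord(ch), chr(233) == ch)  # 233 True
```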
On Sat, Jul 21, 2012 at 7:19 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I'm glad that it's clear to someone, because to me the straight-forward, literal meaning of Masklinn's explanation (that Python 3 has a character type, and they're integers) is wrong. Python has no built-in "Char" type, under any spelling, let alone one which is also a subset of int. The non-literal meaning is hard to understand. I *guess* that Masklinn is trying to get across that Python 3 strings are Unicode strings, and characters in Unicode are actually code points, which are implemented at the C level as integers. If not that, I have no idea.
You're pretty far off. He was talking about bytes objects, not str objects.
-- Devin
participants (10)
- anatoly techtonik
- Andrew Svetlov
- Antoine Pitrou
- Devin Jeanpierre
- Ethan Furman
- Georg Brandl
- Lefavor, Matthew (GSFC-582.0)[MICROTEL LLC]
- Masklinn
- Serhiy Storchaka
- Steven D'Aprano