There's a reason I've never actually proposed adding a char ....

On Wed, Oct 23, 2019 at 5:34 PM Andrew Barnert <abarnert@yahoo.com> wrote:

> Well, just adding a char type (and presumably a way of defining char literals) wouldn’t be too disruptive. 

sure.

> But changing str to iterate chars instead of strs, that probably would be.

And that would be the whole point -- a char type by itself isn't very useful. In some sense, the only difference between a char and a str would be that a char isn't iterable -- but the benefit would be that a string would be an iterable (and sequence) of chars, rather than an (infinitely recursive) iterable of strings.
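For context, here is a minimal sketch of the pitfall being described (the helper names are illustrative, not proposed API): because iterating a str yields more strs, a naive flatten recurses forever on strings, and today you have to special-case str by hand.

```python
def naive_flatten(items):
    """Flatten nested iterables -- but hangs on strings, because each
    character of a str is itself an iterable str ("ab" -> "a" -> "a" ...),
    so any string input ends in RecursionError."""
    for item in items:
        try:
            it = iter(item)
        except TypeError:
            yield item
        else:
            yield from naive_flatten(it)


def flatten(items):
    """The usual workaround: an explicit isinstance check for str."""
    for item in items:
        if isinstance(item, str):
            yield item
            continue
        try:
            it = iter(item)
        except TypeError:
            yield item
        else:
            yield from flatten(it)
```

With a distinct char type, the isinstance check would be unnecessary -- the recursion would bottom out naturally at non-iterable chars.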

> Also, you’d have to go through a lot of functions and decide what types they should take.

Sure would -- it would take a lot of thought to see how disruptive it would be...

> For example, does str.join still accept a string instead of an iterable of strings? Does it accept other iterables of char too?

If it accepted an iterable of either char or str, then I *think* there would be little disruption.

> Can you pass a char to str.__contains__

Yes, that's a no-brainer -- the whole point is that a string would be a sequence of chars.

> or str.endswith?

I would think so -- a char would behave like a length-one string as much as possible.

> What about a tuple of chars?

That's an odd one -- I'm not sure I see the point: if you have a tuple of chars, you could "".join() them if you want a string, in any context.
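To illustrate with today's str (length-one strings standing in for the hypothetical chars), join already accepts a tuple, so nothing new would be needed there:

```python
chars = ("a", "b", "c")           # length-one strs standing in for chars
assert "".join(chars) == "abc"    # join accepts any iterable of strings
assert "-".join(chars) == "a-b-c"
```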

> Or should we take the backward-compat breaking opportunity to eliminate the “str or tuple of str” thing and instead use *args, or at least change it to “str or iterable of str (which no longer includes str itself)”?

Is this for .endswith() and friends? If so, there was discussion a while back about that -- but this is probably not the time to introduce even more backward-incompatible changes.

And I'm not sure how much string functionality a char should have -- probably next to none, as the point is that it would be easy to distinguish from a string that happened to have one character.

> Surely you’d want to be able to do things like isdigit or swapcase. Even C has functions to do most of that kind of stuff on chars.

Probably -- it would be least disruptive for a char to act as much as possible like a length-one string -- so maybe immutability and indexability would be it.

> But I think that, other than join and maybe encode and translate,

Not sure why encode or translate would be an issue, off the top of my head -- it would surely be a Unicode char :-)

> there’s an obvious right answer for every str method and operator, so this isn’t too much of a problem.

Well, we'd have to go through all of them and do a lot of thinking...

I think the greater confusion is where you could use a char instead of a string in other places. Using it as a filename, for instance, would make it pointless for at least the cases I commonly deal with (lists of filenames).

I can only imagine how many "things" take a string where a char would make sense, but then it gets harder to distinguish them all.

> Speaking of operators, should char+int and char-int and char-char be legal? (What about char%int? A thousand students doing the rot13 assignment would rejoice, but allowing % without * and // is kind of weird, and allowing * and // even weirder—as well as potentially confusing with str*int being legal but meaning something very different.)

I would say no -- in C a char is essentially a small integer, but that's C -- in Python, a char and a number are very different things.

ord() and chr() would work, of course.
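A quick sketch of how that already reads with today's length-one strings -- and presumably unchanged with a char type:

```python
assert ord("A") == 65             # char -> code point
assert chr(65) == "A"             # code point -> char
# Arithmetic goes through ints explicitly, rather than char + int:
assert chr(ord("a") + 13) == "n"  # the rot13 shift, spelled out
```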
>> By the way, the bytes and bytearray types already do this -- index into or loop through a bytes object, you get an int.

> Sure, but b'abc'.find(66) is -1, and b'abc'.replace(66, 70) is a TypeError, and so on.

I wonder if they need to be -- would we need a "byte" type, or would it be OK to accept an int in all those sorts of places?
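The inconsistencies under discussion are easy to check in today's Python:

```python
b = b"abc"
assert b[0] == 97                  # indexing a bytes gives an int
assert list(b) == [97, 98, 99]     # so does iterating
assert b.find(66) == -1            # find() accepts an int (66 == ord("B"))
assert b.find(b"b") == 1           # ...as well as a bytes-like value
try:
    b.replace(66, 70)              # but replace() rejects ints
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError")
```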

> Fixing those inconsistencies is what I meant by “go all the way to making them sequences of ints”. But it might be friendlier to undo the changes and instead add a byte type like the char type for bytes to be a sequence of. I’m not sure which is better.

me neither.

> But anyway, I think all of these questions are questions for a new language. If making str not iterate str was too big a change even for 3.0, how could it be reasonable for any future version?

Well, I don't know that it was seriously considered -- with the Unicode changes, that WOULD have been the time to do it!

Again, though, it seems like it would be pretty disruptive, so probably a non-starter -- but maybe not?

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython