From: Mikhail V Sent: Wednesday, October 12, 2016 9:57 PM Subject: Re: [Python-ideas] Proposal for default character representation
Hello, and welcome to Python-ideas, where only a small portion of ideas go further, and where most newcomers that wish to improve the language get hit by the reality bat! I hope you enjoy your stay :)
On 13 October 2016 at 01:50, Chris Angelico email@example.com wrote:
On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V firstname.lastname@example.org wrote:
Way WAY less readable, and I'm comfortable working in both hex and decimal.
Please don't mix up readability and personal habit, which previous repliers seem to do as well. Those two things have nothing to do with each other. If you are comfortable with the old Roman numeral system, that does not make it readable. And I am NOT comfortable with hex, and most people would be glad to use a single notation. But some of them think they are cool because they know several numbering notations ;) But I bet few can actually tell which is more readable.
I'll turn your argument around: Not being comfortable with hex does not make it unreadable; it's a matter of habit (as Brendan pointed out in his separate reply).
You're the one who's non-standard here. Most of the world uses hex for Unicode codepoints.
No, I am not the one. Many people find it silly to use different notations for the same thing (the index of the element), and they are very right about that. I am not silly; I refuse to use it, and luckily I can. Also, I know that decimal is more readable than hex, so my choice is supported by understanding and not simply by refusal.
Unicode code points are represented using hex notation virtually everywhere I've ever seen them. Your Unicode-code-points-as-decimal website was a new discovery for me (and, I presume, for many others on this list). Since hex is so widely used, going against it effectively makes you non-standard. That doesn't necessarily make it a bad thing, but it does mean that your chances (or anyone's chances) of actually changing it are zero (and this isn't some gross exaggeration).
PS: It is rather peculiar: three negative replies already, but no arguments as to why it would be bad to stick to decimal only, just some "others do it so" and "tradition" arguments.
"Others do it so" is actually a very strong argument. If all the rest of the world uses + to mean addition, and Python used + to mean subtraction, it doesn't matter how logical that is, it is *wrong*.
This actually supports my proposal perfectly: if everyone uses decimal, why suddenly use hex for the same thing, the index of an array? I don't see how your analogy contradicts my proposal; rather, it supports it.
I fail to see your point here. Where does "everyone use decimal"? Unless you've stopped talking about representation in strings (which seems likely, since you're talking about indexing?), everything is represented in hex.
But I do want you to abstract yourself from your habit for a while and talk about what would be better for future usage.
I'll be that guy and tell you that you need to step back from your own idea for a while and consider your proposal and the current state of things. I'll also take the opportunity to reiterate that there is virtually no chance of changing this behaviour. This doesn't, however, prevent you or anyone else from talking about the topic, either for fun, or for finding other (related or otherwise) areas of interest that you think might be worth investigating further. A lot of threads actually branch off into different topics that came up during the discussion and that are interesting enough to pursue on their own.
everyone has to do the conversion from that to 201C.
Nobody needs to do ANY conversions if we use decimal, and as I said, everything is decimal: numbers, array indexes, the ord() function returns decimal; you can imagine more examples. So it is not only more readable but also more traditional.
You're mixing up more than just one concept here:
- Integer literals; I assume this is what you meant, and you seem to forget (or maybe you didn't know, in which case, here's to learning something new!) that 0xff is perfectly valid syntax, and stores the integer with the value 255 in base 10.
- Indexing, which is completely irrelevant to the topic at hand (also see the bullet point above).
- ord(), which returns an integer (which can be interpreted in any base!), and that's an argument both for and against this proposal; the "against" side is actually that decimal notation has no defined boundary for when to stop (and before you argue that it does, I'll point out that separators, e.g. grouping by thousands, are culture-driven and not an international standard). There's actually a precedent for this in Python 2 with the \x escape (need I remind anyone why Python 3 was created again? :), but it's a point squarely in the "don't do that" camp, not the other way around.
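To make the integer-literal and ord() points concrete, here is a minimal Python 3 sketch showing that the base only affects how an integer is written or printed, never its value. U+201C (LEFT DOUBLE QUOTATION MARK) is used as the sample character only because it came up earlier in the thread; the variable names are illustrative.

```python
# The base of an integer literal only affects how the number is spelled
# in source code; both literals below produce the same int.
assert 0xff == 255

# ord() returns a plain int. "Decimal" or "hex" only appears once you
# format that int as text.
cp = ord("\u201c")           # LEFT DOUBLE QUOTATION MARK
as_decimal = str(cp)         # '8220'
as_hex = hex(cp)             # '0x201c'
as_unicode = f"U+{cp:04X}"   # 'U+201C', the usual notation in Unicode charts

# Converting between the two spellings is a one-liner either way.
assert int("201C", 16) == cp == 8220
assert chr(cp) == "\u201c"
print(as_decimal, as_hex, as_unicode)
```

In other words, "201C" and "8220" are two spellings of one value; the proposal is really about which spelling escapes and reprs should use.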
How many decimal digits would you use to denote a single character?
For text, three decimal digits would be enough for me personally, and in the long run, when the world's alphabetical garbage disappears, two digits would be OK.
You seem to have misunderstood the question - in "\u00123456", there is no ambiguity that this is a string consisting of 5 characters; the first one is '\u0012', the second one is '3', the third one is '4', the fourth one is '5', and the last one is '6'. In the string (using \d as a hypothetical escape method; regex gurus can go read #27364 ;) "\d00123456", how many characters does this contain? It's decimal, so should the escape grab the first 5 digits? Or 6 maybe? You tell me.
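The ambiguity can be demonstrated in a few lines of Python 3. The first part uses the real `\u` escape; `decode_decimal` is a hypothetical helper (nothing like it exists in Python) that models a decimal escape consuming a chosen number of digits, so you can see that the same digit run yields a different string for every width.

```python
# Python's \u escape always consumes exactly four hex digits, so this
# literal is '\u0012' followed by the plain characters '3', '4', '5', '6'.
s = "\u00123456"
print(len(s))    # 5
print(list(s))   # ['\x12', '3', '4', '5', '6']


def decode_decimal(digits: str, width: int) -> str:
    """Hypothetical decimal escape (not real Python syntax): read the first
    `width` digits as one code point and keep the rest as literal text."""
    return chr(int(digits[:width])) + digits[width:]


# The same run of digits decodes to a different string for every width
# the hypothetical escape might be assumed to consume.
for width in (5, 6, 7, 8):
    decoded = decode_decimal("00123456", width)
    print(width, len(decoded), [f"U+{ord(c):04X}" for c in decoded])
```

Unless the width is fixed by the language, a reader (or the tokenizer) has no way to pick between these readings.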
you have to pad everything to seven digits (\u0000034 for an ASCII quote)?
It depends on the case. For input, some separator or padding is fine; I have no problem with either. For printing, obviously don't show leading zeros, but rather spaces.
No leading zeros? That means you don't have a fixed number of digits, and your string is suddenly very ambiguous (also see my point above).
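The "seven digits" in the quoted question follows from the size of the code space. A short Python 3 sketch (assuming Python 3.3+, where `sys.maxunicode` is always 1114111); `decimal_escape` is a hypothetical helper that only builds the text of an imagined escape, not real Python syntax.

```python
import sys

# Highest code point Unicode defines (and Python 3.3+ supports).
top = sys.maxunicode
print(top, hex(top))         # 1114111 0x10ffff

# Covering the full range takes 7 decimal digits, versus 6 hex digits.
print(len(str(top)))         # 7
print(len(f"{top:x}"))       # 6

# Python's own escapes are fixed-width (\u: 4 hex digits, \U: 8), which is
# why no terminator or separator is needed.
assert "\U0000201C" == "\u201c"


def decimal_escape(ch: str) -> str:
    """Hypothetical fixed-width decimal escape text (not real Python syntax),
    zero-padded to 7 digits so that it can never be ambiguous."""
    return "\\d" + f"{ord(ch):07d}"


print(decimal_escape('"'))   # \d0000034
```

Dropping the zero padding, as suggested for printing, brings back exactly the ambiguity described above.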
But as I said, I find Unicode to be only a temporary phenomenon; at some point it will pass into history and be used only to study extinct glyphs.
Unicode, a temporary phenomenon? Well, strictly speaking, nobody can know that, but I'd expect that it's going to, someday, be *the* common standard. I'm not under any illusion, though.
All in all, that's a pretty interesting idea. However, it has no chance of happening, because a lot of code would break, Python would deviate from the rest of the world, this wouldn't be backwards compatible (and another backwards-incompatible major release isn't happening; the community still hasn't fully caught up with the one 8 years ago), and it would be unintuitive to anyone who's done computer programming before (or after, or during, or anytime).
I do see some bits worth pursuing in your idea, though, and I encourage you to keep going! As I said earlier, Python-ideas is a place where a lot of ideas are born and die, and that shouldn't stop you from trying to contribute. Python is 25 years old, and a bunch of stuff is there just for backwards compatibility; these kinds of things can't be changed easily. The older (older by contribution period, not actual age) contributors who are still active don't try to fix what's not broken (to them). Newcomers such as you are a breath of fresh air for the language, and help make it thrive even more! By bringing new, uncommon ideas, you're challenging the status quo and potentially changing it for the better. But keep in mind that, with no clear consensus, the status quo always wins a stalemate.
I hope that makes sense!