[Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

Wed Dec 7 19:52:25 EST 2016

On 2016-12-07 23:52, Mikhail V wrote:
> In a past discussion about inputting and printing characters,
> I proposed decimal notation instead of hex.
> Since that discussion got lost in off-topic talk, I'll try to
> summarise my idea more clearly.
>
> I use only ASCII for code input (there are good reasons for that).
> Here I'll use Python 3.6 on Windows 7, where print() with Unicode
> now works directly in the system console.
>
> Suppose I am just starting to program and want to do some character manipulation.
> The very first thing I would probably start with is a simple output of
> the Latin and Cyrillic capital letters:
>
> caps_lat = ""
> for o in range(65, 91):          # code points of "A".."Z"
>     caps_lat = caps_lat + chr(o)
> print(caps_lat)
>
> caps_cyr = ""
> for o in range(1040, 1072):      # code points of "А".."Я"
>     caps_cyr = caps_cyr + chr(o)
> print(caps_cyr)
>
>
> Which prints:
> ABCDEFGHIJKLMNOPQRSTUVWXYZ
> АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
>
>
> Now say I want to input something directly in code:
>
> s = "first cyrillic letters: " + chr(1040) + chr(1041) + chr(1042)
>
> This works fine and looks clean. However, it is not very convenient,
> because it takes a lot of typing, and if I generate such strings
> it adds a bit more complexity. But in general it is fine, and I use this
> method currently.
>
> =========
> Proposal: I would like to be able to input characters *by their decimal codes*:
>
> s = "first cyrillic letters: \{1040}\{1041}\{1042}"
> or:
> s = "first cyrillic letters: \(1040)\(1041)\(1042)"
>
> =========
>
Escapes are usually \ followed by an ASCII-range letter or digit;
\ followed by anything else makes that character a literal, even if
it's a metacharacter. For example, " terminates a string that starts
with ", but \" is a literal ". So I don't like \{...}.
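To illustrate that convention, here is a short sketch of how current Python 3 treats a few existing escapes: the ones introduced by a letter or digit produce a code point, while a backslash before a metacharacter just yields the literal character. The variable names are only for illustration.

```python
# Backslash + ASCII letter or digit: an escape sequence producing a code point.
octal = "\101"                            # octal escape: "A" (code point 65)
hex_byte = "\x41"                         # \x + 2 hex digits: also "A"
bmp = "\u0410"                            # \u + 4 hex digits: Cyrillic "А" (1040)
named = "\N{CYRILLIC CAPITAL LETTER A}"   # \N{...}: lookup by Unicode name

# Backslash + metacharacter: just the literal character, no special meaning.
quote = "\""

print(octal, hex_byte, bmp, named, quote)
```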

Perl doesn't have \u... or \U...; it has \x{...} instead, and Python
already has \N{...}, so:

s = "first cyrillic letters: \d{1040}\d{1041}\d{1042}"

might be better, but I'm still -1 because hex is usual when referring to 
Unicode codepoints.
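The \d{...} escape discussed above does not exist in Python. As a rough sketch of what it would mean, the hypothetical helper `expand_decimal_escapes` below emulates it at run time with `re.sub`, then shows the same three characters in the hex forms (\uXXXX, U+XXXX) that are usual for Unicode code points. It assumes the input is a raw string, since "\d" in an ordinary literal is an unrecognised escape (deprecated since Python 3.6).

```python
import re

# Hypothetical notation: "\d{N}" means the character with decimal code point N.
# This is NOT a real Python escape; it is emulated here with a regex.
_DECIMAL_ESCAPE = re.compile(r"\\d\{(\d+)\}")

def expand_decimal_escapes(text: str) -> str:
    """Replace every backslash-d{N} in text with chr(N)."""
    return _DECIMAL_ESCAPE.sub(lambda m: chr(int(m.group(1))), text)

# Raw string, so the backslash reaches the helper unchanged.
s = expand_decimal_escapes(r"first cyrillic letters: \d{1040}\d{1041}\d{1042}")
print(s)

# The same characters written with the hex escape Python already has...
print("\u0410\u0411\u0412" == s[-3:])
# ...and in the conventional U+XXXX notation.
print(", ".join(f"U+{ord(c):04X}" for c in s[-3:]))
```

Because the expansion happens at run time rather than at compile time, this is only a workaround for experimenting with the notation, not a substitute for a real escape.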