[Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)
MRAB
python at mrabarnett.plus.com
Wed Dec 7 19:52:25 EST 2016
On 2016-12-07 23:52, Mikhail V wrote:
> In a past discussion about inputting and printing characters,
> I proposed decimal notation instead of hex.
> Since that discussion got lost in off-topic talk, I'll try to
> summarise my idea better.
>
> I use ASCII only for code input (there are good reasons for that).
> Here I'll use Python 3.6 on Windows 7, so I can use print() with Unicode
> directly, and it now works in the system console.
>
> Suppose I'm just starting to program and want to do some character manipulation.
> The very first thing I would probably start with is a simple output of
> Latin and Cyrillic capital letters:
>
> caps_lat = ""
> for o in range(65, 91):
>     caps_lat = caps_lat + chr(o)
> print(caps_lat)
>
> caps_cyr = ""
> for o in range(1040, 1072):
>     caps_cyr = caps_cyr + chr(o)
> print(caps_cyr)
>
>
> Which prints:
> ABCDEFGHIJKLMNOPQRSTUVWXYZ
> АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
>
>
> Now say I want to input something directly in code:
>
> s = "first cyrillic letters: " + chr(1040) + chr(1041) + chr(1042)
>
> This works fine and looks clean. However, it is not very convenient,
> because it takes a lot of typing, and if I generate such strings,
> it adds a bit more complexity. But in general it is fine, and I
> currently use this method.
>
> =========
> Proposal: I would like to be able to input characters *by decimal code point*:
>
> s = "first cyrillic letters: \{1040}\{1041}\{1042}"
> or:
> s = "first cyrillic letters: \(1040)\(1041)\(1042)"
>
> =========
>
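A minimal sketch, not from either message, of how the effect of the proposed decimal notation can be approximated at runtime in current Python without new syntax. The helper `expand_decimal_escapes` is hypothetical (it is not part of Python or of the proposal): it scans a raw string for `\{NNNN}` markers and replaces each with `chr(NNNN)`. The quoted loops are also rewritten with `str.join`, the idiomatic way to build a string from many pieces.

```python
import re

# Idiomatic form of the quoted loops: build each string in one pass with str.join.
caps_lat = "".join(chr(o) for o in range(65, 91))      # A..Z
caps_cyr = "".join(chr(o) for o in range(1040, 1072))  # U+0410..U+042F, Cyrillic capitals

# Hypothetical helper: matches a backslash, "{", one or more decimal digits, "}".
# This is runtime text processing, not a new string-literal escape.
_DECIMAL_ESCAPE = re.compile(r"\\\{(\d+)\}")


def expand_decimal_escapes(text: str) -> str:
    """Replace each \\{NNNN} marker (a decimal code point) with its character."""
    return _DECIMAL_ESCAPE.sub(lambda m: chr(int(m.group(1))), text)


# A raw string keeps the backslashes, so the helper can see the markers.
s = expand_decimal_escapes(r"first cyrillic letters: \{1040}\{1041}\{1042}")

print(caps_lat)
print(caps_cyr)
print(s)
```

Because `chr()` only accepts values from 0 to 1114111 (0x10FFFF), a marker outside that range makes the helper raise `ValueError` rather than silently producing a wrong character.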
Escapes are usually \ followed by an ASCII-range letter or digit; \
followed by anything else makes it a literal, even if it's a
metacharacter. For example, " terminates a string that starts with ",
but \" is a literal ". So I don't like \{...}.
Perl doesn't have \u... or \U...; it has \x{...} instead. Python
already has \N{...}, so:
s = "first cyrillic letters: \d{1040}\d{1041}\d{1042}"
might be better, but I'm still -1 because hex is the usual notation for
Unicode code points.
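For comparison with the hex convention mentioned above, a short sketch of the ways current Python 3 already lets you write the same character, plus conversion between the decimal and hex forms of a code point using only built-ins (`chr`, `ord`, `int`, and format specs). The `U+XXXX` label is simply the conventional Unicode notation chosen for this example.

```python
# The same character, CYRILLIC CAPITAL LETTER A, written four ways.
by_decimal = chr(1040)                        # decimal code point via chr()
by_hex_escape = "\u0410"                      # 4-digit hex escape
by_long_hex = "\U00000410"                    # 8-digit hex escape
by_name = "\N{CYRILLIC CAPITAL LETTER A}"     # Unicode character-name escape

assert by_decimal == by_hex_escape == by_long_hex == by_name

# Converting between decimal and hex forms of a code point.
code = ord(by_name)            # decimal code point
hex_form = f"U+{code:04X}"     # conventional Unicode notation
back = int("0410", 16)         # hex digits back to decimal

print(code, hex_form, back)    # prints: 1040 U+0410 1040
```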