[Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

Nick Timkovich prometheus235 at gmail.com
Wed Dec 7 19:13:31 EST 2016


Out of curiosity, why do you prefer decimal values for referring to Unicode code
points? Most references, such as http://unicode.org/charts/PDF/U0400.pdf (official)
or https://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF,
refer to them in hexadecimal, since the planes and ranges are divided
along hex boundaries.
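A short Python sketch of the point about hex boundaries. It is an illustration, not taken from either message. The same code point reads as 1040 in decimal or U+0410 in hex, and both the Cyrillic block and the Unicode planes start at round hex values rather than round decimal ones:

```python
# The same code point written in decimal and in hexadecimal.
cp = ord("\N{CYRILLIC CAPITAL LETTER A}")
print(cp, hex(cp), f"U+{cp:04X}")  # 1040 0x410 U+0410

# The Cyrillic block is charted by Unicode as U+0400 to U+04FF.
start, end = 0x0400, 0x04FF
print(start, end)  # 1024 1279

# Planes are 0x10000 (65536) code points each, so the plane number is cp >> 16.
print(cp >> 16)  # 0 (Basic Multilingual Plane)
```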

On Wed, Dec 7, 2016 at 5:52 PM, Mikhail V <mikhailwas at gmail.com> wrote:

> In a past discussion about inputting and printing characters,
> I proposed decimal notation instead of hex.
> Since that discussion got lost in off-topic talk, I'll try to
> summarise my idea better.
>
> I use ASCII only for code input (there are good reasons for that).
> Here I'll use Python 3.6 and Windows 7, so I can use print() with Unicode
> directly, and it now works in the system console.
>
> Suppose I am just starting to program and want to do some character
> manipulation.
> The very first thing I would probably start with is a simple output of
> Latin and Cyrillic capital letters:
>
> caps_lat = ""
> for o in range(65, 91):
>     caps_lat += chr(o)
> print(caps_lat)
>
> caps_cyr = ""
> for o in range(1040, 1072):
>     caps_cyr += chr(o)
> print(caps_cyr)
>
>
> Which prints:
> ABCDEFGHIJKLMNOPQRSTUVWXYZ
> АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
>
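The two loops above can also be written with str.join, which builds each string in one pass instead of by repeated concatenation. This is a minimal sketch that keeps the same decimal ranges. Note that range(1040, 1072) covers the 32 letters from А to Я but not Ё, which sits separately at 1025 (U+0401):

```python
# Build each string in one pass instead of repeated concatenation.
caps_lat = "".join(chr(o) for o in range(65, 91))      # 65..90: Latin A..Z
caps_cyr = "".join(chr(o) for o in range(1040, 1072))  # 1040..1071: Cyrillic A..YA

print(caps_lat)
print(caps_cyr)
```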
>
> Say I now want to input something directly in code:
>
> s = "first cyrillic letters: " + chr(1040) + chr(1041) + chr(1042)
>
> This works fine and looks clean. However, it is not very convenient,
> because it takes a lot of typing, and if I generate such strings,
> it adds a bit more complexity. In general it is fine, though, and it is
> the method I currently use.
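One way to cut down on repeated chr() calls while still using decimals in today's Python is a small helper. The name from_decimals below is hypothetical and is not part of Python or its standard library:

```python
def from_decimals(*codes: int) -> str:
    """Hypothetical helper: build a string from decimal code points."""
    return "".join(map(chr, codes))


s = "first cyrillic letters: " + from_decimals(1040, 1041, 1042)
print(s)  # first cyrillic letters: АБВ
```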
>
> =========
> Proposal: I would like to be able to input characters *by decimals*:
>
> s = "first cyrillic letters: \{1040}\{1041}\{1042}"
> or:
> s = "first cyrillic letters: \(1040)\(1041)\(1042)"
>
> =========
>
> This is more compact and does not seem to conflict much with the
> current Python escape sequences in string literals, where a backslash
> starts an escape in most cases.
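Until such syntax exists, it can be approximated at runtime. The sketch below illustrates that approach under stated assumptions and is not a specification of the proposal. The hypothetical expand_decimal_escapes() rewrites both proposed spellings, \{1040} and \(1040), inside a raw string. A raw string is used because neither \{ nor \( is a recognized escape in today's Python. A normal literal would keep the backslash anyway, and newer Python versions warn about such invalid escapes.

```python
import re

# Matches the two proposed spellings: \{1040} and \(1040).
_DECIMAL_ESCAPE = re.compile(r"\\\{(\d+)\}|\\\((\d+)\)")


def expand_decimal_escapes(text: str) -> str:
    """Hypothetical helper: replace \\{N} and \\(N) with chr(N), N in decimal."""
    def repl(m: re.Match) -> str:
        digits = m.group(1) or m.group(2)
        return chr(int(digits))  # raises ValueError if N > 0x10FFFF
    return _DECIMAL_ESCAPE.sub(repl, text)


s1 = expand_decimal_escapes(r"first cyrillic letters: \{1040}\{1041}\{1042}")
s2 = expand_decimal_escapes(r"first cyrillic letters: \(1040)\(1041)\(1042)")
print(s1 == s2)  # True
```

Because this runs at runtime rather than at compile time, it only approximates a real string-literal escape.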
>
> Most important for me is that this way I would avoid hex numbers
> in strings entirely, which I find much better for readability.
> It is also very convenient for me, since I use decimals
> for processing everywhere (and encourage everyone to do so).
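The following example illustrates processing with decimal code points and is not taken from the original message. For the basic Latin and Cyrillic capitals used above, lowercasing is a fixed offset of 32 in both alphabets. The helper name is hypothetical. str.lower() remains the general-purpose tool, since it covers every script:

```python
def to_lower_basic(text: str) -> str:
    """Hypothetical helper: lowercase A-Z (65-90) and Cyrillic A-YA (1040-1071) by adding 32.

    Every other character, including Cyrillic IO (1025), passes through unchanged.
    """
    out = []
    for ch in text:
        o = ord(ch)
        if 65 <= o <= 90 or 1040 <= o <= 1071:
            out.append(chr(o + 32))
        else:
            out.append(ch)
    return "".join(out)


sample = "ABC " + chr(1040) + chr(1041) + chr(1042)
print(to_lower_basic(sample))  # abc абв
```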
>
> So this is my proposal; any comments are appreciated.
>
>
> PS:
>
> Currently Python 3 supports these in addition to \x:
> (from https://docs.python.org/3/howto/unicode.html)
> """
> If you can’t enter a particular character in your editor or want to keep
> the source code ASCII-only for some reason, you can also use escape
> sequences in string literals.
>
> >>> "\N{GREEK CAPITAL LETTER DELTA}"  # Using the character name
> >>> "\u0394"                          # Using a 16-bit hex value
> >>> "\U00000394"                      # Using a 32-bit hex value
>
> """
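For comparison, all of the escape forms quoted above produce the same one-character string, and so does chr() with a decimal argument. The hex value 0x394 in the escapes is 916 in decimal. By contrast, \x escapes take exactly two hex digits, so they only reach code point 255.

```python
delta_by_name = "\N{GREEK CAPITAL LETTER DELTA}"
delta_u16 = "\u0394"
delta_u32 = "\U00000394"
delta_dec = chr(916)  # 916 == 0x394

print(delta_by_name == delta_u16 == delta_u32 == delta_dec)  # True
print(int("394", 16), format(916, "X"))  # 916 394
```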
> So I have many options, and all of them strangely contradict my idea
> of intuitive and readable. Using the character name is readable, but it
> is not really a practical solution for input, though it could be very
> useful for printing a description of a character.
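For the last use case mentioned, describing a character, the standard unicodedata module can print the name of each character from a decimal code point. unicodedata.lookup() goes the other way, from a name to a character:

```python
import unicodedata

# Describe the first three Cyrillic capitals by their decimal code points.
for o in range(1040, 1043):
    print(o, chr(o), unicodedata.name(chr(o)))

# From a character name back to its decimal code point.
print(ord(unicodedata.lookup("GREEK CAPITAL LETTER DELTA")))  # 916
```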
>
>
> Mikhail